As I want to learn more about CUDA, I was digging around for a base project that renders CUDA output to the screen without much overhead. I found no simple example that works on Linux without relying on old libraries. So I put together a base project for myself, and maybe it will be useful to someone else.
To run this base project you will need the CUDA Toolkit, SDL2 and a C++ compiler (g++ or clang):
On Linux you can install the CUDA Toolkit from the NVIDIA page. Pay attention to the instructions shown after the installation. If you are using Ubuntu or another Debian-based distro, you can add NVIDIA's apt repository and install it using aptitude. Follow the instructions here.
For other Linux distros or operating systems, I recommend following the manual.
To install SDL2 on Ubuntu or Debian:
# apt install libsdl2-dev libsdl2-2.0-0 -y
Otherwise, follow this tutorial by lazyfoo.
For GNU g++ (I am using g++ 9.2 and haven't tested other versions, but I don't think there is any reason for concern here):
# apt install g++
For clang:
# apt install clang
Get the source code from my GitHub project.
$ git clone git@github.com:fsan/cuda_on_sdl.git
Just run make. My computer has CUDA 10.0 installed, so be sure to change the corresponding variable in the Makefile to match your installation, or provide it at build time.
$ make
If everything builds fine, you'll have an executable named main in your folder. Running it will display a red screen rendered on the GPU with CUDA.
I think the structure is too simple to need much explanation, and nothing magic is happening. But in any case:
main.cpp: initializes SDL2 video, creates the window, allocates GPU memory and controls the frame rate.
gpu.cu: has the device and host code, including the code that copies data from GPU memory to host memory. It also has the red screen kernel as an example (although it would be better organized somewhere else). It is compiled by nvcc without any special flags, to keep compatibility, but for better performance it would be worth enabling optimizations and specifying the CUDA compute capability of the target GPU.
One thing I saw in other examples, and I think there is no better way of doing it with CUDA, is that there are some round trips when asking the CUDA kernel to compute a frame. AFAIK, we run the GPU kernel, generate the result buffer, copy it to host memory, and then send it back to the graphics driver for display through SDL. This double copy seems unnecessary to me, but other projects combining CUDA and SDL have the same characteristic. I think this is because CUDA is a GPGPU platform and was not designed for this specific purpose. If there is a smarter way, please leave a comment on my GitHub project or below on Disqus. If the goal is rasterization, it would probably be much better to use SDL's interface to OpenGL directly, or if the goal is ray tracing, OptiX would be more efficient.
So why did I do this?
I want to study CUDA with different examples. To get more tangible results without generating ppm/bmp files, I wanted a framebuffer I could write to directly. In SDL1 it was possible to write directly to the screen buffer, but from what I saw, in SDL2 you need an SDL_Renderer and an SDL_Texture for this. I think all this indirection and copying is probably unnecessary, but I found no other way of doing it. If you know how to do it with only SDL2 and CUDA, please let me know.
Also, I think easy-to-start projects like this are good for those who want to focus on learning something. If I find some time, I may write about how to apply filters to images or videos using only CUDA.
Let me know whether you liked it on my GitHub project by leaving a star or opening an issue.
See you.