Category: CUDA
-
Nvidia Nsight Systems : Profiling for CUDA code
In this post, we will look at the steps involved in profiling CUDA code using Nvidia Nsight Systems. Let’s take a simple program that performs some array operations. To compile this code, we can use the following command. Please note that I am using “-arch=sm_86”, which instructs the compiler to generate code for compute capability 8.6…
-
CUDA “Hello World!” : Array addition using single block
In this post, we are going to look at some basic CUDA code. Even though it doesn’t actually print “Hello World!”, it is a very simple arithmetic operation, so we will treat it as the “Hello World!” code for CUDA. Since discrete GPU cards have their own memory, in CUDA we need to…