Tag: Vector Addition
-
CUDA : Vector Addition Example
Vector addition (C[i] = A[i] + B[i]) is our first parallel CUDA program, integrating memory management, data transfer, kernel execution, and error handling. This complete example demonstrates the full CUDA workflow: allocate device memory with cudaMalloc(), copy data with cudaMemcpy(), launch the parallel kernel, retrieve the results, verify correctness, and free the allocated memory. Refer to the following…
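The workflow above can be sketched as a single CUDA program. This is a minimal illustration, not the post's own code: the names `vecAddKernel` and `vectorAdd`, the block size of 256 threads, and the choice to return the first `cudaError_t` (instead of aborting) are all assumptions made for this sketch.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread computes one element: C[i] = A[i] + B[i].
__global__ void vecAddKernel(const float* A, const float* B, float* C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // the last block may overshoot n
        C[i] = A[i] + B[i];
    }
}

// Hypothetical host wrapper: allocate, copy in, launch, copy out, free.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t vectorAdd(const float* hA, const float* hB, float* hC, int n) {
    if (n <= 0) return cudaSuccess;  // a zero-block launch would be invalid
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaError_t err = cudaSuccess;

    do {
        // 1. Allocate device (GPU) memory for the three vectors.
        if ((err = cudaMalloc(&dA, bytes)) != cudaSuccess) break;
        if ((err = cudaMalloc(&dB, bytes)) != cudaSuccess) break;
        if ((err = cudaMalloc(&dC, bytes)) != cudaSuccess) break;

        // 2. Copy the inputs from host to device.
        if ((err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice)) != cudaSuccess) break;
        if ((err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice)) != cudaSuccess) break;

        // 3. Launch enough 256-thread blocks to cover all n elements.
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        vecAddKernel<<<blocks, threads>>>(dA, dB, dC, n);
        if ((err = cudaGetLastError()) != cudaSuccess) break;  // launch errors

        // 4. Copy the result back; this cudaMemcpy waits for the kernel to finish.
        err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    } while (false);

    // 5. Free device memory (cudaFree on a null pointer is a no-op).
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    cudaError_t err = vectorAdd(a.data(), b.data(), c.data(), n);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Verify the GPU result on the host.
    for (int i = 0; i < n; ++i) {
        if (c[i] != a[i] + b[i]) {
            std::printf("Mismatch at index %d\n", i);
            return 1;
        }
    }
    std::printf("PASS\n");
    return 0;
}
```

Compiled with something like `nvcc vec_add.cu -o vec_add` (file name assumed), it should print `PASS` on a working GPU. Checking `cudaGetLastError()` right after the launch catches configuration errors, while errors raised during kernel execution surface on the following `cudaMemcpy`.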
-
CUDA “Hello World!” : Array addition using single block
In this post, we are going to look at basic CUDA code. Even though it doesn’t actually print “Hello World!”, it performs a very simple arithmetic operation, so we will treat it as the “Hello World!” program for CUDA. As we know, discrete GPU cards have their own memory, so in CUDA we need to…
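A single-block version of array addition might look like the sketch below. It is an assumption-based illustration rather than the post's code: the names `addArrays` and `addOnGpu` and the array size `N = 512` are hypothetical. With only one block, `threadIdx.x` alone identifies each element, so `N` must not exceed the per-block thread limit (1024 on current NVIDIA GPUs). Because a discrete GPU has its own memory, the arrays are still copied to and from the device explicitly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 512;  // hypothetical size; must fit in one thread block

// One block of N threads: thread t adds element t.
__global__ void addArrays(const int* a, const int* b, int* c) {
    int t = threadIdx.x;
    c[t] = a[t] + b[t];
}

// Hypothetical helper: adds two N-element host arrays on the GPU.
// Returns false if any CUDA call fails.
bool addOnGpu(const int* hA, const int* hB, int* hC) {
    const size_t bytes = N * sizeof(int);
    int *dA = nullptr, *dB = nullptr, *dC = nullptr;

    // Allocate device memory and copy the inputs; && stops at the first failure.
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, bytes) == cudaSuccess &&
              cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        addArrays<<<1, N>>>(dA, dB, dC);  // grid of 1 block, N threads
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dA);  // cudaFree(nullptr) is a no-op
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}

int main() {
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    if (!addOnGpu(a, b, c)) {
        std::fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < N; ++i) {
        if (c[i] != 3 * i) { std::printf("Mismatch at %d\n", i); return 1; }
    }
    std::printf("c[%d] = %d\n", N - 1, c[N - 1]);  // prints c[511] = 1533
    return 0;
}
```

Using a single block keeps the indexing trivial, but it caps the problem size and leaves most of the GPU idle. Larger arrays need multiple blocks and a global index such as `blockIdx.x * blockDim.x + threadIdx.x`.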
-
Compiling and Running OpenACC Fortran Codes using PGI Fortran
In this tutorial, we will learn how to compile and execute an OpenACC Fortran code using the PGI Fortran compiler. Let’s look at a sample vector addition code parallelized using the OpenACC Fortran parallel loop construct. We can compile this code for an Nvidia GPU using the following command, or its alternative. Here, the ‘-ta=tesla’ option informs the compiler that…