Welcome!
-
OpenMP uses a fork-join model. In this model, the main thread creates (forks) a team of threads that work on a task together. When that parallel part ends, the main thread waits until every thread in the team has finished (joins) and only then moves on. This model helps make…
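To make the fork and join points concrete, here is a minimal sketch. It is a host-only CUDA source file: plain C++ with no GPU code. It assumes an OpenMP-enabled build, for example g++ -fopenmp, or with a GCC host compiler something like nvcc -Xcompiler -fopenmp fork_join.cu -lgomp. The function name run_parallel_region is hypothetical. The team size depends on your machine and the OMP_NUM_THREADS environment variable, and the order of the "Hello" lines varies from run to run.

```cuda
#include <cstdio>
#include <omp.h>

// Hypothetical helper: runs one parallel region and returns how many
// threads took part in it.
int run_parallel_region() {
    int members = 0;  // declared outside the region, so it is shared by the team
    std::printf("Before the region: only the main thread runs\n");

    #pragma omp parallel  // fork: the main thread starts a team of threads
    {
        #pragma omp atomic
        members++;  // each team member counts itself exactly once

        std::printf("Hello from thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
    }  // join: implicit barrier, the main thread waits for the whole team

    // Only the main thread executes from here on.
    std::printf("After the region: %d threads have joined\n", members);
    return members;
}

int main() {
    run_parallel_region();
    return 0;
}
```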
-
Introduction to OpenMP
OpenMP is an API for shared-memory parallel programming. It helps programmers write faster programs by running work on multiple processor cores at the same time. It works with C, C++, and Fortran. With OpenMP, you mark the code you want to run in parallel with compiler directives instead of creating and managing threads yourself. It makes…
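A minimal sketch of parallelizing a loop without touching threads directly. One directive splits the iterations among the threads. The reduction(+:total) clause gives each thread a private partial sum and combines the partial sums when the loop ends. The function name sum_of_squares is hypothetical. The file is plain C++ written as a host-only CUDA source. If it is compiled without OpenMP support, the pragma is ignored and the loop runs sequentially with the same result.

```cuda
#include <cstdio>

// Hypothetical example: adds i*i for i = 1..n using an OpenMP parallel loop.
long long sum_of_squares(int n) {
    long long total = 0;
    // Iterations are divided among the threads. Each thread accumulates
    // into its own copy of total, and the copies are summed at the end.
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; ++i) {
        total += static_cast<long long>(i) * i;
    }
    return total;
}

int main() {
    std::printf("%lld\n", sum_of_squares(1000));  // prints 333833500
    return 0;
}
```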
-
CUDA : Vector Addition Example
Vector addition (C[i] = A[i] + B[i]) is our first parallel CUDA program, combining memory management, data transfer, kernel execution, and error handling. This complete example demonstrates the full CUDA workflow: allocate device memory with cudaMalloc(), copy data with…
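Below is a condensed sketch of that workflow. It is not the post's own code. It assumes a single GPU and 256 threads per block, and the names vecAdd, run_vector_add and CHECK are illustrative. Build it with nvcc, for example nvcc vector_add.cu -o vector_add.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative macro: abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",             \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// One thread per element: C[i] = A[i] + B[i].
__global__ void vecAdd(const float* A, const float* B, float* C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the arrays
        C[i] = A[i] + B[i];
    }
}

// Runs the whole workflow for n elements and returns the number of wrong results.
int run_vector_add(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Host buffers: A[i] = i and B[i] = 2i, so every C[i] should equal 3i.
    float* hA = static_cast<float*>(std::malloc(bytes));
    float* hB = static_cast<float*>(std::malloc(bytes));
    float* hC = static_cast<float*>(std::malloc(bytes));
    for (int i = 0; i < n; ++i) {
        hA[i] = static_cast<float>(i);
        hB[i] = 2.0f * static_cast<float>(i);
    }

    // 1. Allocate device memory.
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CHECK(cudaMalloc(&dA, bytes));
    CHECK(cudaMalloc(&dB, bytes));
    CHECK(cudaMalloc(&dC, bytes));

    // 2. Copy the inputs from host to device.
    CHECK(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice));

    // 3. Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    CHECK(cudaGetLastError());  // the launch itself returns no error code

    // 4. Copy the result back. This cudaMemcpy waits for the kernel to finish.
    CHECK(cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost));

    // 5. Verify on the host. Exact comparison is safe because every value
    //    is an integer below 2^24, which float represents exactly.
    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        if (hC[i] != 3.0f * static_cast<float>(i)) ++wrong;
    }

    // 6. Release device and host memory.
    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    CHECK(cudaFree(dC));
    std::free(hA);
    std::free(hB);
    std::free(hC);
    return wrong;
}

int main() {
    int wrong = run_vector_add(1 << 20);
    std::printf("%s\n", wrong == 0 ? "PASSED" : "FAILED");
    return wrong == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```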
-
CUDA: Device Query
Calling cudaGetDeviceProperties() lets your program learn about the GPU's features. It reports details such as the GPU's compute capability (which determines the features it supports), how much global memory it has, and how many streaming multiprocessors (SMs) it contains. This information helps you write better CUDA code that…
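A minimal device-query sketch, assuming at least one CUDA-capable GPU is present. It prints only a handful of the cudaDeviceProp fields. The function name print_device_properties is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints key properties of every visible GPU.
// Returns the number of devices, or -1 if the CUDA runtime reports an error.
int print_device_properties() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            return -1;
        }
        std::printf("Device %d: %s\n", dev, prop.name);
        std::printf("  Compute capability:      %d.%d\n", prop.major, prop.minor);
        std::printf("  Global memory:           %.2f GiB\n",
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        std::printf("  Multiprocessors (SMs):   %d\n", prop.multiProcessorCount);
        std::printf("  Max threads per block:   %d\n", prop.maxThreadsPerBlock);
        std::printf("  Warp size:               %d\n", prop.warpSize);
        std::printf("  Shared memory per block: %zu KiB\n",
                    prop.sharedMemPerBlock / 1024);
    }
    return count;
}

int main() {
    return print_device_properties() >= 0 ? 0 : 1;
}
```

Values such as maxThreadsPerBlock and multiProcessorCount can guide launch configuration. One option is to cap the block size at maxThreadsPerBlock instead of hard-coding a limit.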
-
CUDA: Error Handling
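Robust CUDA programs require systematic error checking, because GPU operations can fail without any visible sign unless you check for errors. Kernel launches are asynchronous and do not return an error code. A bad launch configuration is only reported when you call cudaGetLastError(), and errors that occur while the kernel runs surface at a later synchronizing call such as cudaDeviceSynchronize(). Using cudaError_t, cudaGetLastError(), and error-checking macros helps…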
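The sketch below shows both kinds of check, first on a deliberately broken launch and then on a valid one. The CUDA_CHECK macro and the names fill and demo_bad_launch are illustrative. The 1024-threads-per-block limit assumes a GPU of compute capability 2.0 or newer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative macro: wrap any runtime call that returns cudaError_t and
// abort with the file and line if it fails.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",             \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

__global__ void fill(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Launches fill() with an invalid block size and returns the resulting error.
cudaError_t demo_bad_launch(int* d, int n) {
    // 2048 threads per block exceeds the 1024-thread limit. The <<<...>>>
    // launch has no return value, so nothing is reported at this line.
    fill<<<1, 2048>>>(d, n);
    return cudaGetLastError();  // fetches (and clears) the recorded launch error
}

int main() {
    const int n = 1024;
    int* d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, n * sizeof(int)));

    cudaError_t bad = demo_bad_launch(d, n);
    std::printf("Bad launch reported: %s\n", cudaGetErrorString(bad));

    // A valid launch, checked in two stages.
    fill<<<(n + 255) / 256, 256>>>(d, n);
    CUDA_CHECK(cudaGetLastError());       // errors in the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel executed

    CUDA_CHECK(cudaFree(d));
    std::printf("Valid launch finished without errors\n");
    return 0;
}
```

Calling cudaDeviceSynchronize() after every launch makes errors easy to locate while debugging. However, it stops the CPU from running ahead of the GPU, so you may prefer to synchronize less often once the code works.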