Parallel Programming using CUDA

This page is under construction and will be updated regularly.

CUDA is a parallel computing platform and programming model developed by NVIDIA. It lets developers use NVIDIA GPUs for general-purpose computing (GPGPU), and it has enabled researchers across many application domains to accelerate their applications. CUDA provides a set of tools, libraries, and extensions to the C/C++ programming languages. A CUDA program can launch a large number of lightweight threads that run in parallel on the GPU’s hardware resources.
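To make the idea of a C/C++ extension that launches many threads concrete, here is a minimal sketch of vector addition in CUDA. The names `vecAdd` and `runVecAdd`, the block size of 256, and the input values are assumptions chosen for illustration, not taken from the tutorials listed below. Error handling is reduced to the essentials.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel (hypothetical name): each GPU thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // last block may be partly idle
        c[i] = a[i] + b[i];
}

// Hypothetical helper: runs vecAdd on n elements and verifies the result on the host.
bool runVecAdd(int n)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Host (CPU) buffers.
    float *hA = static_cast<float *>(malloc(bytes));
    float *hB = static_cast<float *>(malloc(bytes));
    float *hC = static_cast<float *>(malloc(bytes));
    if (!hA || !hB || !hC) { free(hA); free(hB); free(hC); return false; }
    for (int i = 0; i < n; ++i) { hA[i] = float(i); hB[i] = 2.0f * float(i); }

    // Device (GPU) buffers.
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, bytes) == cudaSuccess;

    if (ok) {
        // Copy inputs host -> device.
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

        // Launch enough blocks of 256 threads to cover all n elements.
        int threads = 256;                       // assumed block size
        int blocks = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

        // Check for launch and execution errors, then copy the result back.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // Verify on the host: the values are small integers, so float sums are exact.
    for (int i = 0; ok && i < n; ++i)
        if (hC[i] != hA[i] + hB[i]) ok = false;

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return ok;
}

int main()
{
    bool ok = runVecAdd(1 << 20);
    printf("vecAdd %s\n", ok ? "OK" : "FAILED");
    return ok ? 0 : 1;
}
```

Assuming the file is saved as `vec_add.cu`, it could be built with `nvcc vec_add.cu -o vec_add` on a machine with the CUDA Toolkit and an NVIDIA GPU. The launch uses one thread per element, which is the simplest mapping; the topics below cover launch configuration and memory transfers in more detail.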

CUDA Basics

Memory Management

  • Host and Device Memory
  • Memory Allocation
  • Memory Transfer Basics
  • Memory Transfer Patterns
  • Unified Memory
  • Pinned Memory
  • 2D and 3D Memory
  • Memory Copy Optimization

Kernel Programming

  • Kernel Launch Configuration
  • Thread Synchronization Basics
  • Shared Memory Basics
  • Shared Memory Example
  • Register Usage
  • Warp-Level Operations
  • Atomic Operations
  • Dynamic Parallelism
  • Constant Memory
  • Texture Memory Basics

Optimization Techniques

  • Memory Coalescing
  • Bank Conflicts in Shared Memory
  • Loop Unrolling
  • Instruction-Level Parallelism
  • Memory Access Patterns
  • Reducing Warp Divergence
  • Stream Compaction
  • Matrix Transpose Optimization
  • Reduction Patterns
  • Prefix Sum (Scan)
  • Histogram Computation

Advanced Topics

  • CUDA Streams
  • Asynchronous Data Transfer
  • Events and Timing
  • Multi-GPU Programming
  • Cooperative Groups
  • Warp Shuffle Operations
  • Unified Memory Advanced
  • Dynamic Shared Memory
  • Function Pointers and Callbacks
  • CUDA Graphs

CUDA Libraries and Tools

  • cuBLAS Basics
  • cuFFT Basics
  • Thrust Library Introduction
  • cuRAND for Random Numbers
  • Nsight Compute Profiling
  • Nsight Systems Profiling
  • CUDA-GDB Debugging
  • Memory Checker (cuda-memcheck)

Mandar Gurav

Parallel Programmer, Trainer and Mentor

