This page is under construction and will be updated regularly.
CUDA is a parallel computing platform and programming model developed by NVIDIA that allows developers to use NVIDIA GPUs for general-purpose computing (GPGPU). It has enabled researchers across many application domains to accelerate their workloads. CUDA provides a set of tools, libraries, and extensions to the C/C++ programming languages, and it lets a program spawn a very large number of threads that together utilize the GPU's hardware resources.
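As a minimal sketch of that thread model (the kernel name and launch configuration here are illustrative, not prescribed by this page), a kernel function is launched over a grid of blocks, each containing many threads:

```cuda
#include <cstdio>

// Each spawned thread computes its own global index and reports it.
__global__ void hello_kernel() {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    printf("Hello from thread %d\n", tid);
}

int main() {
    // Launch 4 blocks of 64 threads each: 256 threads in total.
    hello_kernel<<<4, 64>>>();
    cudaDeviceSynchronize();  // wait for the kernel (and its printf output) to complete
    return 0;
}
```

Such a file is built with nvcc (e.g. `nvcc hello.cu -o hello`, the file name being arbitrary).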
CUDA Basics
- Introduction to CUDA
- CUDA Programming Model
- Hello World Kernel
- Thread Indexing and IDs
- Compilation and Execution
- Error Handling
- Device Query
- Vector Addition Example (see the sketch after this list)
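A minimal sketch of the vector-addition topic above, with illustrative buffer sizes and names; error checking is omitted for brevity:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One thread per element; the bounds check discards the excess
// threads of the last block.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers
    float* ha = (float*)malloc(bytes);
    float* hb = (float*)malloc(bytes);
    float* hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers and host-to-device transfers
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Round the block count up so every element is covered.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);  // expected: 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```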
Memory Management
- Host and Device Memory
- Memory Allocation
- Memory Transfer Basics
- Memory Transfer Patterns
- Unified Memory (see the sketch after this list)
- Pinned Memory
- 2D and 3D Memory
- Memory Copy Optimization
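As one sketch of the memory-management topics above, Unified Memory (`cudaMallocManaged`) yields a single pointer usable from both host and device, with migration handled by the CUDA runtime; sizes and names here are illustrative:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

int main() {
    const int n = 1024;

    // One allocation, visible to both host and device code.
    float* data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = (float)i;

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();  // required before the host touches managed memory again

    printf("data[10] = %f\n", data[10]);  // expected: 20.0
    cudaFree(data);
    return 0;
}
```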
Kernel Programming
- Kernel Launch Configuration
- Thread Synchronization Basics
- Shared Memory Basics
- Shared Memory Example (see the sketch after this list)
- Register Usage
- Warp-Level Operations
- Atomic Operations
- Dynamic Parallelism
- Constant Memory
- Texture Memory Basics
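A small sketch of the shared-memory and thread-synchronization topics above: one block stages data in `__shared__` memory and calls `__syncthreads()` before reading it back. The reversal kernel is just an illustrative example, not taken from this page:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256

// Reverse the elements handled by one block, using shared memory
// as a staging buffer.
__global__ void block_reverse(const int* in, int* out) {
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;
    tile[t] = in[t];
    __syncthreads();               // all loads into shared memory must finish first
    out[t] = tile[BLOCK - 1 - t];  // then read back in reverse order
}

int main() {
    int h[BLOCK], *din, *dout;
    for (int i = 0; i < BLOCK; ++i) h[i] = i;
    cudaMalloc(&din, sizeof(h));
    cudaMalloc(&dout, sizeof(h));
    cudaMemcpy(din, h, sizeof(h), cudaMemcpyHostToDevice);
    block_reverse<<<1, BLOCK>>>(din, dout);
    cudaMemcpy(h, dout, sizeof(h), cudaMemcpyDeviceToHost);
    printf("h[0] = %d\n", h[0]);  // expected: 255
    cudaFree(din); cudaFree(dout);
    return 0;
}
```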
Optimization Techniques
- Memory Coalescing
- Bank Conflicts in Shared Memory
- Loop Unrolling
- Instruction-Level Parallelism
- Memory Access Patterns
- Reducing Warp Divergence
- Stream Compaction
- Matrix Transpose Optimization
- Reduction Patterns (see the sketch after this list)
- Prefix Sum (Scan)
- Histogram Computation
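A sketch of the reduction pattern listed above, as one representative optimization: a shared-memory tree reduction whose halving stride keeps the active threads contiguous, limiting warp divergence until the final steps (the block size, a power of two, is illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256  // must be a power of two for the halving loop

// One block sums BLOCK elements with a shared-memory tree reduction.
__global__ void block_sum(const float* in, float* out) {
    __shared__ float s[BLOCK];
    int t = threadIdx.x;
    s[t] = in[blockIdx.x * BLOCK + t];
    __syncthreads();
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (t < stride) s[t] += s[t + stride];
        __syncthreads();  // every step must finish before the next halving
    }
    if (t == 0) out[blockIdx.x] = s[0];  // one partial sum per block
}

int main() {
    const int n = BLOCK;  // a single block, for brevity
    float h[n], *din, *dout, result;
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    cudaMalloc(&din, sizeof(h));
    cudaMalloc(&dout, sizeof(float));
    cudaMemcpy(din, h, sizeof(h), cudaMemcpyHostToDevice);
    block_sum<<<1, BLOCK>>>(din, dout);
    cudaMemcpy(&result, dout, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", result);  // expected: 256.0
    cudaFree(din); cudaFree(dout);
    return 0;
}
```

The sketch stops at one partial sum per block; a full reduction would combine the partials in a second pass or with atomics, both of which fall under the topics above.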
Advanced Topics
- CUDA Streams (streams, async transfers, and events are sketched after this list)
- Asynchronous Data Transfer
- Events and Timing
- Multi-GPU Programming
- Cooperative Groups
- Warp Shuffle Operations
- Unified Memory Advanced
- Dynamic Shared Memory
- Function Pointers and Callbacks
- CUDA Graphs
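A sketch combining three of the topics above: a stream carries asynchronous copies and a kernel launch, bracketed by events for timing. Buffer sizes and names are illustrative, and error checking is omitted:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void inc(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Pinned host memory is required for copies to overlap with kernels.
    float *h, *d;
    cudaMallocHost(&h, bytes);
    cudaMalloc(&d, bytes);
    for (int i = 0; i < n; ++i) h[i] = 0.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Enqueue copy-in, kernel, and copy-out on one stream; they run
    // in order relative to each other but asynchronously to the host.
    cudaEventRecord(start, stream);
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);
    inc<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("h[0] = %f, elapsed = %.3f ms\n", h[0], ms);  // expected: h[0] = 1.0

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(d); cudaFreeHost(h);
    return 0;
}
```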
CUDA Libraries and Tools
- cuBLAS Basics
- cuFFT Basics
- Thrust Library Introduction (see the sketch after this list)
- cuRAND for Random Numbers
- Nsight Compute Profiling
- Nsight Systems Profiling
- CUDA-GDB Debugging
- Memory Checker (cuda-memcheck)
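A minimal sketch of the Thrust topic above: a `device_vector` plus `thrust::reduce` performs a parallel sum without any hand-written kernel (the vector length is arbitrary):

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>

int main() {
    // Fill a device vector with 0..n-1 and sum it on the GPU.
    thrust::device_vector<int> v(1000);
    thrust::sequence(v.begin(), v.end());          // 0, 1, 2, ...
    int sum = thrust::reduce(v.begin(), v.end());  // parallel reduction on the device
    printf("sum = %d\n", sum);                     // expected: 499500
    return 0;
}
```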