CUDA: Device Query

Using cudaGetDeviceProperties() lets your program learn about the GPU it is running on: its compute capability, how much memory it has, how many multiprocessors it contains, and which optional features it supports. This information helps you write CUDA code that runs well on different types of GPUs, for example by choosing launch parameters that fit the device or enabling a faster code path only when the hardware supports it. Knowing these details also makes it easier to diagnose problems tied to specific hardware. Robust scientific programs need to work well on many generations of GPUs, from older architectures like Maxwell to newer ones like Blackwell.

Core Concept

The cudaDeviceProp structure holds complete information about a GPU, and you fill it by calling the cudaGetDeviceProperties() function. Key properties include the compute capability (which determines supported features), total global memory, maximum threads per block, multiprocessor count, and shared memory size. Programs use this information to verify that the hardware meets their requirements, choose efficient launch configurations, and adapt algorithms to the GPU they are running on.

Key Points

  • cudaGetDeviceProperties(): Retrieves GPU specifications into cudaDeviceProp struct
  • Compute Capability: Major.minor version (e.g., 7.5 for Turing)
  • Memory Properties: Total memory, shared memory per block, constant memory
  • Thread Limits: Max threads per block, max dimensions, warp size
  • Multiprocessor Info: SM count, CUDA core estimate, clock rates
  • Feature Support: Concurrent kernels, unified addressing, peer-to-peer

Code Example

Querying and displaying essential GPU properties

CUDA Implementation:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount == 0) {
        printf("No CUDA-capable device found\n");
        return 1;
    }

    for (int dev = 0; dev < deviceCount; dev++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute Capability: %d.%d\n", prop.major, prop.minor);
        printf("  Total Memory: %.2f GB\n", prop.totalGlobalMem / 1e9);
        printf("  Multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Max Threads/Block: %d\n", prop.maxThreadsPerBlock);
        printf("  Shared Memory/Block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  Warp Size: %d\n", prop.warpSize);

        // Check feature support from compute capability and feature flags
        if (prop.major >= 7) {
            printf("  Supports Tensor Cores\n");
        }
        if (prop.concurrentKernels) {
            printf("  Supports Concurrent Kernels\n");
        }
    }
    return 0;
}
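
The limit and feature fields listed under Key Points but not printed above (per-dimension thread and grid limits, unified addressing, peer-to-peer access) can be queried the same way. A minimal sketch, assuming device 0 exists and using device indices 0 and 1 only for illustration:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Per-dimension launch limits
    printf("Max block dims: %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("Max grid dims:  %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // Feature flags
    printf("Unified addressing: %s\n", prop.unifiedAddressing ? "yes" : "no");

    // Peer-to-peer access between device 0 and device 1 (only if both exist)
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount > 1) {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);
        printf("Device 0 can access device 1: %s\n", canAccess ? "yes" : "no");
    }
    return 0;
}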

Using Properties for Configuration:

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);

// Adaptive block size
int threadsPerBlock = prop.maxThreadsPerBlock > 512 ? 512 : 256;

// Check shared memory availability
if (prop.sharedMemPerBlock >= 49152) {
    // Use 48KB shared memory configuration
}
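
If the kernel is known at compile time, the runtime's occupancy helper can suggest a block size instead of a hand-picked threshold. A minimal sketch of this alternative; the kernel scaleKernel and its arguments are placeholders for illustration:

#include <cuda_runtime.h>

__global__ void scaleKernel(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

void launchScaled(float *d_data, float factor, int n) {
    if (n <= 0) return;

    int minGridSize = 0, blockSize = 0;
    // Ask the runtime for a block size that maximizes occupancy for this kernel
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, scaleKernel, 0, 0);

    int gridSize = (n + blockSize - 1) / blockSize;  // enough blocks to cover n elements
    scaleKernel<<<gridSize, blockSize>>>(d_data, factor, n);
}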

Usage & Best Practices

When to Query Properties

  • Startup: Validate GPU meets minimum requirements
  • Configuration: Calculate optimal launch parameters
  • Feature Detection: Check for specific capabilities (e.g., double precision)
  • Multi-GPU: Select appropriate device for workload
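
For the multi-GPU case, one simple approach is to score each device from its properties and select the best one before any allocations or kernel launches. A minimal sketch that ranks devices by multiprocessor count (a crude heuristic; a real application might also weigh memory size or compute capability):

#include <cuda_runtime.h>

// Pick the device with the most multiprocessors; returns the chosen device id
int selectBestDevice() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    int bestDevice = 0;
    int bestSMs = -1;
    for (int dev = 0; dev < deviceCount; dev++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        if (prop.multiProcessorCount > bestSMs) {
            bestSMs = prop.multiProcessorCount;
            bestDevice = dev;
        }
    }
    cudaSetDevice(bestDevice);  // make it the current device for this host thread
    return bestDevice;
}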

Best Practices

  • Query once at startup, cache results (see the sketch after this list)
  • Validate compute capability for required features
  • Use properties to size shared memory and registers
  • Check maxThreadsPerBlock before kernel launch
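
A minimal sketch combining the first and last practices above: query the properties once, cache them, and clamp a requested block size against maxThreadsPerBlock before launching. The helper names cachedDeviceProps and clampBlockSize are illustrative:

#include <algorithm>
#include <cuda_runtime.h>

// Query the properties of the current device only once and reuse the result
const cudaDeviceProp& cachedDeviceProps() {
    static const cudaDeviceProp prop = [] {
        cudaDeviceProp p;
        int dev = 0;
        cudaGetDevice(&dev);
        cudaGetDeviceProperties(&p, dev);
        return p;
    }();
    return prop;
}

// Clamp a requested block size to what the device actually allows
int clampBlockSize(int requested) {
    return std::min(requested, cachedDeviceProps().maxThreadsPerBlock);
}

// Usage: int threads = clampBlockSize(1024);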

Common Mistakes

  • Avoid: Hardcoding launch parameters without checking limits
  • Avoid: Assuming all GPUs support same features

Key Takeaways

Summary:

  • cudaGetDeviceProperties() provides comprehensive GPU information
  • Compute capability indicates architecture generation and features
  • Memory, thread, and block limits vary across GPU models
  • Use properties to check requirements and optimize configurations
  • Query multiprocessor count for performance estimation
  • Feature flags indicate capability support (concurrent kernels, etc.)

Quick Reference

Basic Query:

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, deviceId);

Essential Properties:

  • name: GPU model name (string)
  • major, minor: Compute capability
  • totalGlobalMem: Total device memory (bytes)
  • sharedMemPerBlock: Shared memory per block (bytes)
  • maxThreadsPerBlock: Maximum threads per block
  • maxThreadsDim[3]: Max threads in each dimension
  • maxGridSize[3]: Max blocks in each dimension
  • multiProcessorCount: Number of SMs
  • warpSize: Threads per warp (always 32)
  • clockRate: GPU clock rate (kHz)
  • concurrentKernels: Concurrent kernel support (bool)

Compute Capabilities:

  • 5.x: Maxwell
  • 6.x: Pascal
  • 7.x: Volta / Turing
  • 8.x: Ampere / Ada Lovelace (8.9)
  • 9.x: Hopper
  • 10.x / 12.x: Blackwell

Validation Example:

if (prop.major < 6) {
    printf("Error: Requires Pascal or newer\n");
    exit(1);
}
