Calling cudaGetDeviceProperties() lets a program discover what the GPU it is running on can do. It reports the GPU's compute capability (which determines the supported features), how much memory it has, how many streaming multiprocessors (SMs) it contains, and its per-block limits. With this information you can write CUDA code that adapts to different GPUs. For example, you can choose launch parameters from the reported limits, or enable a faster code path only when a required feature is present. The same details also help diagnose hardware-specific problems. Robust scientific programs need to run correctly on many GPU generations, from older architectures such as Maxwell to newer ones such as Blackwell.
Core Concept
The cudaDeviceProp structure describes the hardware characteristics of a single GPU. cudaGetDeviceProperties() fills it in for a given device index. Key properties include the compute capability (which determines the supported features), total global memory, maximum threads per block, multiprocessor count, and shared memory per block. Programs use this information to verify that the hardware meets their requirements, choose efficient launch configurations, and adapt algorithms to the GPU they are running on.
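Checking requirements usually means comparing the device's compute capability against a minimum version. A minimal sketch follows. It assumes the values come from prop.major and prop.minor. The helper name meetsComputeCapability is hypothetical. The helper compares the major number first and then the minor number, which avoids the trap of checking only prop.major.

```cpp
#include <cassert>

// Hypothetical helper: returns true if the device's compute capability
// (major.minor, e.g. from cudaDeviceProp::major / ::minor) is at least the
// required reqMajor.reqMinor. Major is compared first, then minor,
// so 7.5 satisfies 7.0 and 8.0 satisfies 7.5, but 7.0 does not satisfy 7.5.
bool meetsComputeCapability(int major, int minor, int reqMajor, int reqMinor) {
    assert(major >= 0 && minor >= 0 && reqMajor >= 0 && reqMinor >= 0);
    if (major != reqMajor) {
        return major > reqMajor;
    }
    return minor >= reqMinor;
}
```

In a CUDA program you would call it as meetsComputeCapability(prop.major, prop.minor, 7, 0) after cudaGetDeviceProperties() has filled in prop.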
Key Points
- cudaGetDeviceProperties(): Retrieves GPU specifications into a cudaDeviceProp struct
- Compute Capability: Major.minor version (e.g., 7.5 for Turing)
- Memory Properties: Total memory, shared memory per block, constant memory
- Thread Limits: Max threads per block, max dimensions, warp size
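- Multiprocessor Info: SM count and clock rates (the number of CUDA cores is not reported directly and must be estimated from the compute capability)
- Feature Support: Concurrent kernels and unified addressing; peer-to-peer access is queried separately with cudaDeviceCanAccessPeer()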
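Because cudaDeviceProp has no CUDA core count field, that number is usually estimated from the compute capability. The sketch below is a hypothetical helper. Its FP32-cores-per-SM values follow NVIDIA's published architecture specifications for the generations listed. Capabilities outside that list return 0, so the caller must handle "unknown" explicitly instead of trusting a guess.

```cpp
#include <cassert>

// Hypothetical helper: estimated FP32 CUDA cores per SM for a compute
// capability. cudaDeviceProp does not report this, so it is a lookup.
// Returns 0 for capabilities not covered by this table.
int estimateCoresPerSM(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    switch (major) {
        case 5:  return 128;                    // Maxwell
        case 6:  return minor == 0 ? 64 : 128;  // Pascal: GP100 (6.0) vs GP10x (6.1, 6.2)
        case 7:  return 64;                     // Volta (7.0, 7.2), Turing (7.5)
        case 8:  return minor == 0 ? 64 : 128;  // Ampere GA100 (8.0) vs 8.6/8.7, Ada (8.9)
        case 9:  return 128;                    // Hopper
        case 12: return 128;                    // Blackwell (12.x)
        default: return 0;                      // unknown capability
    }
}

// Estimated total cores = cores per SM * prop.multiProcessorCount.
int estimateTotalCores(int major, int minor, int smCount) {
    return estimateCoresPerSM(major, minor) * smCount;
}
```

For example, a compute capability 7.0 device with 80 SMs gives an estimate of 64 * 80 = 5120 cores.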
Code Example
Querying and displaying essential GPU properties
CUDA Implementation:
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < deviceCount; dev++) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "Device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute Capability: %d.%d\n", prop.major, prop.minor);
        printf("  Total Memory: %.2f GB\n", prop.totalGlobalMem / 1e9);
        printf("  Multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Max Threads/Block: %d\n", prop.maxThreadsPerBlock);
        printf("  Shared Memory/Block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  Warp Size: %d\n", prop.warpSize);
        // Check feature support
        if (prop.major >= 7) {  // Tensor Cores first appeared in Volta (7.0)
            printf("  Supports Tensor Cores\n");
        }
        if (prop.concurrentKernels) {
            printf("  Supports Concurrent Kernels\n");
        }
    }
    return 0;
}
Using Properties for Configuration:
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);  // properties of device 0
// Adaptive block size: use 512 threads if the device allows it, otherwise 256
int threadsPerBlock = prop.maxThreadsPerBlock >= 512 ? 512 : 256;
// Check shared memory availability
if (prop.sharedMemPerBlock >= 49152) {
    // At least 48 KB of shared memory per block is available
}
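After choosing threadsPerBlock, the grid size is often derived from the problem size and the SM count. The sketch below is one common pattern, not the only one. It uses ceiling division so that a final partial block is still launched, and it caps the grid at blocksPerSM * smCount. The cap is only safe if the kernel uses a grid-stride loop. The function name and the blocksPerSM parameter are hypothetical; smCount would come from prop.multiProcessorCount.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of blocks for a 1D launch over n elements.
// needed = ceil(n / threadsPerBlock) covers every element with one thread each;
// the result is capped at smCount * blocksPerSM, which assumes the kernel
// uses a grid-stride loop to process any elements beyond the grid's size.
int computeGridSize(long long n, int threadsPerBlock, int smCount, int blocksPerSM) {
    assert(n > 0 && threadsPerBlock > 0 && smCount > 0 && blocksPerSM > 0);
    long long needed = (n + threadsPerBlock - 1) / threadsPerBlock;
    long long cap = static_cast<long long>(smCount) * blocksPerSM;
    return static_cast<int>(std::min(needed, cap));
}
```

With 1000 elements and 256 threads per block, this yields 4 blocks. With about a million elements on an 80-SM device and blocksPerSM = 4, the grid is capped at 320 blocks.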
Usage & Best Practices
When to Query Properties
- Startup: Validate GPU meets minimum requirements
- Configuration: Calculate optimal launch parameters
- Feature Detection: Check for specific capabilities (e.g., double precision)
- Multi-GPU: Select appropriate device for workload
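For the multi-GPU case, a simple policy is to filter devices by minimum compute capability and memory, then take the one with the most SMs. The sketch below assumes that policy. The DeviceSummary struct and selectDevice function are hypothetical. You would fill each entry from one cudaGetDeviceProperties() call per device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of the cudaDeviceProp fields used for selection.
struct DeviceSummary {
    int id;           // device ordinal passed to cudaGetDeviceProperties()
    int major, minor; // compute capability
    int smCount;      // prop.multiProcessorCount
    size_t totalMem;  // prop.totalGlobalMem, in bytes
};

// Returns the id of the qualifying device with the most SMs, or -1 if none
// meets the minimum compute capability and memory size.
int selectDevice(const std::vector<DeviceSummary>& devices,
                 int reqMajor, int reqMinor, size_t minMem) {
    assert(reqMajor >= 0 && reqMinor >= 0);
    int best = -1;
    int bestSMs = -1;
    for (const DeviceSummary& d : devices) {
        bool ccOk = d.major > reqMajor || (d.major == reqMajor && d.minor >= reqMinor);
        if (!ccOk || d.totalMem < minMem) {
            continue;  // device does not meet the requirements
        }
        if (d.smCount > bestSMs) {
            bestSMs = d.smCount;
            best = d.id;
        }
    }
    return best;
}
```

The returned id would then be passed to cudaSetDevice(). Ranking by SM count is only a rough proxy for throughput. A different workload may prefer memory size or compute capability instead.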
Best Practices
- Query once at startup, cache results
- Validate compute capability for required features
- Use properties to size shared memory and registers
- Check maxThreadsPerBlock before kernel launch
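Checking a launch configuration involves two limits: each block dimension must fit within maxThreadsDim, and the product of the dimensions must not exceed maxThreadsPerBlock. The helper below is a hypothetical sketch of that check. Its parameters correspond to the cudaDeviceProp fields of the same names.

```cpp
#include <cassert>

// Hypothetical launch check for a block shape (bx, by, bz).
// maxThreadsPerBlock and maxThreadsDim mirror the cudaDeviceProp fields.
bool blockShapeFits(int bx, int by, int bz,
                    int maxThreadsPerBlock, const int maxThreadsDim[3]) {
    assert(maxThreadsPerBlock > 0);
    if (bx <= 0 || by <= 0 || bz <= 0) {
        return false;  // every dimension must be at least 1
    }
    if (bx > maxThreadsDim[0] || by > maxThreadsDim[1] || bz > maxThreadsDim[2]) {
        return false;  // a single dimension exceeds its limit
    }
    long long total = 1LL * bx * by * bz;  // total threads per block
    return total <= maxThreadsPerBlock;
}
```

For instance, a 32 x 32 x 1 block fits a device with a 1024-thread limit, but 32 x 32 x 2 does not, even though each dimension is individually within range.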
Common Mistakes
- Avoid: Hardcoding launch parameters without checking limits
- Avoid: Assuming all GPUs support same features
Key Takeaways
Summary:
- cudaGetDeviceProperties() provides comprehensive GPU information
- Compute capability indicates architecture generation and features
- Memory, thread, and block limits vary across GPU models
- Use properties to check requirements and optimize configurations
- Query multiprocessor count for performance estimation
- Feature flags indicate capability support (concurrent kernels, etc.)
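As a rough performance estimate, the SM count can be combined with a cores-per-SM figure and the clock rate to get a theoretical FP32 peak. The sketch below assumes that each core completes one fused multiply-add (2 FLOPs) per cycle. It also assumes that clockRateKHz is the value reported in prop.clockRate. Because cudaDeviceProp does not report coresPerSM, the caller must supply it. The function name is hypothetical, and real kernels rarely reach this peak.

```cpp
#include <cassert>

// Hypothetical back-of-envelope peak FP32 throughput in GFLOP/s.
// peak = 2 FLOPs (one FMA) * SMs * cores per SM * clock (Hz) / 1e9
double estimatePeakGflops(int smCount, int coresPerSM, int clockRateKHz) {
    assert(smCount > 0 && coresPerSM > 0 && clockRateKHz > 0);
    double cyclesPerSecond = clockRateKHz * 1e3;  // kHz -> Hz
    return 2.0 * smCount * coresPerSM * cyclesPerSecond / 1e9;
}
```

For example, 80 SMs with 64 cores each at 1,530,000 kHz gives about 15,667 GFLOP/s.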
Quick Reference
Basic Query:
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, deviceId);
Essential Properties:
| Property | Description |
|---|---|
| name | GPU model name (string) |
| major, minor | Compute capability |
| totalGlobalMem | Total device memory (bytes) |
| sharedMemPerBlock | Shared memory per block (bytes) |
| maxThreadsPerBlock | Maximum threads per block |
| maxThreadsDim[3] | Max threads in each block dimension |
| maxGridSize[3] | Max blocks in each grid dimension |
| multiProcessorCount | Number of SMs |
| warpSize | Threads per warp (32 on all current NVIDIA GPUs) |
| clockRate | GPU clock rate (kHz) |
| concurrentKernels | Concurrent kernel support (int, nonzero if supported) |
Compute Capabilities:
- 5.x: Maxwell
- 6.x: Pascal
- 7.0, 7.2: Volta; 7.5: Turing
- 8.0, 8.6, 8.7: Ampere; 8.9: Ada Lovelace
- 9.0: Hopper
- 10.x, 12.x: Blackwell
Validation Example:
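if (prop.major < 6) {  // prop filled in by cudaGetDeviceProperties()
    fprintf(stderr, "Error: requires Pascal (compute capability 6.0) or newer, found %d.%d\n",
            prop.major, prop.minor);
    exit(EXIT_FAILURE);  // exit() and EXIT_FAILURE come from <cstdlib>
}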