OpenMP: Thread Management

Controlling the number of threads matters for performance on multi-processor systems. OpenMP offers three ways to set the thread count: the num_threads clause, the OMP_NUM_THREADS environment variable, and runtime functions such as omp_set_num_threads(). Knowing how these methods interact lets a program adapt to different hardware configurations and workload requirements.

[Figure: thread count control hierarchy (openmp-thread-management)]

Core Concept

OpenMP resolves the thread count for a parallel region in a fixed order. The num_threads clause on a parallel directive has the highest priority: it applies to that region only and overrides every other setting. The OMP_NUM_THREADS environment variable sets the initial default for the whole program run. A program can change that default at run time with omp_set_num_threads(), which affects later parallel regions that have no num_threads clause. If none of these is set, the default is implementation-defined; most runtimes use the number of available processors.
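The resolution order described above can be modeled as a small function. This is only an illustrative sketch: thread_settings and requested_threads() are hypothetical names, not part of the OpenMP API, and a real runtime may still deliver fewer threads than the value computed here.

```c
/* Hypothetical model of the precedence order; not part of the OpenMP API.
   A value <= 0 means "this setting was not given". */
typedef struct {
    int clause;       /* num_threads(N) on the directive */
    int runtime_call; /* most recent omp_set_num_threads(N) */
    int env;          /* OMP_NUM_THREADS=N */
    int fallback;     /* implementation default, often the processor count */
} thread_settings;

/* Return the thread count requested for the next parallel region. */
int requested_threads(thread_settings s) {
    if (s.clause > 0)       return s.clause;        /* highest priority */
    if (s.runtime_call > 0) return s.runtime_call;
    if (s.env > 0)          return s.env;
    return s.fallback;                              /* lowest priority */
}
```

For example, with the clause set to 3, the runtime call to 6, the environment to 4, and a fallback of 8, the function returns 3; clearing the clause yields 6.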

Key Points

  • num_threads Clause: Highest priority, directive-specific control
  • OMP_NUM_THREADS: Environment variable for global default
  • omp_set_num_threads(): Programmatic default setting
  • omp_get_num_threads(): Query actual thread count (call from parallel region)
  • omp_get_max_threads(): Query default thread count for next parallel region
  • Dynamic Adjustment: When enabled (omp_set_dynamic() or OMP_DYNAMIC), the runtime may deliver fewer threads than requested, for example if system resources are limited
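Because the runtime may deliver fewer threads than requested, it can help to check the actual team size inside the region. The sketch below is one way to do that; delivered_team_size() is a hypothetical helper name, and the #else branch supplies serial stand-ins (an addition for this sketch) so the file also builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without OpenMP enabled. */
static void omp_set_dynamic(int on) { (void)on; }
static int  omp_get_num_threads(void) { return 1; }
#endif

/* Request `requested` threads (must be > 0) and report how many the
   runtime actually delivered. A nonzero `allow_dynamic` permits the
   runtime to shrink the team. Note that omp_set_dynamic() changes the
   setting for later regions as well. */
int delivered_team_size(int requested, int allow_dynamic) {
    int delivered = 0;
    omp_set_dynamic(allow_dynamic);
    #pragma omp parallel num_threads(requested)
    {
        /* One thread records the team size; the others wait at the
           implicit barrier that ends the single construct. */
        #pragma omp single
        delivered = omp_get_num_threads();
    }
    return delivered;
}
```

The returned value is never larger than the request, but it may be smaller, so code that partitions work by thread count should use the delivered size rather than the requested one.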

Code Example

The following program demonstrates the different ways to control and query the thread count.

OpenMP Implementation:

#include <stdio.h>
#include <omp.h>

int main(void) {
    // Query default settings
    printf("Default max threads: %d\n", omp_get_max_threads());
    printf("Available processors: %d\n\n", omp_get_num_procs());

    // Method 1: Default behavior (uses OMP_NUM_THREADS or system default)
    #pragma omp parallel
    {
        #pragma omp master
        printf("Region 1 - Default: %d threads\n", omp_get_num_threads());
    }

    // Method 2: num_threads clause (highest priority)
    #pragma omp parallel num_threads(2)
    {
        #pragma omp master
        printf("Region 2 - num_threads(2): %d threads\n", 
               omp_get_num_threads());
    }

    // Method 3: Runtime function
    omp_set_num_threads(6);
    #pragma omp parallel
    {
        #pragma omp master
        printf("Region 3 - omp_set_num_threads(6): %d threads\n", 
               omp_get_num_threads());
    }

    // Method 4: num_threads overrides omp_set_num_threads
    #pragma omp parallel num_threads(3)
    {
        #pragma omp master
        printf("Region 4 - num_threads(3) override: %d threads\n", 
               omp_get_num_threads());
    }

    return 0;
}

Expected Output (with OMP_NUM_THREADS=4 on a machine with 8 processors; the processor count will differ on your system):

Default max threads: 4
Available processors: 8

Region 1 - Default: 4 threads
Region 2 - num_threads(2): 2 threads
Region 3 - omp_set_num_threads(6): 6 threads
Region 4 - num_threads(3) override: 3 threads

Usage & Best Practices

When to Use

  • Adapting to different hardware configurations
  • Testing scalability with varying thread counts
  • Limiting threads for memory-constrained applications
  • Preventing oversubscription in nested parallelism
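For the nested case in the list above, one possible approach is to split the processor count between the outer and inner teams. The sketch below assumes two levels of nesting; inner_team_size() and run_nested() are hypothetical helper names, and the #else branch provides serial stand-ins (an addition for this sketch) so it also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-in so the sketch also builds without OpenMP enabled. */
static void omp_set_max_active_levels(int n) { (void)n; }
#endif

/* Hypothetical helper: threads each inner team may use so that
   outer * inner does not exceed `procs`. `outer` must be > 0. */
int inner_team_size(int procs, int outer) {
    int inner = procs / outer;
    return inner > 0 ? inner : 1;
}

/* Run a two-level nested region and return how many threads were
   active in the innermost teams, summed over all outer threads. */
int run_nested(int outer, int inner) {
    int total = 0;
    omp_set_max_active_levels(2);  /* allow two active parallel levels */
    #pragma omp parallel num_threads(outer)
    {
        #pragma omp parallel num_threads(inner)
        {
            /* Each innermost thread counts itself exactly once. */
            #pragma omp atomic
            total++;
        }
    }
    return total;
}
```

A caller would typically compute the inner size as inner_team_size(omp_get_num_procs(), outer) and pass both values to run_nested(), so the product of the two team sizes stays within the processor count.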

Best Practices

  • Use num_threads for fine-grained control per region
  • Set OMP_NUM_THREADS for application-wide defaults
  • Query omp_get_num_procs() and keep the total thread count at or below it to avoid oversubscription
  • Call omp_get_num_threads() only from within parallel regions
  • Test performance with different thread counts to find the optimal configuration
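The last two practices can be combined into a simple scaling sweep. The sketch below is a minimal example under stated assumptions: parallel_sum() and sweep_thread_counts() are hypothetical names, the summation loop is only a placeholder for a real workload, and the #else branch provides serial stand-ins (an addition for this sketch) so it also builds without OpenMP.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without OpenMP enabled. */
static int    omp_get_num_procs(void) { return 1; }
static double omp_get_wtime(void) { return 0.0; }
#endif

/* Sum 1..n using a team of up to `threads` threads. */
long long parallel_sum(long long n, int threads) {
    long long sum = 0;
    #pragma omp parallel for num_threads(threads) reduction(+:sum)
    for (long long i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Time the same workload with 1, 2, 4, ... threads, never exceeding
   the processor count reported by omp_get_num_procs(). */
void sweep_thread_counts(long long n) {
    int procs = omp_get_num_procs();
    for (int t = 1; t <= procs; t *= 2) {
        double start = omp_get_wtime();
        long long sum = parallel_sum(n, t);
        double elapsed = omp_get_wtime() - start;
        printf("%2d threads: sum=%lld, %.6f s\n", t, sum, elapsed);
    }
}
```

Calling sweep_thread_counts() with a workload size large enough to dominate thread start-up cost gives a rough picture of where adding threads stops paying off. Timings from a loop this simple are not representative, so a real kernel should replace it for actual measurements.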

Common Mistakes

  • Calling omp_get_num_threads() outside a parallel region: it returns 1 there, not the size of the next team. Use omp_get_max_threads() instead.
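This mistake often appears when sizing a per-thread buffer before the parallel region starts. The sketch below shows one way to avoid it; count_iterations_per_thread() is a hypothetical function name, and the #else branch provides serial stand-ins (an addition for this sketch) so it also builds without OpenMP.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without OpenMP enabled. */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Count how many loop iterations each thread handled.
   Returns a malloc'd array of `*slots` counters (NULL on allocation
   failure); the caller frees it. */
long *count_iterations_per_thread(long n, int *slots) {
    /* omp_get_num_threads() would return 1 here because no parallel
       region is active, so size the buffer with omp_get_max_threads(). */
    *slots = omp_get_max_threads();
    long *counts = calloc((size_t)*slots, sizeof *counts);
    if (counts == NULL) return NULL;

    /* Each thread increments only its own slot, so no race occurs. */
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        counts[omp_get_thread_num()]++;
    return counts;
}
```

Had the buffer been sized with omp_get_num_threads() before the region, it would hold a single counter and every thread other than thread 0 would write past its end.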

Key Takeaways

Summary:

  • Thread count control hierarchy: num_threads > omp_set_num_threads() > OMP_NUM_THREADS > system default
  • Use omp_get_num_threads() within parallel regions to query actual count
  • Use omp_get_max_threads() outside parallel regions to query the default for the next region
  • Matching the thread count to the hardware and workload avoids both idle processors and oversubscription

Quick Reference

Thread Count Precedence:

1. num_threads(N) clause    [Highest]
2. omp_set_num_threads(N)
3. OMP_NUM_THREADS=N
4. System default           [Lowest]

Query Functions:

omp_get_num_threads()   // Inside parallel region
omp_get_max_threads()   // Default for next region
omp_get_num_procs()     // Available processors


Mandar Gurav

Parallel Programmer, Trainer and Mentor

