Introduction to OpenMP

by Mandar Gurav

OpenMP is an API for writing parallel programs in C, C++, and Fortran that use multiple processor cores at the same time. With OpenMP, you don't need to manage threads yourself: by adding a few compiler directives to an ordinary program, you can make it run on many cores without complicating the code. This makes it easy to turn a normal program into one that runs much faster on a multi-core processor, helping scientists and engineers get their work done more quickly and efficiently.

Refer to the following diagram for an overview of the OpenMP architecture.

[Figure: OpenMP architecture overview]

Core Concept

OpenMP uses compiler directives, written as #pragma omp statements in C and C++, to tell the compiler when and how to use multiple threads. The compiler reads these directives and transforms your code so that it runs on many threads at once. In OpenMP, all threads share the same memory space, which makes it easy for threads to exchange data. This is simpler than message-passing paradigms like MPI, where each process has its own separate memory space.
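
As a minimal sketch of both ideas, the program below has each thread write its ID into a slot of an array that all threads share. A variable declared inside the parallel region is private to each thread, while one declared outside is shared; the array name and size here are ours, purely for illustration.

#include <stdio.h>
#include <omp.h>

#define MAX_THREADS 64   /* arbitrary upper bound for this sketch */

int main() {
    int ids[MAX_THREADS] = {0};             /* declared outside: shared by all threads */

    #pragma omp parallel                    /* the directive forks a team of threads */
    {
        int tid = omp_get_thread_num();     /* declared inside: private to each thread */
        if (tid < MAX_THREADS)
            ids[tid] = tid;                 /* each thread writes its own slot */
    }                                       /* implicit barrier: threads join here */

    printf("first slots: %d %d %d %d\n", ids[0], ids[1], ids[2], ids[3]);
    return 0;
}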

Key Points

  • Compiler Directives: Use #pragma omp to mark parallel regions
  • Shared Memory: All threads access common memory space
  • Incremental Parallelization: Add parallelism gradually to existing code (see the sketch after this list)
  • Portability: Code runs on any OpenMP-compliant compiler (GCC, Intel, Clang)
  • Fork-Join Model: Master thread spawns worker threads when needed
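
To make incremental parallelization concrete, here is a hedged sketch: a serial loop becomes parallel by adding a single directive, while the rest of the function stays untouched. The function and array names are illustrative.

#include <omp.h>

#define N 1000000

void add_arrays(const double *a, const double *b, double *c) {
    /* Serial version: for (int i = 0; i < N; i++) c[i] = a[i] + b[i]; */

    #pragma omp parallel for                /* one added line splits iterations across threads */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];                 /* iterations are independent, so this is safe */
}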

Code Example

A simple program demonstrating OpenMP compilation and execution

OpenMP Implementation:

#include <stdio.h>
#include <omp.h>

int main() {
    printf("Before parallel region\n");     /* executed by the master thread only */

    #pragma omp parallel                    /* fork: spawn a team of threads */
    {
        int thread_id = omp_get_thread_num();
        printf("Hello from thread %d\n", thread_id);
    }                                       /* join: threads synchronize and the team ends */

    printf("After parallel region\n");      /* master thread continues alone */
    return 0;
}

Compilation:

gcc -fopenmp -o hello_openmp hello_openmp.c
./hello_openmp

Expected Output:

Before parallel region
Hello from thread 0
Hello from thread 1
Hello from thread 2
Hello from thread 3
After parallel region

By default, the number of threads usually matches the number of logical CPU cores, but it can be overridden with the OMP_NUM_THREADS environment variable (as shown below). Thread output order can differ from run to run, which is normal for concurrent execution.
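
For example, assuming the executable from the compilation step above, the thread count can be changed at run time without recompiling:

# Request exactly 2 threads for this run
OMP_NUM_THREADS=2 ./hello_openmp

# Or export it for the whole shell session
export OMP_NUM_THREADS=8
./hello_openmp

The same can be done from code with omp_set_num_threads(), or per region with the num_threads clause.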

Usage & Best Practices

When to Use

  • Applications with computationally intensive loops
  • Shared memory multicore systems (desktops, workstations, single-node HPC)
  • Scientific simulations requiring rapid prototyping

Best Practices

  • Start with sequential correctness before parallelizing
  • Use OpenMP 3.0+ for modern features (most systems support this)
  • Enable compiler optimization flags (-O2 or -O3) with -fopenmp
  • Test with different thread counts to find optimal performance

Common Mistakes

  • Forgetting -fopenmp flag results in serial execution
  • Ignoring race conditions in shared data (see the sketch after this list)
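
To illustrate the second pitfall, the sketch below sums an array. A naive parallel loop races on the shared variable sum, so updates can be lost; the standard reduction clause fixes this by giving each thread a private copy and combining the copies at the end. The array size and contents are illustrative.

#include <stdio.h>
#include <omp.h>

#define N 1000

int main() {
    double a[N], sum = 0.0;
    for (int i = 0; i < N; i++) a[i] = 1.0;

    /* WRONG (race condition): many threads update sum at once.
       #pragma omp parallel for
       for (int i = 0; i < N; i++) sum += a[i];             */

    /* RIGHT: reduction(+:sum) gives each thread a private sum
       and adds the private copies together after the loop.   */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.1f\n", sum);   /* always prints sum = 1000.0 */
    return 0;
}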

Key Takeaways

Summary:

  • OpenMP simplifies shared memory parallel programming with compiler directives
  • The #pragma omp parallel directive creates concurrent thread teams
  • Requires OpenMP-compliant compiler with -fopenmp flag (GCC/Clang) or -qopenmp (Intel)
  • Ideal for multicore systems and incremental parallelization

Quick Reference

Compilation:

# GCC/Clang
gcc -fopenmp -O2 -o program source.c

# Intel Compiler
icc -qopenmp -O2 -o program source.c


