OpenMP is an API that helps programmers write faster programs by using multiple processor cores at the same time. It works with C, C++, and Fortran. With OpenMP you do not need to manage threads yourself: by adding just a few lines of special directives, you can turn an ordinary program into one that runs much faster on a multi-core processor, without making the code overly complicated. This helps scientists and engineers get their work done more quickly.

Core Concept
OpenMP uses compiler directives, written as #pragma omp statements, to tell the compiler when and how to use multiple threads. A compiler that supports OpenMP reads these directives and generates code that runs on many threads at once. In OpenMP, all threads share the same memory space, which makes it easy for threads to share data with each other. This is simpler than programming models like MPI, where each process has its own separate memory space.
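To make the shared-memory model concrete, here is a minimal sketch (my own illustration, not from the original example): every thread writes into one slot of an array that all of them can see. The array name results and its size of 64 slots are illustrative assumptions.

#include <stdio.h>
#include <omp.h>

int main() {
    int results[64] = {0};   /* shared: one array visible to every thread (assumes at most 64 threads) */
    int nthreads = 0;        /* shared: filled in by thread 0 below */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();   /* private: each thread has its own copy */
        results[id] = id * id;           /* threads write to distinct slots, so no race here */
        if (id == 0)
            nthreads = omp_get_num_threads();   /* record the team size */
    }

    /* back on the initial thread: all writes are visible after the region */
    for (int i = 0; i < nthreads; i++)
        printf("results[%d] = %d\n", i, results[i]);
    return 0;
}

Because memory is shared, no explicit message passing is needed to collect the per-thread results; the loop after the parallel region simply reads them.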
Key Points
- Compiler Directives: Use #pragma omp to mark parallel regions
- Shared Memory: All threads access a common memory space
- Incremental Parallelization: Add parallelism gradually to existing code (see the loop sketch after this list)
- Portability: Code runs on any OpenMP-compliant compiler (GCC, Intel, Clang)
- Fork-Join Model: Master thread spawns worker threads when needed
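As an illustration of incremental parallelization, a single directive is often enough to parallelize an existing loop. The sketch below is an assumed example, not from the original text (the size N is arbitrary): it sums an array with #pragma omp parallel for and a reduction clause.

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main() {
    static double a[N];   /* static so the large array is not on the stack */
    double sum = 0.0;

    for (int i = 0; i < N; i++)   /* ordinary sequential initialization */
        a[i] = 1.0;

    /* the one added directive below splits the iterations across threads;
       reduction(+:sum) gives each thread a private partial sum and
       combines them safely at the end */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.1f\n", sum);
    return 0;
}

Removing the directive (or compiling without -fopenmp) leaves a correct serial program, which is exactly what makes the incremental approach safe.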
Code Example
A simple program demonstrating OpenMP compilation and execution
OpenMP Implementation:
#include <stdio.h>
#include <omp.h>

int main() {
    printf("Before parallel region\n");

    #pragma omp parallel
    {
        int thread_id = omp_get_thread_num();   /* unique ID within the thread team */
        printf("Hello from thread %d\n", thread_id);
    }

    printf("After parallel region\n");
    return 0;
}
Compilation:
gcc -fopenmp -o hello_openmp hello_openmp.c
./hello_openmp
Expected Output:
Before parallel region
Hello from thread 0
Hello from thread 1
Hello from thread 2
Hello from thread 3
After parallel region
The number of threads defaults to the number of available CPU cores (the output above assumes four). The order of the lines can differ from run to run, which is normal for threads executing concurrently.
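If you want a specific team size rather than the default, the standard num_threads clause and the omp_set_num_threads() routine are available; the snippet below is a small sketch (the counts 4 and 2 are arbitrary choices):

#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(4);   /* default team size for subsequent parallel regions */

    #pragma omp parallel num_threads(2)   /* the clause overrides the default here */
    {
        #pragma omp single   /* only one thread prints */
        printf("this region has %d threads\n", omp_get_num_threads());
    }
    return 0;
}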
Usage & Best Practices
When to Use
- Applications with computationally intensive loops
- Shared memory multicore systems (desktops, workstations, single-node HPC)
- Scientific simulations requiring rapid prototyping
Best Practices
- Start with sequential correctness before parallelizing
- Use OpenMP 3.0+ for modern features (most systems support this)
- Enable compiler optimization flags (-O2 or -O3) together with -fopenmp
- Test with different thread counts to find optimal performance (see the commands after this list)
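A convenient way to test different thread counts, as suggested above, is the standard OMP_NUM_THREADS environment variable, which changes the default team size without recompiling (the binary name reuses the earlier example):

OMP_NUM_THREADS=2 ./hello_openmp
OMP_NUM_THREADS=4 ./hello_openmp
OMP_NUM_THREADS=8 ./hello_openmp

Timing each run (for example with time) shows where adding more threads stops paying off.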
Common Mistakes
- Forgetting the -fopenmp flag, which results in the program running serially
- Ignoring race conditions in shared data (see the counter sketch after this list)
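To show what a race condition looks like and one standard fix, the sketch below (my own illustration, not from the original) has many iterations increment a single shared counter. The unsynchronized version in the comment can lose updates; #pragma omp atomic makes each increment indivisible:

#include <stdio.h>
#include <omp.h>

int main() {
    long counter = 0;

    #pragma omp parallel for
    for (long i = 0; i < 1000000; i++) {
        /* counter++;           <-- data race: concurrent updates can be lost */
        #pragma omp atomic      /* serializes just this one update */
        counter++;
    }

    printf("counter = %ld (expected 1000000)\n", counter);
    return 0;
}

For simple accumulations like this, a reduction(+:counter) clause is usually faster than atomic, because each thread works on a private copy.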
Key Takeaways
Summary:
- OpenMP simplifies shared memory parallel programming with compiler directives
- The #pragma omp parallel directive creates a team of concurrent threads
- Requires an OpenMP-compliant compiler with the -fopenmp flag (GCC/Clang) or -qopenmp (Intel)
- Ideal for multicore systems and incremental parallelization
Quick Reference
Compilation:
# GCC/Clang
gcc -fopenmp -O2 -o program source.c
# Intel Compiler
icc -qopenmp -O2 -o program source.c