Parallel Programming using OpenMP

This page is under construction and updates are being made on a regular basis.

OpenMP (Open Multi-Processing) is a widely used API that supports multi-platform shared-memory multiprocessing in C, C++, and Fortran. It provides a set of compiler directives, library routines, and environment variables that allow developers to write parallel code for multi-core and multi-processor systems.

OpenMP Basics

Parallel Regions

  • Parallel Construct Fundamentals
  • Conditional Parallelization
  • Nested Parallelism
  • Thread Binding and Affinity
  • Thread Limits
  • Parallel Region Best Practices
  • Dynamic Thread Teams
  • Orphaned Directives

Work-Sharing Constructs

  • Work-Sharing Concepts
  • Parallel For Loop Basics
  • Loop Scheduling Policies
  • Schedule Clause Details
  • Loop Dependencies and Parallelizability
  • Sections Construct
  • Single Construct
  • Master Construct
  • Workshare (Fortran) / Parallel Loop Variations
  • Work-Sharing Best Practices

Data Environment

  • Data Scoping Fundamentals
  • Shared Variables
  • Private Variables
  • Firstprivate Variables
  • Lastprivate Variables
  • Default Clause
  • Threadprivate Variables
  • Copyin and Copyprivate
  • Reduction Concepts
  • Data Environment Best Practices

Synchronization

  • Synchronization Overview
  • Barrier Construct
  • Critical Section
  • Atomic Operations
  • Reduction Clause
  • User-Defined Reductions
  • Locks and Mutual Exclusion
  • Ordered Construct
  • Flush Directive
  • Synchronization Best Practices

Tasking

  • Task-Based Parallelism Concepts
  • Task Directive Basics
  • Task Synchronization
  • Task Dependencies
  • Taskloop Directive
  • Task Priority
  • Taskgroup Construct
  • Untied and Mergeable Tasks
  • Recursive Algorithms with Tasks
  • Task-Based Best Practices

SIMD and Vectorization

  • SIMD Fundamentals
  • SIMD Directive
  • SIMD Clauses
  • Combining SIMD with Parallelism
  • Declare SIMD
  • SIMD Reductions
  • Alignment and Memory Access
  • SIMD Best Practices

Advanced Topics

  • Cancellation
  • Target Directives (Offloading)
  • Teams Construct
  • Distribute Construct
  • Memory Allocators
  • Thread Affinity and Places
  • Loop Transformations
  • Metadirectives
  • User-Defined Mappers
  • Interoperability

Performance and Optimization

  • Performance Measurement
  • Speedup and Scalability Analysis
  • Load Balancing
  • False Sharing
  • NUMA Effects
  • Reducing Synchronization Overhead
  • Memory Access Optimization
  • Profiling and Debugging
  • Compiler Optimization
  • Best Practices Summary


Mandar Gurav

Parallel Programmer, Trainer and Mentor


If you are new to Parallel Programming you can start here.


