OpenMP Barrier Construct - Learn Parallel Programming

In OpenMP, the barrier construct synchronizes the threads within a parallel region. It is the simplest way to synchronize all the threads in a team.

Synchronization is required in many cases. For example, data produced by some or all threads may need to be consumed by other threads in the same parallel region. Because some threads may run ahead of others, the data a thread wants to consume might not have been produced yet by another thread. To avoid this situation, we can place a barrier so that the threads are synchronized at the appropriate points in the code. We have to be cautious, though: every barrier makes the faster threads sit idle until the slowest one arrives, so overusing barriers will slow down the parallel performance of your code.
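As a rough illustration of the producer/consumer case described above, the sketch below has the threads fill a shared array and then read elements that other threads may have written. The function name `produce_then_consume`, the array size `N`, and the team size of 4 are assumptions made for this example; they are not part of OpenMP. The first loop uses `nowait` to drop its implicit end-of-loop barrier, so the explicit barrier is what separates the two phases.

```c
#define N 8  /* illustrative array size (assumption for this sketch) */

/* Hypothetical helper: threads first produce data[], then consume
   elements that may have been produced by other threads.
   Returns the sum of everything that was consumed. */
int produce_then_consume(void)
{
    int data[N] = {0};  /* shared: written in phase 1, read in phase 2 */
    int sum = 0;        /* shared: combined through a reduction */

    #pragma omp parallel num_threads(4)
    {
        /* Phase 1 (produce): nowait lets a thread leave the loop
           as soon as its own iterations are done */
        #pragma omp for nowait
        for (int i = 0; i < N; i++)
            data[i] = i * i;

        /* Every element must be written before any element is read */
        #pragma omp barrier

        /* Phase 2 (consume): iteration i reads a neighbouring element,
           which another thread may have produced */
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += data[(i + 1) % N];
    }
    return sum;
}
```

Called from `main` in a program compiled with `gcc -fopenmp`, `produce_then_consume()` should return 140 (the sum of the squares of 0 to 7) regardless of how the iterations are divided among threads. Without the barrier, a thread could read an element that has not been written yet.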

Syntax

The syntax for C/C++ is:

#pragma omp barrier

And for Fortran:

!$omp barrier

First, let us see how it is used. Simple pseudo code for the barrier construct is as follows:

Pseudo code

#pragma omp parallel
{
	// Task 1: do some work

	// Barrier: each thread waits here until all threads have reached this point
	#pragma omp barrier

	// Task 2: do some more work
}

In this pseudo code, all the threads start with Task 1. If we want a synchronization point before the threads move on to Task 2, we can add a barrier between the two tasks. A thread that reaches the barrier waits there until every other thread in the team has reached it too, so no thread begins Task 2 until all threads have finished Task 1. If one or more threads are still executing Task 1 when the others arrive at the barrier, the waiting threads stay there until the remaining threads finish Task 1.
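The Task 1 / Task 2 pattern can be sketched in C as follows, for the case where Task 1 is performed by only one thread while the rest wait. The function name `one_writer_all_readers`, the value 42, and the team size of 4 are made up for illustration. The barrier is placed outside the `single` block on purpose: a barrier has to be reached by every thread in the team, so it should not sit inside code that only some threads execute.

```c
/* Hypothetical helper: one thread performs Task 1 (writing a shared
   value), then every thread performs Task 2 (reading it).
   Returns 1 if all threads read the value written in Task 1. */
int one_writer_all_readers(void)
{
    int shared_value = 0;  /* written in Task 1, read in Task 2 */
    int all_ok = 1;        /* per-thread results combined with logical AND */

    #pragma omp parallel num_threads(4) reduction(&&:all_ok)
    {
        /* Task 1: executed by a single thread; nowait removes the
           implicit barrier that normally ends a single construct */
        #pragma omp single nowait
        shared_value = 42;

        /* Outside the single block, so every thread reaches it */
        #pragma omp barrier

        /* Task 2: safe to read, because Task 1 has completed */
        all_ok = (shared_value == 42);
    }
    return all_ok;
}
```

With the barrier in place, the function should return 1. If the barrier is removed, a thread may read `shared_value` before it has been written.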

Example

Let’s try to understand this construct using the following example.

#include <stdio.h>
#include <omp.h>

int main(void)
{
	// Request 4 threads for subsequent parallel regions
	omp_set_num_threads(4);

	// Parallel region starts
	#pragma omp parallel
	{
		// Declared inside the region, so each thread has its own copy
		int num_thds, myid;

		// Get the total number of threads in this parallel region
		num_thds = omp_get_num_threads();

		// Get this thread's unique ID (0 to num_thds - 1) within the region
		myid = omp_get_thread_num();

		printf("First printf: %d out of %d thds!\n", myid, num_thds);

		// Wait for all the threads to reach this point
		#pragma omp barrier

		printf("Second printf: %d out of %d thds!\n", myid, num_thds);
	}

	printf("Program Exit!\n");
	return 0;
}

We can compile this code using the following command:

gcc -fopenmp barrier.c

The output of this code will look something like this:

First printf: 0 out of 4 thds!
First printf: 2 out of 4 thds!
First printf: 1 out of 4 thds!
First printf: 3 out of 4 thds!
Second printf: 1 out of 4 thds!
Second printf: 2 out of 4 thds!
Second printf: 0 out of 4 thds!
Second printf: 3 out of 4 thds!
Program Exit!

Here, the order in which the 4 threads print the First printf lines may vary from run to run, because we do not control which thread reaches its printf statement first. But no thread executes its Second printf until all the threads have finished executing the First printf.

If we comment out the barrier statement, the output will look something like this:

First printf: 0 out of 4 thds!
Second printf: 0 out of 4 thds!
First printf: 1 out of 4 thds!
Second printf: 1 out of 4 thds!
First printf: 2 out of 4 thds!
First printf: 3 out of 4 thds!
Second printf: 3 out of 4 thds!
Second printf: 2 out of 4 thds!
Program Exit!

Again, the exact order of these lines may differ from what is shown above. But we can clearly observe that the threads no longer wait for the other threads to finish the First printf before executing the Second printf.
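Because printed output can interleave differently on every run, it can help to check the barrier's effect with a counter instead of reading the prints. In the sketch below, each thread records its arrival (standing in for the First printf), then reads the arrival count after the barrier (standing in for the Second printf). The function name `barrier_holds` and the variables `arrived`, `lo`, and `hi` are assumptions made for this example.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if every thread, after the barrier,
   observed the full arrival count; returns 0 otherwise. */
int barrier_holds(void)
{
    int arrived = 0;   /* shared: threads that have finished phase 1 */
    int lo = INT_MAX;  /* smallest count observed by any thread */
    int hi = 0;        /* largest count observed by any thread */

    #pragma omp parallel num_threads(4) reduction(min:lo) reduction(max:hi)
    {
        int seen;

        /* Phase 1 (stands in for the First printf): record arrival */
        #pragma omp atomic
        arrived++;

        /* Without this barrier, a fast thread may read a partial count */
        #pragma omp barrier

        /* Phase 2 (stands in for the Second printf): read the count */
        #pragma omp atomic read
        seen = arrived;

        if (seen < lo) lo = seen;
        if (seen > hi) hi = seen;
    }

    /* With the barrier, every thread saw the same, final count */
    return lo == arrived && hi == arrived;
}
```

With the barrier in place, `barrier_holds()` should return 1 on every run. If the barrier line is commented out, the function may return 0 on some runs, which corresponds to the interleaved First/Second output shown above.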
