Author: Mandar Gurav
-
OpenMP: Set number of threads using omp_set_num_threads
OpenMP allows setting the number of threads with the omp_set_num_threads() function while the program is executing. This function is part of the runtime library routines provided by OpenMP, and it gives developers greater control over the number of threads used in OpenMP code. This function can be called any number of times inside the OpenMP…
-
OpenMP Barrier Construct
In OpenMP, the barrier construct is used for synchronization among the threads within a given parallel region. It is the easiest way to synchronize all the threads in a team. Synchronization is required in many cases; for example, data produced by some or all threads may need to be consumed by some or all of the other threads in the same parallel region. Here,…
-
OpenMP C/C++: Hello World!
This is the first OpenMP program one can write to understand the parallelization process using OpenMP. First, let us find out how to compile and execute this code. The ‘-fopenmp’ option here tells the compiler to process the OpenMP directives and generate multithreaded code. If we do not provide this option, the compiler will ignore all…
-
Profiling OpenACC Code using NVPROF
Profiling your OpenACC code on a remote system can sometimes be tricky. Often we need to profile the code in a cluster environment, where jobs are submitted through a job scheduler. In such scenarios, command-line profiling comes in handy. This tutorial provides some usage examples for NVIDIA’s command line profiler…
-
Compiling and Running OpenACC Fortran Codes using PGI Fortran
In this tutorial we will learn how to compile and execute an OpenACC Fortran code using the PGI Fortran compiler. Let’s look at the sample vector addition code, parallelized using the OpenACC Fortran ‘parallel loop’ construct. We can compile this code for an NVIDIA GPU using the following command, or an alternative one. Here, the ‘-ta=tesla’ option informs the compiler that compiler…