Your First OpenACC Fortran Program – Parallel Loop on GPU

What You’ll Learn Today

Today you’ll write your very first OpenACC program that actually runs on the GPU! Think of it like learning to ride a bicycle – once you understand the basic structure, everything else becomes easier.

The Magic of Parallel Loops

Imagine you’re a teacher with 100 math problems to grade. You could:

Sequential Way: Grade them one by one (slow!)

Problem 1 ✓ → Problem 2 ✓ → Problem 3 ✓ → ... → Problem 100 ✓

Parallel Way: Get 100 assistants to grade them all at once (super fast!)

Problem 1 ✓
Problem 2 ✓  } All happening
Problem 3 ✓  } at the same
...          } time!
Problem 100 ✓

That’s exactly what !$acc parallel loop does to your Fortran DO loops!

Understanding the Basic Structure

Every OpenACC program follows this simple pattern:

! 1. Declare your variables
! 2. Initialize your data
! 3. Add OpenACC directive
! 4. Write your loop
! 5. End the directive
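
Putting those five steps together, a minimal parallel loop might look like the sketch below (the array name and size are just placeholders for whatever your program needs):

program minimal_example
  implicit none                        ! 1. Declare your variables
  integer, parameter :: n = 1000
  real :: array(n)
  integer :: i

  array = 0.0                          ! 2. Initialize your data

  !$acc parallel loop                  ! 3. Add OpenACC directive
  do i = 1, n                          ! 4. Write your loop
    array(i) = real(i) * 2.0
  end do
  !$acc end parallel loop              ! 5. End the directive

end program minimal_example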

Visual: How OpenACC Transforms Your Loop

Your Original Code:          What OpenACC Does:

do i = 1, 1000               GPU Thread 1: i = 1
  array(i) = i * 2           GPU Thread 2: i = 2
end do                       GPU Thread 3: i = 3
                             ...
                             GPU Thread 1000: i = 1000

                             All running at the same time!

Understanding Compiler Feedback

When you compile with OpenACC, the compiler becomes your helpful assistant. It tells you:

  • ✅ “I successfully parallelized your loop!”
  • ⚠️ “I found some issues, but I’ll try my best”
  • ❌ “I can’t parallelize this – here’s why”

This feedback helps you write better parallel code!
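
With the NVIDIA HPC compiler used later in this tutorial, you can request this feedback by adding the -Minfo=accel flag at compile time, for example (my_program.f90 is just a placeholder file name):

nvfortran -acc -Minfo=accel -o my_program my_program.f90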

Memory Transfer: The Hidden Magic

When you use OpenACC, something amazing happens behind the scenes:

1. Copy data from CPU memory to GPU memory
2. Run your parallel loop on GPU
3. Copy results back from GPU to CPU memory

It’s like sending homework to a super-fast tutoring center and getting it back completed!
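
By default the compiler decides these transfers for you (you will see them reported as "implicit copyin" and "implicit copyout" messages later). If you prefer to spell them out yourself, OpenACC lets you add data clauses to the directive. Here is a small sketch, assuming arrays named input and result of size n:

!$acc parallel loop copyin(input(1:n)) copyout(result(1:n))
do i = 1, n
  result(i) = input(i) * 2.0   ! runs on the GPU with explicit transfers
end do
!$acc end parallel loop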

What Makes a Good Parallel Loop?

Your loop should be like a factory assembly line where each worker (GPU thread) can work independently:

Good for Parallelization:

do i = 1, n
  result(i) = input(i) * 2  ! Each iteration is independent
end do

Not Good (for now):

do i = 2, n
  result(i) = result(i-1) + input(i)  ! Each iteration depends on previous
end do

Key Concepts to Remember

  • Parallel Loop: A loop where all iterations can run simultaneously
  • Thread: Think of it as a worker doing one piece of the job
  • Independence: Each loop iteration should not depend on others
  • Compiler Feedback: Messages that help you understand what happened

Example Code

Let us consider the following OpenACC code –

program simple_parallel_loop
  ! Your first real OpenACC program!
  ! This program demonstrates the basic structure of a parallel loop
  
  implicit none
  
  ! Step 1: Declare variables
  integer, parameter :: n = 50000     ! Size of our arrays
  real :: input_array(n)              ! Numbers we start with
  real :: output_array(n)             ! Results after calculation
  integer :: i                        ! Loop counter
  
  ! Step 2: Initialize input data
  write(*,*) 'Setting up input data...'
  do i = 1, n
    input_array(i) = real(i) * 1.5   ! Simple pattern: i * 1.5
  end do
  
  ! Step 3: The magic happens here - parallel computation!
  write(*,*) 'Starting parallel computation on GPU...'
  
  !$acc parallel loop
  do i = 1, n
    ! Each GPU thread calculates one element
    output_array(i) = input_array(i) * input_array(i) + 10.0
  end do
  !$acc end parallel loop
  
  write(*,*) 'Parallel computation completed!'
  
  ! Step 4: Show some results
  write(*,*) 'First 10 results:'
  do i = 1, 10
    write(*,'(A,I0,A,F8.2,A,F8.2)') 'Element ', i, ': ', &
           input_array(i), ' → ', output_array(i)
  end do
  
  write(*,*) 'Success! Your first parallel loop is working!'
  
end program simple_parallel_loop

To compile this code –

nvfortran -acc -o simple_parallel_loop simple_parallel_loop.f90

To execute this code –

./simple_parallel_loop

Sample output –

 Setting up input data...
 Starting parallel computation on GPU...
 Parallel computation completed!
 First 10 results:
Element 1:     1.50 →    12.25
Element 2:     3.00 →    19.00
Element 3:     4.50 →    30.25
Element 4:     6.00 →    46.00
Element 5:     7.50 →    66.25
Element 6:     9.00 →    91.00
Element 7:    10.50 →   120.25
Element 8:    12.00 →   154.00
Element 9:    13.50 →   192.25
Element 10:    15.00 →   235.00
Success! Your first parallel loop is working!

Let us look at another program that shows you how to read compiler feedback –

program compiler_feedback
  ! This program shows you how to read compiler feedback
  ! Run with: nvfortran -acc -Minfo=accel compiler_feedback.f90
  
  implicit none
  
  integer, parameter :: n = 10000
  real :: a(n), b(n), c(n)
  integer :: i
  
  ! Initialize arrays
  do i = 1, n
    a(i) = real(i)
    b(i) = real(i) * 2.0
  end do
  
  ! This loop will give GOOD feedback from compiler
  write(*,*) 'Running simple parallel loop...'
  !$acc parallel loop
  do i = 1, n
    c(i) = a(i) + b(i)  ! Simple, independent calculation
  end do
  !$acc end parallel loop
  
  ! Show a few results
  write(*,*) 'Sample results:'
  do i = 1, 5
    write(*,'(A,I0,A,F6.1)') 'c(', i, ') = ', c(i)
  end do
  
  write(*,*) 'Check compiler messages to see what happened!'
  
end program compiler_feedback

To compile this parallel code –

nvfortran -acc -Minfo=accel -o compiler_feedback compiler_feedback.f90

Sample compilation output (because we passed the -Minfo=accel flag, the compiler printed the following accelerator messages) –

     19, Generating NVIDIA GPU code
         20, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     19, Generating implicit copyout(c(:)) [if not already present]
         Generating implicit copyin(b(:),a(:)) [if not already present]
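
Notice how these messages map onto the ideas above: "Generating NVIDIA GPU code" is the compiler confirming it parallelized the loop (gang and vector describe how the iterations are split across GPU threads), while the "implicit copyin" and "implicit copyout" lines are the hidden memory transfers we discussed earlier.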

To execute this code –

./compiler_feedback

Sample output –

 Running simple parallel loop...
 Sample results:
c(1) =    3.0
c(2) =    6.0
c(3) =    9.0
c(4) =   12.0
c(5) =   15.0
 Check compiler messages to see what happened!
