What You’ll Learn Today
Today you’ll write your very first OpenACC program that actually runs on the GPU! Think of it like learning to ride a bicycle – once you understand the basic structure, everything else becomes easier.
The Magic of Parallel Loops
Imagine you’re a teacher with 100 math problems to grade. You could:
Sequential Way: Grade them one by one (slow!)
Problem 1 ✓ → Problem 2 ✓ → Problem 3 ✓ → ... → Problem 100 ✓
Parallel Way: Get 100 assistants to grade them all at once (super fast!)
Problem 1   ✓  }
Problem 2   ✓  }  All happening
Problem 3   ✓  }  at the same
...            }  time!
Problem 100 ✓  }
That’s exactly what !$acc parallel loop does to your Fortran DO loops!
Understanding the Basic Structure
Every OpenACC program follows this simple pattern:
! 1. Declare your variables
! 2. Initialize your data
! 3. Add OpenACC directive
! 4. Write your loop
! 5. End the directive
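Put together in Fortran, the pattern looks something like this minimal sketch (the program name first_pattern and the array result are illustrative placeholders, not part of this lesson's main example):

program first_pattern
   implicit none
   ! 1. Declare your variables
   integer, parameter :: n = 1000
   real    :: result(n)
   integer :: i

   ! 2. Initialize your data
   result = 0.0

   ! 3 + 4. Add the OpenACC directive right before your loop
   !$acc parallel loop
   do i = 1, n
      result(i) = 2.0 * real(i)
   end do
   !$acc end parallel loop   ! 5. End the directive

   write(*,*) 'result(n) = ', result(n)
end program first_pattern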
Visual: How OpenACC Transforms Your Loop
Your Original Code:          What OpenACC Does:

do i = 1, 1000               GPU Thread 1:    i = 1
   array(i) = i * 2          GPU Thread 2:    i = 2
end do                       GPU Thread 3:    i = 3
                             ...
                             GPU Thread 1000: i = 1000

                             All running at the same time!
Understanding Compiler Feedback
When you compile with OpenACC, the compiler becomes your helpful assistant. It tells you:
- ✅ “I successfully parallelized your loop!”
- ⚠️ “I found some issues, but I’ll try my best”
- ❌ “I can’t parallelize this – here’s why”
This feedback helps you write better parallel code!
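With the NVIDIA compiler used later in this lesson, you request this feedback with the -Minfo=accel flag (my_program.f90 below is just a placeholder file name):

nvfortran -acc -Minfo=accel -o my_program my_program.f90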
Memory Transfer: The Hidden Magic
When you use OpenACC, something amazing happens behind the scenes:
1. Copy data from CPU memory to GPU memory
2. Run your parallel loop on GPU
3. Copy results back from GPU to CPU memory
It’s like sending homework to a super-fast tutoring center and getting it back completed!
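Normally the compiler arranges these transfers for you. To make the hidden steps visible, here is a sketch of what the implicit behavior roughly corresponds to when written out with explicit data clauses (input, result, n, and i are placeholder names):

!$acc parallel loop copyin(input) copyout(result)   ! 1. copy input in, 3. copy result back
do i = 1, n
   result(i) = input(i) * 2.0                       ! 2. runs on the GPU
end do
!$acc end parallel loop

The copyin clause copies input to the GPU before the loop, and copyout copies result back to the CPU afterwards. Later in this lesson you will see the compiler report the same transfers as "implicit copyin" and "implicit copyout".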
What Makes a Good Parallel Loop?
Your loop should be like a factory assembly line where each worker (GPU thread) can work independently:
Good for Parallelization:
do i = 1, n
   result(i) = input(i) * 2   ! Each iteration is independent
end do
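Because each iteration here touches only its own element, you can hand this loop straight to the GPU with the directive introduced above (result, input, and n are the same placeholder names as in the snippet):

!$acc parallel loop
do i = 1, n
   result(i) = input(i) * 2   ! Each iteration is independent
end do
!$acc end parallel loop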
Not Good (for now):
do i = 2, n
   result(i) = result(i-1) + input(i)   ! Each iteration depends on previous
end do
Key Concepts to Remember
- Parallel Loop: A loop where all iterations can run simultaneously
- Thread: Think of it as a worker doing one piece of the job
- Independence: Each loop iteration should not depend on others
- Compiler Feedback: Messages that help you understand what happened
Example Code
Let us consider the following OpenACC code –
program simple_parallel_loop
   ! Your first real OpenACC program!
   ! This program demonstrates the basic structure of a parallel loop
   implicit none

   ! Step 1: Declare variables
   integer, parameter :: n = 50000   ! Size of our arrays
   real    :: input_array(n)         ! Numbers we start with
   real    :: output_array(n)        ! Results after calculation
   integer :: i                      ! Loop counter

   ! Step 2: Initialize input data
   write(*,*) 'Setting up input data...'
   do i = 1, n
      input_array(i) = real(i) * 1.5   ! Simple pattern: i * 1.5
   end do

   ! Step 3: The magic happens here - parallel computation!
   write(*,*) 'Starting parallel computation on GPU...'
   !$acc parallel loop
   do i = 1, n
      ! Each GPU thread calculates one element
      output_array(i) = input_array(i) * input_array(i) + 10.0
   end do
   !$acc end parallel loop
   write(*,*) 'Parallel computation completed!'

   ! Step 4: Show some results
   write(*,*) 'First 10 results:'
   do i = 1, 10
      write(*,'(A,I0,A,F8.2,A,F8.2)') 'Element ', i, ': ', &
         input_array(i), ' → ', output_array(i)
   end do

   write(*,*) 'Success! Your first parallel loop is working!'
end program simple_parallel_loop
To compile this code –
nvfortran -acc -o simple_parallel_loop simple_parallel_loop.f90
To execute this code –
./simple_parallel_loop
Sample output –
Setting up input data...
Starting parallel computation on GPU...
Parallel computation completed!
First 10 results:
Element 1: 1.50 → 12.25
Element 2: 3.00 → 19.00
Element 3: 4.50 → 30.25
Element 4: 6.00 → 46.00
Element 5: 7.50 → 66.25
Element 6: 9.00 → 91.00
Element 7: 10.50 → 120.25
Element 8: 12.00 → 154.00
Element 9: 13.50 → 192.25
Element 10: 15.00 → 235.00
Success! Your first parallel loop is working!
Let us look at another program, which shows you how to read compiler feedback –
program compiler_feedback
   ! This program shows you how to read compiler feedback
   ! Run with: nvfortran -acc -Minfo=accel compiler_feedback.f90
   implicit none

   integer, parameter :: n = 10000
   real    :: a(n), b(n), c(n)
   integer :: i

   ! Initialize arrays
   do i = 1, n
      a(i) = real(i)
      b(i) = real(i) * 2.0
   end do

   ! This loop will give GOOD feedback from compiler
   write(*,*) 'Running simple parallel loop...'

   !$acc parallel loop
   do i = 1, n
      c(i) = a(i) + b(i)   ! Simple, independent calculation
   end do
   !$acc end parallel loop

   ! Show a few results
   write(*,*) 'Sample results:'
   do i = 1, 5
      write(*,'(A,I0,A,F6.1)') 'c(', i, ') = ', c(i)
   end do

   write(*,*) 'Check compiler messages to see what happened!'
end program compiler_feedback
To compile this parallel code –
nvfortran -acc -Minfo=accel -o compiler_feedback compiler_feedback.f90
Sample compilation output (because we passed the -Minfo=accel compilation flag, the compiler reports what it did with our loop) –
     19, Generating NVIDIA GPU code
         20, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     19, Generating implicit copyout(c(:)) [if not already present]
         Generating implicit copyin(b(:),a(:)) [if not already present]
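Reading this feedback: “Generating NVIDIA GPU code” confirms the loop was moved to the GPU, the “gang, vector(128)” line shows how the iterations were divided among GPU threads, and the “implicit copyout(c(:))” and “implicit copyin(b(:),a(:))” lines are exactly the hidden memory transfers described earlier: a and b are copied to the GPU before the loop, and c is copied back when it finishes.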
To execute this code –
./compiler_feedback
Sample output –
Running simple parallel loop...
Sample results:
c(1) = 3.0
c(2) = 6.0
c(3) = 9.0
c(4) = 12.0
c(5) = 15.0
Check compiler messages to see what happened!