Introduction to OpenACC with Fortran

What You’ll Learn Today

Imagine you have a huge pile of homework to do. You could do it all by yourself, or you could ask your friends to help. That’s exactly what OpenACC does for your computer programs – it gets help from the GPU (Graphics Processing Unit) to make calculations super fast!

What is OpenACC?

OpenACC is like giving instructions to a very powerful assistant (the GPU) using simple comments in your Fortran code. Think of it as sticky notes that tell the computer: “Hey, this part can be done much faster if you use the GPU!”

Regular Computer (CPU):    🧠 → Does one thing at a time
Graphics Card (GPU):       🧠🧠🧠🧠🧠🧠🧠🧠 → Does many things at once

Why Use OpenACC with Fortran?

Fortran is like the grandfather of scientific programming languages. Scientists and engineers have been using it for decades because:

  1. It’s great with numbers – Perfect for math and science calculations
  2. It handles arrays easily – Arrays are like organized lists of numbers
  3. It’s been tested for years – Very reliable and stable

The Magic of GPU Acceleration

Let’s say you want to add 1 to every number in a list of 1000 numbers:

CPU Way (Traditional):

Step 1: Add 1 to number 1
Step 2: Add 1 to number 2  
Step 3: Add 1 to number 3
...
Step 1000: Add 1 to number 1000

GPU Way (OpenACC):

All at once: Add 1 to ALL numbers simultaneously!

It’s like having 1000 people each adding 1 to one number instead of one person doing all 1000 additions!

Understanding Fortran Arrays

In Fortran, array positions start at 1 by default (not 0 as in C or Python):

Array positions:  [1] [2] [3] [4] [5]
Array values:     [10][20][30][40][50]

This is natural for mathematicians because we usually start counting from 1 in real life!
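The small array pictured above can be written as a short program. This is only a minimal sketch: the program name array_indexing_sketch and the variable name values are made up for illustration. It uses the standard lbound and ubound intrinsics to show that the first position is 1 unless you ask for something else.

```fortran
program array_indexing_sketch
  ! Hypothetical example: names are illustrative, not from the tutorial
  implicit none

  integer :: values(5)   ! positions 1 to 5 by default
  integer :: i

  ! Fill the array with the values from the picture above
  values = [10, 20, 30, 40, 50]

  ! lbound/ubound report the first and last valid positions
  write(*,*) 'First position: ', lbound(values, 1)
  write(*,*) 'Last position:  ', ubound(values, 1)

  ! Walk the array from position 1 to position 5
  do i = 1, 5
    write(*,*) 'values(', i, ') = ', values(i)
  end do

end program array_indexing_sketch
```

Here values(1) holds 10 and values(5) holds 50. Fortran also lets you pick other bounds when you need them, for example integer :: values(0:4).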

Your First OpenACC Directive

OpenACC directives are special comments that start with !$acc. A compiler with OpenACC switched on treats them as instructions; any other compiler sees ordinary comments and ignores them:

!$acc parallel loop
do i = 1, n
    array(i) = array(i) + 1
end do
!$acc end parallel loop

Think of !$acc parallel loop as saying: “Dear GPU, please help me do this loop really fast!”
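To try the fragment above on its own, it can be wrapped in a complete program. The sketch below is one possible way to do that, under some assumptions: the program name first_directive_sketch, the size n = 1000, and the check at the end are illustrative additions, not part of the original fragment.

```fortran
program first_directive_sketch
  ! Hypothetical wrapper around the loop shown above
  implicit none

  integer, parameter :: n = 1000   ! assumed size, matching the "1000 numbers" story
  integer :: array(n)
  integer :: i

  ! Start with array(i) = i so the result is easy to check
  do i = 1, n
    array(i) = i
  end do

  ! Each pass of the loop touches only its own element,
  ! so the passes do not depend on each other
  !$acc parallel loop
  do i = 1, n
    array(i) = array(i) + 1
  end do
  !$acc end parallel loop

  ! Check on the CPU that every element grew by exactly 1
  do i = 1, n
    if (array(i) /= i + 1) error stop 'unexpected value in array'
  end do

  write(*,*) 'array(1) = ', array(1), '  array(n) = ', array(n)

end program first_directive_sketch
```

The loop is safe to run in parallel because no element needs the result of another element. If the directive is ignored, the same program still runs on the CPU and gives the same answer.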

Memory Layout: How Fortran Stores Arrays

Fortran stores 2D arrays column by column (column-major), like reading a book from top to bottom, then moving to the next column:

Array(3,2):     Column 1    Column 2
                [1,1] ←─────[1,2]
                [2,1] ←─────[2,2]  
                [3,1] ←─────[3,2]

Memory order: [1,1], [2,1], [3,1], [1,2], [2,2], [3,2]

This is important for performance – accessing elements in this order is much faster!
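The memory order above can be checked with a short program. This is a sketch with made-up names (column_major_sketch, a, flat): it fills a 3 by 2 array so that each value encodes its own row and column, then uses the standard reshape intrinsic to read the elements back in the order they sit in memory.

```fortran
program column_major_sketch
  ! Hypothetical example: names are illustrative, not from the tutorial
  implicit none

  integer, parameter :: nrows = 3, ncols = 2
  integer :: a(nrows, ncols)
  integer :: flat(nrows * ncols)
  integer :: i, j

  ! Store 10*row + column, so a(2,1) holds 21 and a(1,2) holds 12
  do j = 1, ncols        ! outer loop: columns
    do i = 1, nrows      ! inner loop: rows, which are neighbours in memory
      a(i, j) = 10 * i + j
    end do
  end do

  ! reshape takes the elements in memory order (column by column)
  flat = reshape(a, [nrows * ncols])

  write(*,*) 'Memory order: ', flat

end program column_major_sketch
```

The values come out as 11, 21, 31, 12, 22, 32, which matches the memory order listed above. Putting the row index in the innermost loop, as done here, is the loop order that walks through memory without jumping.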

Visual: CPU vs GPU Processing

CPU Processing (Sequential):
Task → [■] → [■] → [■] → [■] → Done
       T1    T2    T3    T4

GPU Processing (Parallel):
Task → [■][■][■][■] → Done
       T1 T2 T3 T4 (all at once!)

What’s Next?

In our first example, you’ll see how a simple Fortran program can be transformed to run on a GPU by wrapping one loop in a single OpenACC directive. It’s that easy!

Key Terms to Remember

  • OpenACC: A directive-based way to accelerate Fortran programs using a GPU
  • Directive: A special comment starting with !$acc
  • Parallel: Doing many things at the same time
  • Array: An organized list of numbers
  • Column-major: How Fortran stores 2D arrays in memory

Example Codes

Let us consider the following serial code –

program sequential_example
  ! This program shows how we normally write Fortran code
  ! WITHOUT any GPU acceleration
  
  implicit none
  
  ! Variables
  integer, parameter :: n = 1000000  ! Size of our array
  real :: numbers(n)                 ! Array to store our numbers
  integer :: i                       ! Loop counter
  real :: start_time, end_time       ! To measure how long it takes
  
  ! Fill the array with some numbers
  write(*,*) 'Filling array with numbers...'
  do i = 1, n
    numbers(i) = real(i) * 2.5  ! Each number is i * 2.5
  end do
  
  ! Record start time
  call cpu_time(start_time)
  
  ! Add 10 to each number (this is what we want to speed up!)
  write(*,*) 'Adding 10 to each number using CPU...'
  do i = 1, n
    numbers(i) = numbers(i) + 10.0
  end do
  
  ! Record end time
  call cpu_time(end_time)
  
  ! Show results
  write(*,*) 'First 5 numbers after adding 10:'
  do i = 1, 5
    write(*,*) 'numbers(', i, ') = ', numbers(i)
  end do
  
  write(*,*) 'Time taken by CPU: ', end_time - start_time, ' seconds'
  write(*,*) 'Sequential processing complete!'
  
end program sequential_example

To compile this code –

nvfortran -o sequential_example sequential_example.f90

To execute this code –

./sequential_example

Sample output –

 Filling array with numbers...
 Adding 10 to each number using CPU...
 First 5 numbers after adding 10:
 numbers(            1 ) =     12.50000    
 numbers(            2 ) =     15.00000    
 numbers(            3 ) =     17.50000    
 numbers(            4 ) =     20.00000    
 numbers(            5 ) =     22.50000    
 Time taken by CPU:    2.2349358E-03  seconds
 Sequential processing complete!

An OpenACC-based parallel version of this serial code is as follows –

program openacc_example
  ! This program shows the SAME calculation as sequential_example.f90
  ! but now using OpenACC for GPU acceleration!
  
  implicit none
  
  ! Variables (exactly the same as before)
  integer, parameter :: n = 1000000  ! Size of our array
  real :: numbers(n)                 ! Array to store our numbers
  integer :: i                       ! Loop counter
  real :: start_time, end_time       ! To measure how long it takes
  
  ! Fill the array with some numbers (same as before)
  write(*,*) 'Filling array with numbers...'
  do i = 1, n
    numbers(i) = real(i) * 2.5  ! Each number is i * 2.5
  end do
  
  ! Record start time
  call cpu_time(start_time)
  
  ! Add 10 to each number using GPU acceleration
  ! Notice the special OpenACC directive below!
  write(*,*) 'Adding 10 to each number using GPU...'
  
  !$acc parallel loop
  do i = 1, n
    numbers(i) = numbers(i) + 10.0
  end do
  !$acc end parallel loop
  
  ! Record end time
  call cpu_time(end_time)
  
  ! Show results (same as before)
  write(*,*) 'First 5 numbers after adding 10:'
  do i = 1, 5
    write(*,*) 'numbers(', i, ') = ', numbers(i)
  end do
  
  write(*,*) 'Time taken with GPU: ', end_time - start_time, ' seconds'
  write(*,*) 'OpenACC processing complete!'
  
end program openacc_example

To compile this parallel code (notice that the ‘-acc’ flag enables OpenACC compilation!) –

nvfortran -acc -o openacc_example openacc_example.f90

To execute this code –

./openacc_example

Sample output for parallel code –

 Filling array with numbers...
 Adding 10 to each number using GPU...
 First 5 numbers after adding 10:
 numbers(            1 ) =     12.50000    
 numbers(            2 ) =     15.00000    
 numbers(            3 ) =     17.50000    
 numbers(            4 ) =     20.00000    
 numbers(            5 ) =     22.50000    
 Time taken with GPU:    0.1241460      seconds
 OpenACC processing complete!
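In these sample runs the GPU version is actually slower (about 0.12 seconds against about 0.002 seconds on the CPU). The tutorial does not say why. A likely explanation is that the timed part includes starting up the GPU and copying the array to it and back, and that cost is large compared with one addition per element. The sketch below is one way to test that idea. It is an assumption-laden variant, not the original program: the name openacc_timing_sketch, the data region, and the use of system_clock instead of cpu_time are all additions made for illustration.

```fortran
program openacc_timing_sketch
  ! Hypothetical variant of openacc_example that times only the loop
  use iso_fortran_env, only: int64
  implicit none

  integer, parameter :: n = 1000000
  real :: numbers(n)
  integer :: i
  integer(int64) :: t_start, t_end, ticks_per_second

  ! Fill the array exactly as in the tutorial programs
  do i = 1, n
    numbers(i) = real(i) * 2.5
  end do

  ! Entering the data region copies numbers to the GPU before the
  ! clock starts; leaving it copies the results back afterwards
  !$acc data copy(numbers)
  call system_clock(t_start, ticks_per_second)

  !$acc parallel loop
  do i = 1, n
    numbers(i) = numbers(i) + 10.0
  end do
  !$acc end parallel loop

  call system_clock(t_end)
  !$acc end data

  write(*,*) 'numbers(1) = ', numbers(1)
  write(*,*) 'Loop time only: ', &
             real(t_end - t_start) / real(ticks_per_second), ' seconds'

end program openacc_timing_sketch
```

It compiles with the same nvfortran -acc command as openacc_example. If the overhead explanation is right, the loop time printed here should be much smaller than the 0.12 seconds above, but the actual numbers depend on your GPU and are not shown here. For a real speedup, a program usually needs far more work per element than a single addition.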
