What You’ll Learn Today
Imagine you have a huge pile of homework to do. You could do it all by yourself, or you could ask your friends to help. That’s exactly what OpenACC does for your computer programs – it gets help from the GPU (Graphics Processing Unit) to make calculations super fast!
What is OpenACC?
OpenACC is like giving instructions to a very powerful assistant (the GPU) using simple comments in your Fortran code. Think of it as sticky notes that tell the computer: “Hey, this part can be done much faster if you use the GPU!”
Regular Computer (CPU): 🧠 → Does one thing at a time
Graphics Card (GPU): 🧠🧠🧠🧠🧠🧠🧠🧠 → Does many things at once
Why Use OpenACC with Fortran?
Fortran is like the grandfather of scientific programming languages. Scientists and engineers have been using it for decades because:
- It’s great with numbers – Perfect for math and science calculations
- It handles arrays easily – Arrays are like organized lists of numbers
- It’s been tested for years – Very reliable and stable
The Magic of GPU Acceleration
Let’s say you want to add 1 to every number in a list of 1000 numbers:
CPU Way (Traditional):
Step 1: Add 1 to number 1
Step 2: Add 1 to number 2
Step 3: Add 1 to number 3
...
Step 1000: Add 1 to number 1000
GPU Way (OpenACC):
All at once: Add 1 to ALL numbers simultaneously!
It’s like having 1000 people each adding 1 to one number instead of one person doing all 1000 additions!
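Here is a minimal sketch of that 1000-number example in plain Fortran, with no GPU involved yet. The names add_one_sketch and list are made up for illustration. The point to notice is that each pass through the loop touches only its own element, so the order of the steps does not matter; that independence is what lets many workers share the job.

```fortran
program add_one_sketch
  ! Hypothetical example: names are chosen for illustration only
  implicit none
  integer, parameter :: n = 1000
  integer :: list(n)
  integer :: i

  list = 0

  ! "CPU way": one element after another.
  ! Step i reads and writes only list(i), so no step depends on another.
  do i = 1, n
    list(i) = list(i) + 1
  end do

  ! The same idea written as a whole-array operation
  list = list + 1

  ! Every element has now been incremented twice
  write(*,*) 'All elements equal 2: ', all(list == 2)
end program add_one_sketch
```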
Understanding Fortran Arrays
In Fortran, we count starting from 1 (not 0 like some other languages):
Array positions: [1] [2] [3] [4] [5]
Array values: [10][20][30][40][50]
This is natural for mathematicians because we usually start counting from 1 in real life!
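The picture above can be written as a short program. This is only a sketch; the name values is an assumption made for the example. The intrinsic function lbound reports the lowest valid index of an array.

```fortran
program array_index_sketch
  ! Hypothetical example: the array name "values" is for illustration only
  implicit none
  integer :: values(5)

  values = [10, 20, 30, 40, 50]

  write(*,*) 'First element: ', values(1)   ! position 1 holds 10
  write(*,*) 'Last element:  ', values(5)   ! position 5 holds 50
  write(*,*) 'Lowest valid index: ', lbound(values, 1)
end program array_index_sketch
```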
Your First OpenACC Directive
OpenACC directives are special comments that start with !$acc. A compiler with OpenACC switched on reads them as instructions; any other compiler simply skips them as ordinary comments:
!$acc parallel loop
do i = 1, n
  array(i) = array(i) + 1
end do
!$acc end parallel loop
Think of !$acc parallel loop as saying: “Dear GPU, please help me do this loop really fast!”
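To see that a directive really is just a comment until OpenACC is switched on, here is a small sketch. It relies on the _OPENACC preprocessor macro, which the OpenACC specification says is defined when OpenACC is enabled. It assumes the file is saved with a capital-F extension such as .F90 so that the preprocessor runs; the program name is made up for illustration.

```fortran
program directive_sketch
  ! Hypothetical example: save as a .F90 file so the #ifdef lines are processed
  implicit none
  integer, parameter :: n = 8
  real :: array(n)
  integer :: i

  array = 0.0

#ifdef _OPENACC
  write(*,*) 'OpenACC is enabled: the directive below is active'
#else
  write(*,*) 'OpenACC is not enabled: the directive below is a plain comment'
#endif

  !$acc parallel loop
  do i = 1, n
    array(i) = array(i) + 1
  end do

  ! The values are the same either way; only where the loop runs changes
  write(*,*) array
end program directive_sketch
```

Compiled without an OpenACC flag, the loop runs on the CPU as usual and gives the same numbers.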
Memory Layout: How Fortran Stores Arrays
Fortran stores 2D arrays column by column (column-major), like reading a book from top to bottom, then moving to the next column:
Array(3,2):  Column 1   Column 2
             [1,1]      [1,2]
             [2,1]      [2,2]
             [3,1]      [3,2]
Memory order: [1,1], [2,1], [3,1], [1,2], [2,2], [3,2]
This is important for performance – accessing elements in this order is much faster!
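A minimal sketch of what “this order” means in a loop: put the first index (the row) in the innermost loop, so that back-to-back iterations touch neighbouring memory locations. The names grid, rows and cols are assumptions made for the example.

```fortran
program column_major_sketch
  ! Hypothetical example: names are chosen for illustration only
  implicit none
  integer, parameter :: rows = 3, cols = 2
  integer :: grid(rows, cols)
  integer :: i, j

  ! Outer loop over columns, inner loop over rows:
  ! this visits [1,1], [2,1], [3,1], [1,2], [2,2], [3,2]
  do j = 1, cols
    do i = 1, rows
      grid(i, j) = 10 * i + j   ! e.g. grid(2,1) = 21
    end do
  end do

  ! reshape flattens the array in storage order: 11 21 31 12 22 32
  write(*,*) reshape(grid, [rows * cols])
end program column_major_sketch
```

Swapping the two loops still gives the right answer, but it jumps through memory in strides of one whole column, which is usually slower on large arrays.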
Visual: CPU vs GPU Processing
CPU Processing (Sequential):
Task → [■] → [■] → [■] → [■] → Done
        T1    T2    T3    T4
GPU Processing (Parallel):
Task → [■][■][■][■] → Done
        T1 T2 T3 T4 (all at once!)
What’s Next?
In our first example, you’ll see how a simple Fortran program can be set up to run on a GPU by adding a single OpenACC directive. It’s that easy!
Key Terms to Remember
- OpenACC: A way to accelerate Fortran programs using a GPU
- Directive: A special comment starting with !$acc
- Parallel: Doing many things at the same time
- Array: An organized list of numbers
- Column-major: How Fortran stores 2D arrays in memory
Example Codes
Let us consider the following serial code –
program sequential_example
  ! This program shows how we normally write Fortran code
  ! WITHOUT any GPU acceleration
  implicit none

  ! Variables
  integer, parameter :: n = 1000000   ! Size of our array
  real :: numbers(n)                  ! Array to store our numbers
  integer :: i                        ! Loop counter
  real :: start_time, end_time        ! To measure how long it takes

  ! Fill the array with some numbers
  write(*,*) 'Filling array with numbers...'
  do i = 1, n
    numbers(i) = real(i) * 2.5        ! Each number is i * 2.5
  end do

  ! Record start time
  call cpu_time(start_time)

  ! Add 10 to each number (this is what we want to speed up!)
  write(*,*) 'Adding 10 to each number using CPU...'
  do i = 1, n
    numbers(i) = numbers(i) + 10.0
  end do

  ! Record end time
  call cpu_time(end_time)

  ! Show results
  write(*,*) 'First 5 numbers after adding 10:'
  do i = 1, 5
    write(*,*) 'numbers(', i, ') = ', numbers(i)
  end do
  write(*,*) 'Time taken by CPU: ', end_time - start_time, ' seconds'
  write(*,*) 'Sequential processing complete!'
end program sequential_example
To compile this code –
nvfortran -o sequential_example sequential_example.f90
To execute this code –
./sequential_example
Sample output –
Filling array with numbers...
Adding 10 to each number using CPU...
First 5 numbers after adding 10:
numbers( 1 ) = 12.50000
numbers( 2 ) = 15.00000
numbers( 3 ) = 17.50000
numbers( 4 ) = 20.00000
numbers( 5 ) = 22.50000
Time taken by CPU: 2.2349358E-03 seconds
Sequential processing complete!
An OpenACC-based parallel version of this serial code is as follows –
program openacc_example
  ! This program shows the SAME calculation as sequential_example.f90
  ! but now using OpenACC for GPU acceleration!
  implicit none

  ! Variables (exactly the same as before)
  integer, parameter :: n = 1000000   ! Size of our array
  real :: numbers(n)                  ! Array to store our numbers
  integer :: i                        ! Loop counter
  real :: start_time, end_time        ! To measure how long it takes

  ! Fill the array with some numbers (same as before)
  write(*,*) 'Filling array with numbers...'
  do i = 1, n
    numbers(i) = real(i) * 2.5        ! Each number is i * 2.5
  end do

  ! Record start time
  call cpu_time(start_time)

  ! Add 10 to each number using GPU acceleration
  ! Notice the special OpenACC directive below!
  write(*,*) 'Adding 10 to each number using GPU...'
  !$acc parallel loop
  do i = 1, n
    numbers(i) = numbers(i) + 10.0
  end do
  !$acc end parallel loop

  ! Record end time
  call cpu_time(end_time)

  ! Show results (same as before)
  write(*,*) 'First 5 numbers after adding 10:'
  do i = 1, 5
    write(*,*) 'numbers(', i, ') = ', numbers(i)
  end do
  write(*,*) 'Time taken with GPU: ', end_time - start_time, ' seconds'
  write(*,*) 'OpenACC processing complete!'
end program openacc_example
To compile this parallel code (note that the ‘-acc’ flag is what enables OpenACC compilation) –
nvfortran -acc -o openacc_example openacc_example.f90
To execute this code –
./openacc_example
Sample output for parallel code –
Filling array with numbers...
Adding 10 to each number using GPU...
First 5 numbers after adding 10:
numbers( 1 ) = 12.50000
numbers( 2 ) = 15.00000
numbers( 3 ) = 17.50000
numbers( 4 ) = 20.00000
numbers( 5 ) = 22.50000
Time taken with GPU: 0.1241460 seconds
OpenACC processing complete!
Notice that in this sample run the GPU version is actually slower than the CPU version (about 0.12 seconds versus about 0.002 seconds). The timed region covers more than the loop itself: it also includes one-time GPU start-up work and copying the array to the GPU and back. Adding 10 to a million numbers is such a small job that this overhead outweighs the time saved. GPU acceleration pays off when there is much more arithmetic to do per element, or when the data can stay on the GPU across many loops.
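One way to look at where the GPU time goes is to run the accelerated loop twice and time each pass with a wall-clock timer. The sketch below uses the standard system_clock intrinsic instead of cpu_time. The program name and the two-pass setup are assumptions made for illustration, not part of the example above. On many systems the first pass is expected to take longer because it includes GPU start-up, but actual timings depend on your hardware.

```fortran
program timing_sketch
  ! Hypothetical example: times the same OpenACC loop twice
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 1000000
  real :: numbers(n)
  integer :: i, pass
  integer(int64) :: t0, t1, rate

  do i = 1, n
    numbers(i) = real(i) * 2.5
  end do

  ! Ask how many clock ticks make up one second
  call system_clock(count_rate=rate)

  do pass = 1, 2
    call system_clock(t0)
    !$acc parallel loop
    do i = 1, n
      numbers(i) = numbers(i) + 10.0
    end do
    call system_clock(t1)
    write(*,*) 'Pass ', pass, ' took ', real(t1 - t0) / real(rate), ' seconds'
  end do

  ! Two passes add 10 twice, so numbers(1) is 2.5 + 20.0
  write(*,*) 'numbers(1) = ', numbers(1)
end program timing_sketch
```

Both passes still copy the array to the GPU and back, so even the second timing is not pure calculation time.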