What You’ll Learn Today
Today you’ll write your very first OpenACC program that actually runs on the GPU! Think of it like learning to ride a bicycle – once you understand the basic structure, everything else becomes easier.
The Magic of Parallel Loops
Imagine you’re a teacher with 100 math problems to grade. You could:
Sequential Way: Grade them one by one (slow!)
Problem 1 ✓ → Problem 2 ✓ → Problem 3 ✓ → ... → Problem 100 ✓
Parallel Way: Get 100 assistants to grade them all at once (super fast!)
Problem 1   ✓  }
Problem 2   ✓  }  All happening
Problem 3   ✓  }  at the same
...            }  time!
Problem 100 ✓  }
That’s exactly what !$acc parallel loop does to your Fortran DO loops!
Understanding the Basic Structure
Every OpenACC program follows this simple pattern:
! 1. Declare your variables
! 2. Initialize your data
! 3. Add OpenACC directive
! 4. Write your loop
! 5. End the directive
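Put together in Fortran, the pattern looks something like this minimal sketch (the program name first_pattern and the array result are illustrative placeholders, not part of this lesson's main example):

program first_pattern
   implicit none
   ! 1. Declare your variables
   integer, parameter :: n = 1000
   real    :: result(n)
   integer :: i

   ! 2. Initialize your data
   result = 0.0

   ! 3 + 4. Add the OpenACC directive right before your loop
   !$acc parallel loop
   do i = 1, n
      result(i) = 2.0 * real(i)
   end do
   !$acc end parallel loop   ! 5. End the directive

   write(*,*) 'result(n) = ', result(n)
end program first_pattern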
Visual: How OpenACC Transforms Your Loop
Your Original Code:          What OpenACC Does:

do i = 1, 1000               GPU Thread 1:    i = 1
   array(i) = i * 2          GPU Thread 2:    i = 2
end do                       GPU Thread 3:    i = 3
                             ...
                             GPU Thread 1000: i = 1000

                             All running at the same time!
Understanding Compiler Feedback
When you compile with OpenACC, the compiler becomes your helpful assistant. It tells you:
- ✅ “I successfully parallelized your loop!”
- ⚠️ “I found some issues, but I’ll try my best”
- ❌ “I can’t parallelize this – here’s why”
This feedback helps you write better parallel code!
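With the NVIDIA compiler used later in this lesson, you request this feedback with the -Minfo=accel flag (my_program.f90 below is just a placeholder file name):

nvfortran -acc -Minfo=accel -o my_program my_program.f90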
Memory Transfer: The Hidden Magic
When you use OpenACC, something amazing happens behind the scenes:
1. Copy data from CPU memory to GPU memory
2. Run your parallel loop on GPU
3. Copy results back from GPU to CPU memory
It’s like sending homework to a super-fast tutoring center and getting it back completed!
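Normally the compiler arranges these transfers for you. To make the hidden steps visible, here is a sketch of what the implicit behavior roughly corresponds to when written out with explicit data clauses (input, result, n, and i are placeholder names):

!$acc parallel loop copyin(input) copyout(result)   ! 1. copy input in, 3. copy result back
do i = 1, n
   result(i) = input(i) * 2.0                       ! 2. runs on the GPU
end do
!$acc end parallel loop

The copyin clause copies input to the GPU before the loop, and copyout copies result back to the CPU afterwards. Later in this lesson you will see the compiler report the same transfers as "implicit copyin" and "implicit copyout".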
What Makes a Good Parallel Loop?
Your loop should be like a factory assembly line where each worker (GPU thread) can work independently:
Good for Parallelization:
do i = 1, n
   result(i) = input(i) * 2   ! Each iteration is independent
end do
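Because each iteration here touches only its own element, you can hand this loop straight to the GPU with the directive introduced above (result, input, and n are the same placeholder names as in the snippet):

!$acc parallel loop
do i = 1, n
   result(i) = input(i) * 2   ! Each iteration is independent
end do
!$acc end parallel loop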
Not Good (for now):
do i = 2, n
   result(i) = result(i-1) + input(i)   ! Each iteration depends on previous
end do
Key Concepts to Remember
- Parallel Loop: A loop where all iterations can run simultaneously
- Thread: Think of it as a worker doing one piece of the job
- Independence: Each loop iteration should not depend on others
- Compiler Feedback: Messages that help you understand what happened
Example Code
Let us consider the following OpenACC code –
program simple_parallel_loop
   ! Your first real OpenACC program!
   ! This program demonstrates the basic structure of a parallel loop
   implicit none

   ! Step 1: Declare variables
   integer, parameter :: n = 50000   ! Size of our arrays
   real    :: input_array(n)         ! Numbers we start with
   real    :: output_array(n)        ! Results after calculation
   integer :: i                      ! Loop counter

   ! Step 2: Initialize input data
   write(*,*) 'Setting up input data...'
   do i = 1, n
      input_array(i) = real(i) * 1.5   ! Simple pattern: i * 1.5
   end do

   ! Step 3: The magic happens here - parallel computation!
   write(*,*) 'Starting parallel computation on GPU...'
   !$acc parallel loop
   do i = 1, n
      ! Each GPU thread calculates one element
      output_array(i) = input_array(i) * input_array(i) + 10.0
   end do
   !$acc end parallel loop
   write(*,*) 'Parallel computation completed!'

   ! Step 4: Show some results
   write(*,*) 'First 10 results:'
   do i = 1, 10
      write(*,'(A,I0,A,F8.2,A,F8.2)') 'Element ', i, ': ', &
         input_array(i), ' → ', output_array(i)
   end do

   write(*,*) 'Success! Your first parallel loop is working!'
end program simple_parallel_loop
To compile this code –
nvfortran -acc -o simple_parallel_loop simple_parallel_loop.f90
To execute this code –
./simple_parallel_loop
Sample output –
Setting up input data...
Starting parallel computation on GPU...
Parallel computation completed!
First 10 results:
Element 1: 1.50 → 12.25
Element 2: 3.00 → 19.00
Element 3: 4.50 → 30.25
Element 4: 6.00 → 46.00
Element 5: 7.50 → 66.25
Element 6: 9.00 → 91.00
Element 7: 10.50 → 120.25
Element 8: 12.00 → 154.00
Element 9: 13.50 → 192.25
Element 10: 15.00 → 235.00
Success! Your first parallel loop is working!
Let us look at another program, which shows you how to read compiler feedback –
program compiler_feedback
   ! This program shows you how to read compiler feedback
   ! Run with: nvfortran -acc -Minfo=accel compiler_feedback.f90
   implicit none

   integer, parameter :: n = 10000
   real    :: a(n), b(n), c(n)
   integer :: i

   ! Initialize arrays
   do i = 1, n
      a(i) = real(i)
      b(i) = real(i) * 2.0
   end do

   ! This loop will give GOOD feedback from compiler
   write(*,*) 'Running simple parallel loop...'

   !$acc parallel loop
   do i = 1, n
      c(i) = a(i) + b(i)   ! Simple, independent calculation
   end do
   !$acc end parallel loop

   ! Show a few results
   write(*,*) 'Sample results:'
   do i = 1, 5
      write(*,'(A,I0,A,F6.1)') 'c(', i, ') = ', c(i)
   end do

   write(*,*) 'Check compiler messages to see what happened!'
end program compiler_feedback
To compile this parallel code –
nvfortran -acc -Minfo=accel -o compiler_feedback compiler_feedback.f90
Sample compilation output (because we passed the -Minfo=accel compilation flag, the compiler reports what it did with our loop) –
     19, Generating NVIDIA GPU code
         20, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     19, Generating implicit copyout(c(:)) [if not already present]
         Generating implicit copyin(b(:),a(:)) [if not already present]
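Reading this feedback: “Generating NVIDIA GPU code” confirms the loop was moved to the GPU, the “gang, vector(128)” line shows how the iterations were divided among GPU threads, and the “implicit copyout(c(:))” and “implicit copyin(b(:),a(:))” lines are exactly the hidden memory transfers described earlier: a and b are copied to the GPU before the loop, and c is copied back when it finishes.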
To execute this code –
./compiler_feedback
Sample output –
Running simple parallel loop...
Sample results:
c(1) = 3.0
c(2) = 6.0
c(3) = 9.0
c(4) = 12.0
c(5) = 15.0
Check compiler messages to see what happened!