Think of data regions like setting up a temporary office space. Instead of carrying your laptop back and forth for every small task, you set up once, do multiple tasks, then pack up at the end. That’s exactly what OpenACC data regions do for your arrays!
The Power of Persistent Data
Without Data Regions (inefficient):
Operation 1: CPU → GPU → compute → GPU → CPU
Operation 2: CPU → GPU → compute → GPU → CPU
Operation 3: CPU → GPU → compute → GPU → CPU
With Data Regions (efficient):
Setup: CPU → GPU
Operation 1: compute
Operation 2: compute
Operation 3: compute
Cleanup: GPU → CPU
Basic Data Region Syntax
!$acc data copyin(input_arrays) copyout(output_arrays)
! Multiple operations here - arrays stay on GPU
!$acc parallel loop
! ... first computation ...
!$acc end parallel loop
!$acc parallel loop
! ... second computation ...
!$acc end parallel loop
!$acc end data
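The arrays stay on the GPU for the whole region: copyin arrays are transferred once when the data region opens, and copyout arrays come back once at end data.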
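To make the two transfer patterns concrete, here is a minimal sketch that runs the same three kernels twice: once with a data clause on every kernel, and once inside a single data region. The program name, array names, and sizes are made up for illustration. Because OpenACC directives are comments to a non-OpenACC compiler, the sketch should also build and run serially.

```fortran
program region_vs_per_kernel
  ! Illustrative sketch: names and sizes are assumptions, not from the tutorial.
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n)
  integer :: i, step

  ! Variant 1: every kernel has its own copy clause, so "a" travels
  ! host -> device -> host three times.
  a = 1.0
  do step = 1, 3
    !$acc parallel loop copy(a)
    do i = 1, n
      a(i) = 2.0 * a(i)
    end do
  end do

  ! Variant 2: one data region, so "b" travels to the device once and
  ! back once; the kernels inside find it already present.
  b = 1.0
  !$acc data copy(b)
  do step = 1, 3
    !$acc parallel loop
    do i = 1, n
      b(i) = 2.0 * b(i)
    end do
  end do
  !$acc end data

  ! Both variants double every element three times (1 -> 8).
  if (any(a /= b)) error stop 'variants disagree'
  write(*,*) 'a(1) =', a(1), '  b(1) =', b(1)
end program region_vs_per_kernel
```

Both variants compute the same values; only the number of transfers differs, and compiler feedback such as -Minfo=accel is one way to see where the copies are generated.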
Visual: Array Lifetime in Data Regions
Time →
┌─────────────────────────────────────────────────┐
│ Data Region Scope │
│ │
│ CPU: [array] ──→ GPU: [array] stays here │
│ ↓ │
│ compute 1 │
│ ↓ │
│ compute 2 │
│ ↓ │
│ compute 3 │
│ ↓ │
│ CPU: [result] ←── GPU: [result] │
└─────────────────────────────────────────────────┘
Perfect for Iterative Algorithms
!$acc data copy(solution(1:n)) copyin(problem_data(1:n))
do iteration = 1, max_iterations
!$acc parallel loop
do i = 1, n
solution(i) = update_formula(solution(i), problem_data(i))
end do
!$acc end parallel loop
! Check convergence, adjust parameters, etc.
! (host code here sees stale values unless an update host directive refreshes them first)
end do
!$acc end data
The arrays stay on GPU through all iterations – no repeated transfers!
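The snippet above leaves update_formula and the convergence check open. One possible way to fill them in is sketched below. The body of update_formula is a hypothetical stand-in (it moves each value halfway toward problem_data), and the convergence test uses a max reduction so that only one scalar comes back to the host per iteration. The module name, tolerance, and sizes are assumptions.

```fortran
module relax_mod
  implicit none
contains
  ! Hypothetical stand-in for update_formula: move s halfway toward p.
  pure function update_formula(s, p) result(r)
    !$acc routine seq
    real, intent(in) :: s, p
    real :: r
    r = 0.5 * (s + p)
  end function update_formula
end module relax_mod

program iterate_in_region
  use relax_mod
  implicit none
  integer, parameter :: n = 1000, max_iterations = 50   ! illustrative sizes
  real :: solution(n), problem_data(n)
  real :: max_change, new_value
  integer :: i, iteration

  solution = 0.0
  problem_data = 1.0

  !$acc data copy(solution) copyin(problem_data)
  do iteration = 1, max_iterations
    max_change = 0.0
    ! The reduction returns a single scalar to the host; the arrays stay put.
    !$acc parallel loop reduction(max:max_change) private(new_value)
    do i = 1, n
      new_value = update_formula(solution(i), problem_data(i))
      max_change = max(max_change, abs(new_value - solution(i)))
      solution(i) = new_value
    end do
    if (max_change < 1.0e-6) exit   ! assumed tolerance
  end do
  !$acc end data

  ! After a full loop the counter is max_iterations + 1, hence the min().
  write(*,*) 'iterations used:', min(iteration, max_iterations)
  write(*,*) 'solution(1) =', solution(1)
end program iterate_in_region
```

Each element depends only on its own old value here, so the in-place update is safe to run in parallel. A stencil that reads neighbouring elements would need a second array.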
Memory Management Benefits
Traditional approach (illustrative numbers, not measurements):
- Transfer time: 100ms per operation × 10 operations = 1000ms
- Compute time: 50ms per operation × 10 operations = 500ms
- Total: 1500ms
Data region approach:
- Transfer time: 100ms once = 100ms
- Compute time: 50ms per operation × 10 operations = 500ms
- Total: 600ms (2.5× faster!)
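The arithmetic above generalises to a simple cost model. With transfer time T and compute time C per operation, N operations cost N × (T + C) with per-operation transfers and T + N × C inside a data region. The sketch below evaluates both using the same illustrative numbers; the variable names are assumptions.

```fortran
program transfer_cost_model
  ! Back-of-the-envelope model using the illustrative numbers from the text.
  implicit none
  real, parameter :: transfer_ms = 100.0   ! one round of transfers
  real, parameter :: compute_ms  = 50.0    ! one kernel
  integer, parameter :: n_ops = 10
  real :: per_op_total, region_total

  per_op_total = n_ops * (transfer_ms + compute_ms)   ! transfer every time
  region_total = transfer_ms + n_ops * compute_ms     ! transfer once

  write(*,'(A,F7.1,A)') 'Per-operation transfers: ', per_op_total, ' ms'
  write(*,'(A,F7.1,A)') 'Single data region:      ', region_total, ' ms'
  write(*,'(A,F5.2,A)') 'Speedup:                 ', per_op_total / region_total, 'x'
  ! As n_ops grows, the speedup approaches (transfer_ms + compute_ms) / compute_ms.
  write(*,'(A,F5.2,A)') 'Upper bound on speedup:  ', &
       (transfer_ms + compute_ms) / compute_ms, 'x'
end program transfer_cost_model
```

With these inputs it reproduces the 1500 ms, 600 ms, and 2.5× figures. It also shows that under this model the gain is capped at 3×, however many operations share the region, because the compute time itself does not shrink.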
Key Concepts
- Data Region: Scope where arrays live on GPU
- Persistent Arrays: Stay on GPU between operations
- Reduced Transfers: Move data once, compute many times
- Perfect for: Iterative algorithms, multi-step processes
Quick Summary
Data Region Benefits:
• Persistent device memory across multiple kernels
• Reduced memory transfer overhead
• Better performance for iterative algorithms
• Explicit control over data lifetime
Syntax:
!$acc data copyin(input) copyout(output) create(temp)
! Multiple parallel regions here
!$acc end data
Best Practices:
• Use for multi-kernel operations
• Keep each data region just wide enough to cover the kernels that share its arrays
• Combine with appropriate data clauses
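As a rough guide to matching clauses to how each array is used, the sketch below puts the four common clauses side by side in one region: copyin for read-only input, create for device-only scratch space, copyout for write-only results, and copy for data that is both read and updated. The array "running" and the tiny problem size are assumptions added for illustration.

```fortran
program clause_choice
  ! Illustrative sketch: one array per data clause.
  implicit none
  integer, parameter :: n = 8
  real :: input(n), temp(n), output(n), running(n)
  integer :: i

  do i = 1, n
    input(i)   = real(i)
    running(i) = 10.0
  end do

  !$acc data copyin(input) create(temp) copyout(output) copy(running)
  ! temp is allocated on the device only; it is never transferred.
  !$acc parallel loop
  do i = 1, n
    temp(i) = input(i) * input(i)
  end do
  ! output is written on the device and copied back at "end data";
  ! running is copied in, updated, and copied back.
  !$acc parallel loop
  do i = 1, n
    output(i)  = temp(i) + 1.0
    running(i) = running(i) + temp(i)
  end do
  !$acc end data

  ! For i = 3: temp = 9, so output(3) = 10 and running(3) = 19.
  if (output(3) /= 10.0 .or. running(3) /= 19.0) error stop 'unexpected result'
  write(*,*) 'output(3) =', output(3), '  running(3) =', running(3)
end program clause_choice
```

Using copy for everything would still give correct results, but it would move temp and output to the device and input back to the host for no benefit.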
Example Code
Let us consider the following OpenACC code –
program data_regions_demo
! Demonstrates data regions for iterative algorithms
implicit none
integer, parameter :: n = 10000, max_iter = 100
real :: solution(n), rhs(n), residual(n)
real :: tolerance = 1e-6
integer :: iter, i
real :: error_norm
write(*,*) 'Iterative Solver with Data Regions'
write(*,*) '=================================='
write(*,*) ''
! Initialize problem
do i = 1, n
rhs(i) = sin(real(i) * 0.01) ! Right-hand side
solution(i) = 0.0 ! Initial guess
end do
write(*,*) 'Starting iterative solve...'
write(*,*) '(Arrays will stay on GPU throughout iteration)'
! Data region: arrays live on GPU for entire solve
!$acc data copy(solution(1:n)) copyin(rhs(1:n)) create(residual(1:n))
do iter = 1, max_iter
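! Update solution (Jacobi-style relaxation, done in place: neighbours may be
! read before or after another thread updates them, so results can vary between
! runs; a strict Jacobi sweep would write into a second array)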
!$acc parallel loop
do i = 2, n-1
solution(i) = 0.5 * (solution(i-1) + solution(i+1) + rhs(i))
end do
!$acc end parallel loop
! Compute residual for convergence check
if (mod(iter, 10) == 0) then ! Check every 10 iterations
!$acc parallel loop
do i = 2, n-1
! Residual of 2*s(i) - s(i-1) - s(i+1) = rhs(i), the equation the update above solves
residual(i) = rhs(i) + (solution(i-1) - 2.0*solution(i) + solution(i+1))
end do
!$acc end parallel loop
! Compute norm (simplified - sample ten interior elements; residual(1) is never set on the device)
!$acc update host(residual(2:11))
error_norm = 0.0
do i = 2, 11
error_norm = error_norm + abs(residual(i))
end do
error_norm = error_norm / 10.0
write(*,'(A,I0,A,E12.4)') 'Iteration ', iter, ', error: ', error_norm
if (error_norm < tolerance) exit
end if
end do
!$acc end data
write(*,*) ''
if (iter <= max_iter) then
write(*,'(A,I0,A)') '✓ Converged in ', iter, ' iterations'
else
write(*,*) '⚠ Maximum iterations reached'
end if
! Show sample solution
write(*,*) ''
write(*,*) 'Sample solution values:'
do i = 1, 5
write(*,'(A,I0,A,F8.4)') 'solution(', i, ') = ', solution(i)
end do
write(*,*) ''
write(*,*) 'Benefits of data region:'
write(*,*) '• solution, rhs, residual stayed on GPU'
write(*,'(A,I0,A)') '• No data transfers during ', iter, ' iterations'
write(*,*) '• Only small residual updates for convergence check'
write(*,*) '• Massive speedup compared to per-iteration transfers!'
end program data_regions_demo
To compile this code –
nvfortran -acc -gpu=managed -Minfo=accel -O2 data_regions_demo.f90 -o data_regions_demo
To execute this code –
./data_regions_demo
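Sample output (from one run; exact numbers depend on the compiler, the GPU, and the order in which threads execute the in-place update) –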
Iterative Solver with Data Regions
==================================
Starting iterative solve...
(Arrays will stay on GPU throughout iteration)
Iteration 10, error: 0.5606E-01
Iteration 20, error: 0.5706E-01
Iteration 30, error: 0.5775E-01
Iteration 40, error: 0.5826E-01
Iteration 50, error: 0.5866E-01
Iteration 60, error: 0.5898E-01
Iteration 70, error: 0.5925E-01
Iteration 80, error: 0.5948E-01
Iteration 90, error: 0.5968E-01
Iteration 100, error: 0.5986E-01
⚠ Maximum iterations reached
Sample solution values:
solution(1) = 0.0000
solution(2) = 0.5735
solution(3) = 1.1376
solution(4) = 1.6933
solution(5) = 2.2411
Benefits of data region:
• solution, rhs, residual stayed on GPU
• No data transfers during 101 iterations
• Only small residual updates for convergence check
• Massive speedup compared to per-iteration transfers!
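The example above updates solution in place and checks only a handful of residual entries on the host. As a variation, the sketch below keeps the same problem setup but uses two arrays, so each sweep reads only old values. It also computes the mean absolute residual with a device-side reduction, so no array slice needs to be copied back for the convergence report. The names u, u_new, and residual_norm are assumptions, and this is one possible arrangement rather than the tutorial's own method.

```fortran
program jacobi_two_arrays
  ! Sketch of a race-free Jacobi sweep with an on-device residual norm.
  implicit none
  integer, parameter :: n = 10000, max_iter = 100
  real :: u(n), u_new(n), rhs(n)
  real :: residual_norm
  integer :: iter, i

  do i = 1, n
    rhs(i) = sin(real(i) * 0.01)
  end do
  u = 0.0   ! initial guess; u(1) and u(n) stay fixed at zero

  ! u_new is scratch space, so it is created on the device and never copied.
  !$acc data copy(u) copyin(rhs) create(u_new)
  do iter = 1, max_iter
    ! Sweep: read only from u, write only to u_new.
    !$acc parallel loop
    do i = 2, n-1
      u_new(i) = 0.5 * (u(i-1) + u(i+1) + rhs(i))
    end do
    ! Copy the interior back, still on the device.
    !$acc parallel loop
    do i = 2, n-1
      u(i) = u_new(i)
    end do

    if (mod(iter, 10) == 0) then
      residual_norm = 0.0
      ! Only the reduced scalar returns to the host.
      !$acc parallel loop reduction(+:residual_norm)
      do i = 2, n-1
        residual_norm = residual_norm &
             + abs(rhs(i) + u(i-1) - 2.0*u(i) + u(i+1))
      end do
      residual_norm = residual_norm / real(n - 2)
      write(*,'(A,I0,A,ES12.4)') 'Iteration ', iter, &
           ', mean |residual|: ', residual_norm
    end if
  end do
  !$acc end data

  write(*,'(A,F8.4)') 'u(2) = ', u(2)
end program jacobi_two_arrays
```

The extra copy kernel could be avoided by swapping the roles of the two arrays on alternate iterations; it is kept here to stay close to the structure of the original program.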