Data Regions with Fortran Arrays

Think of data regions like setting up a temporary office space. Instead of carrying your laptop back and forth for every small task, you set up once, do multiple tasks, then pack up at the end. That’s exactly what OpenACC data regions do for your arrays!

The Power of Persistent Data

Without Data Regions (inefficient):

Operation 1: CPU → GPU → compute → GPU → CPU
Operation 2: CPU → GPU → compute → GPU → CPU  
Operation 3: CPU → GPU → compute → GPU → CPU

With Data Regions (efficient):

Setup:    CPU → GPU
Operation 1:     compute
Operation 2:     compute  
Operation 3:     compute
Cleanup:         GPU → CPU

Basic Data Region Syntax

!$acc data copyin(input_arrays) copyout(output_arrays)
  ! Multiple operations here - arrays stay on GPU
  !$acc parallel loop
  ! ... first computation ...
  !$acc end parallel loop

  !$acc parallel loop  
  ! ... second computation ...
  !$acc end parallel loop
!$acc end data

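The arrays stay on the GPU for the entire region: copyin arrays are transferred at region entry, and copyout arrays are transferred back at region exit.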
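To make the syntax above concrete, here is a minimal sketch of two kernels sharing one data region. The module and subroutine names (data_region_sketch, scale_then_shift) are hypothetical and chosen only for illustration.

```fortran
module data_region_sketch
  implicit none
contains
  ! Hypothetical helper (not from the tutorial): computes b = 2*a + 1
  ! in two kernels that share one data region.
  subroutine scale_then_shift(a, b, n)
    integer, intent(in) :: n
    real, intent(in)    :: a(n)
    real, intent(out)   :: b(n)
    integer :: i

    ! a is copied to the device at entry; b is copied back at exit
    !$acc data copyin(a(1:n)) copyout(b(1:n))
      ! Kernel 1: b = 2*a, result stays on the device
      !$acc parallel loop
      do i = 1, n
        b(i) = 2.0 * a(i)
      end do

      ! Kernel 2: reuses the device copy of b, no host round trip
      !$acc parallel loop
      do i = 1, n
        b(i) = b(i) + 1.0
      end do
    !$acc end data
  end subroutine scale_then_shift
end module data_region_sketch
```

A caller would use it as `call scale_then_shift(a, b, n)`. If the file is compiled without OpenACC enabled, the directives are treated as comments and the routine runs serially with the same result.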

Visual: Array Lifetime in Data Regions

Time →
┌─────────────────────────────────────────────────┐
│ Data Region Scope                               │
│                                                 │
│ CPU: [array] ──→ GPU: [array] stays here        │
│                       ↓                         │
│                    compute 1                    │
│                       ↓                         │  
│                    compute 2                    │
│                       ↓                         │
│                    compute 3                    │
│                       ↓                         │
│ CPU: [result] ←── GPU: [result]                 │
└─────────────────────────────────────────────────┘

Perfect for Iterative Algorithms

!$acc data copy(solution(1:n)) copyin(problem_data(1:n))
  do iteration = 1, max_iterations
    !$acc parallel loop
    do i = 1, n
      solution(i) = update_formula(solution(i), problem_data(i))
    end do
    !$acc end parallel loop

    ! Check convergence, adjust parameters, etc.
  end do
!$acc end data

The arrays stay on the GPU through all iterations, so there are no repeated transfers.
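As a sketch of this pattern with a concrete update formula, the routine below runs Jacobi sweeps for a 1D three-point stencil. The names (jacobi_sketch, jacobi_sweeps, u_new) and the choice of stencil are assumptions made for illustration. The scratch array u_new keeps each sweep reading only values from the previous sweep, which makes the parallel loop safe.

```fortran
module jacobi_sketch
  implicit none
contains
  ! Hypothetical routine: performs 'sweeps' Jacobi sweeps for
  !   -u(i-1) + 2*u(i) - u(i+1) = f(i),  i = 2 .. n-1,
  ! with u(1) and u(n) held fixed as boundary values.
  subroutine jacobi_sweeps(u, f, n, sweeps)
    integer, intent(in) :: n, sweeps
    real, intent(inout) :: u(n)
    real, intent(in)    :: f(n)
    real :: u_new(n)   ! scratch copy; only its device version is used
    integer :: i, k

    !$acc data copy(u(1:n)) copyin(f(1:n)) create(u_new(1:n))
    do k = 1, sweeps
      ! Read the old iterate, write the new one into the scratch array
      !$acc parallel loop
      do i = 2, n-1
        u_new(i) = 0.5 * (u(i-1) + u(i+1) + f(i))
      end do

      ! Copy back on the device so the next sweep sees the new values
      !$acc parallel loop
      do i = 2, n-1
        u(i) = u_new(i)
      end do
    end do
    !$acc end data
  end subroutine jacobi_sweeps
end module jacobi_sketch
```

Only the entry and exit of the data region move u between host and device; the sweeps in between run entirely on device-resident arrays.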

Memory Management Benefits

Traditional approach (illustrative numbers):

  • Transfer time: 100ms per operation × 10 operations = 1000ms
  • Compute time: 50ms per operation × 10 operations = 500ms
  • Total: 1500ms

Data region approach:

  • Transfer time: 100ms once = 100ms
  • Compute time: 50ms per operation × 10 operations = 500ms
  • Total: 600ms (2.5× faster!)
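The same comparison can be written generally. As a sketch, assume k operations, a transfer time t_x, and a per-operation compute time t_c. These symbols are introduced here for illustration, and they follow the simplification above that the one-time transfer with a data region costs the same as a single per-operation transfer.

```latex
\begin{align*}
T_{\text{without}} &= k\,(t_x + t_c) \\
T_{\text{with}}    &= t_x + k\,t_c \\
S &= \frac{T_{\text{without}}}{T_{\text{with}}}
   = \frac{k\,(t_x + t_c)}{t_x + k\,t_c}
\end{align*}
% With k = 10, t_x = 100 ms, t_c = 50 ms:
% S = (10 * 150) / (100 + 10 * 50) = 1500 / 600 = 2.5
```

As k grows, S approaches (t_x + t_c) / t_c, which is 3 for these numbers, so the achievable gain depends on how transfer-heavy each operation is.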

Key Concepts

  • Data Region: Scope where arrays live on GPU
  • Persistent Arrays: Stay on GPU between operations
  • Reduced Transfers: Move data once, compute many times
  • Perfect for: Iterative algorithms, multi-step processes

Quick Summary

Data Region Benefits:
• Persistent device memory across multiple kernels
• Reduced memory transfer overhead
• Better performance for iterative algorithms
• Explicit control over data lifetime

Syntax:
!$acc data copyin(input) copyout(output) create(temp)
  ! Multiple parallel regions here
!$acc end data

Best Practices:
• Use for multi-kernel operations
• Size the data region to cover every kernel that shares the arrays, and no more
• Combine with appropriate data clauses
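The three clauses in the syntax summary can be seen working together in the sketch below. The names (clause_sketch, square_plus_self, temp) are hypothetical: input is only read on the device, output is only produced there, and temp is scratch space that never needs to travel in either direction.

```fortran
module clause_sketch
  implicit none
contains
  ! Hypothetical helper: output(i) = input(i)**2 + input(i), staged
  ! through a scratch array that is used only on the device.
  subroutine square_plus_self(input, output, n)
    integer, intent(in) :: n
    real, intent(in)    :: input(n)    ! copyin:  host -> device at region entry
    real, intent(out)   :: output(n)   ! copyout: device -> host at region exit
    real :: temp(n)                    ! create:  device allocation, no transfer
    integer :: i

    !$acc data copyin(input(1:n)) copyout(output(1:n)) create(temp(1:n))
      ! Kernel 1: fill the scratch array on the device
      !$acc parallel loop
      do i = 1, n
        temp(i) = input(i) * input(i)
      end do

      ! Kernel 2: combine scratch and input into the result
      !$acc parallel loop
      do i = 1, n
        output(i) = temp(i) + input(i)
      end do
    !$acc end data
  end subroutine square_plus_self
end module clause_sketch
```

Using create for temp avoids two transfers that copy would perform, since the host never reads or initializes the scratch values.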

Example Code

Let us consider the following OpenACC code –

program data_regions_demo
  ! Demonstrates data regions for iterative algorithms
  
  implicit none
  
  integer, parameter :: n = 10000, max_iter = 100
  real :: solution(n), rhs(n), residual(n)
  real :: tolerance = 1e-6
  integer :: iter, i
  real :: error_norm
  
  write(*,*) 'Iterative Solver with Data Regions'
  write(*,*) '=================================='
  write(*,*) ''
  
  ! Initialize problem
  do i = 1, n
    rhs(i) = sin(real(i) * 0.01)  ! Right-hand side
    solution(i) = 0.0             ! Initial guess
  end do
  
  write(*,*) 'Starting iterative solve...'
  write(*,*) '(Arrays will stay on GPU throughout iteration)'
  
  ! Data region: arrays live on GPU for entire solve
  !$acc data copy(solution(1:n)) copyin(rhs(1:n)) create(residual(1:n))
    
    do iter = 1, max_iter
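      ! Update solution (Jacobi-style sweep, done in place).
      ! Note: a strict Jacobi iteration reads the previous iterate from a
      ! separate array; updating in place inside a parallel loop lets the
      ! result depend on the order in which elements are processed.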
      !$acc parallel loop
      do i = 2, n-1
        solution(i) = 0.5 * (solution(i-1) + solution(i+1) + rhs(i))
      end do
      !$acc end parallel loop
      
      ! Compute residual for convergence check
      if (mod(iter, 10) == 0) then  ! Check every 10 iterations
        !$acc parallel loop
        do i = 2, n-1
          ! Residual of -u(i-1) + 2*u(i) - u(i+1) = rhs(i), the system
          ! implied by the update formula above
          residual(i) = rhs(i) + solution(i-1) - 2.0*solution(i) + solution(i+1)
        end do
        !$acc end parallel loop
        
        ! Compute norm (simplified - just check a few interior elements;
        ! residual(1) is never written, so start at index 2)
        !$acc update host(residual(2:11))
        error_norm = 0.0
        do i = 2, 11
          error_norm = error_norm + abs(residual(i))
        end do
        error_norm = error_norm / 10.0
        
        write(*,'(A,I0,A,E12.4)') 'Iteration ', iter, ', error: ', error_norm
        
        if (error_norm < tolerance) exit
      end if
    end do
    
  !$acc end data
  
  write(*,*) ''
  if (iter <= max_iter) then
    write(*,'(A,I0,A)') '✓ Converged in ', iter, ' iterations'
  else
    write(*,*) '⚠ Maximum iterations reached'
  end if
  
  ! Show sample solution
  write(*,*) ''
  write(*,*) 'Sample solution values:'
  do i = 1, 5
    write(*,'(A,I0,A,F8.4)') 'solution(', i, ') = ', solution(i)
  end do
  
  write(*,*) ''
  write(*,*) 'Benefits of data region:'
  write(*,*) '• solution, rhs, residual stayed on GPU'
  ! After a loop that runs to completion, iter is max_iter + 1, so cap it
  write(*,'(A,I0,A)') '• No data transfers during ', min(iter, max_iter), ' iterations'
  write(*,*) '• Only small residual updates for convergence check'
  write(*,*) '• Massive speedup compared to per-iteration transfers!'
  
end program data_regions_demo

To compile this code –

nvfortran -acc -gpu=managed -Minfo=accel -O2 data_regions_demo.f90 -o data_regions_demo

To execute this code –

./data_regions_demo

Sample output (from one run; your numbers may differ) –

 Iterative Solver with Data Regions
 ==================================
 
 Starting iterative solve...
 (Arrays will stay on GPU throughout iteration)
Iteration 10, error:   0.5606E-01
Iteration 20, error:   0.5706E-01
Iteration 30, error:   0.5775E-01
Iteration 40, error:   0.5826E-01
Iteration 50, error:   0.5866E-01
Iteration 60, error:   0.5898E-01
Iteration 70, error:   0.5925E-01
Iteration 80, error:   0.5948E-01
Iteration 90, error:   0.5968E-01
Iteration 100, error:   0.5986E-01
 
 ⚠ Maximum iterations reached
 
 Sample solution values:
solution(1) =   0.0000
solution(2) =   0.5735
solution(3) =   1.1376
solution(4) =   1.6933
solution(5) =   2.2411
 
 Benefits of data region:
 • solution, rhs, residual stayed on GPU
 • No data transfers during 101 iterations
 • Only small residual updates for convergence check
 • Massive speedup compared to per-iteration transfers!
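The convergence check in the program above samples only ten elements on the host. As an alternative sketch, the whole norm can be computed on the device with a reduction clause, so that only a single scalar comes back. The names (residual_norm_sketch, mean_abs_residual) are hypothetical, and the residual is written for the system implied by the update u(i) = 0.5 * (u(i-1) + u(i+1) + f(i)).

```fortran
module residual_norm_sketch
  implicit none
contains
  ! Hypothetical helper: mean absolute residual over the interior points
  ! of -u(i-1) + 2*u(i) - u(i+1) = f(i). Assumes n >= 3.
  function mean_abs_residual(u, f, n) result(norm)
    integer, intent(in) :: n
    real, intent(in)    :: u(n), f(n)
    real :: norm
    integer :: i

    norm = 0.0
    ! copyin reuses the device copies when u and f are already present
    ! (for example inside an enclosing data region) and copies otherwise.
    !$acc parallel loop reduction(+:norm) copyin(u(1:n), f(1:n))
    do i = 2, n-1
      norm = norm + abs(f(i) + u(i-1) - 2.0*u(i) + u(i+1))
    end do
    norm = norm / real(n - 2)
  end function mean_abs_residual
end module residual_norm_sketch
```

Called inside the solver's data region as `error_norm = mean_abs_residual(solution, rhs, n)`, this would replace both the residual array and the update host directive, since the reduction result is returned to the host automatically.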


Mandar Gurav

Parallel Programmer, Trainer and Mentor

