Array Sections and Partial Array Transfers

Think of downloading music: you don’t download an entire album when you only want one song. Similarly, array sections let you transfer only the data you need to the GPU instead of the whole array. This reduces device memory use and host-device transfer time.

This tutorial covers Fortran array section syntax, partial transfers in OpenACC data clauses, stride notation, and efficient work with array subsets.

The Array Section Problem

Scientific programs often work with large arrays but only need small portions:

Temperature field:  1,000,000 elements
Only processing:    boundary elements (first and last 100)
Waste:              Transfer 999,800 unused elements!

Visual: Partial vs Full Array Transfer

Full Array Transfer (Wasteful):
Host: [####################################] 100% transferred
       ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
GPU:  [####################################] 
       ↑ ↑                             ↑ ↑
     Used only boundaries, middle unused

Partial Transfer (Smart):
Host: [##]                         [##] Only boundaries
       ↓ ↓                           ↓ ↓
GPU:  [##]                         [##] Perfect fit!

Fortran Array Section Syntax

Fortran provides powerful array section notation using colons:

Basic Section Forms

array(start:end)        ! Elements from start to end
array(start:end:stride) ! Every stride-th element
array(:end)             ! From the first element to index end
array(start:)           ! From index start to the last element
array(:)                ! Entire array (same as no section)

1-Based Indexing Examples

real :: data(1:100)    ! Array with indices 1 to 100

! Array sections:
data(10:20)     ! Elements 10, 11, 12, ..., 20 (11 elements)
data(1:50:2)    ! Elements 1, 3, 5, 7, ..., 49 (every 2nd)
data(:25)       ! Elements 1, 2, 3, ..., 25 (first 25)
data(75:)       ! Elements 75, 76, ..., 100 (last 26)
data(100:1:-1)  ! Elements 100, 99, 98, ..., 1 (reverse order)
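
The element counts listed above can be checked with the size intrinsic. The following is a minimal sketch; the array name vals is a stand-in for the data array in the examples, and the program is illustrative only.

```fortran
program section_sizes
  implicit none
  real :: vals(1:100)          ! stand-in for the 100-element array above
  real :: reversed(100)
  integer :: i

  vals = [(real(i), i = 1, 100)]

  print '(A,I0)', 'size(vals(10:20))  = ', size(vals(10:20))    ! 11
  print '(A,I0)', 'size(vals(1:50:2)) = ', size(vals(1:50:2))   ! 25
  print '(A,I0)', 'size(vals(:25))    = ', size(vals(:25))      ! 25
  print '(A,I0)', 'size(vals(75:))    = ', size(vals(75:))      ! 26

  ! A negative stride walks the array backwards
  reversed = vals(100:1:-1)
  print '(A,F6.1)', 'first element of reversed copy = ', reversed(1)   ! 100.0

  ! Simple self-check: stop with an error if a count is not what we expect
  if (size(vals(10:20)) /= 11 .or. size(vals(75:)) /= 26) then
    error stop 'unexpected section size'
  end if
end program section_sizes
```

Because both bounds are inclusive, a section start:end holds end - start + 1 elements, which is why 10:20 has 11 elements rather than 10.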

Multi-Dimensional Array Sections

real :: matrix(1:100, 1:200)

! Matrix sections:
matrix(1:50, :)         ! First 50 rows, all columns
matrix(:, 101:200)      ! All rows, columns 101-200
matrix(10:20, 30:40)    ! 11×11 block starting at (10,30)
matrix(::2, ::2)        ! Every 2nd row and column

Visual: Multi-Dimensional Sections

Original Matrix (8×8):
┌─────────────────────────────────┐
│  1   2   3   4   5   6   7   8  │
│  9  10  11  12  13  14  15  16  │
│ 17  18  19  20  21  22  23  24  │
│ 25  26  27  28  29  30  31  32  │
│ 33  34  35  36  37  38  39  40  │
│ 41  42  43  44  45  46  47  48  │
│ 49  50  51  52  53  54  55  56  │
│ 57  58  59  60  61  62  63  64  │
└─────────────────────────────────┘

matrix(3:6, 2:5):
          ┌────────────────┐
          │ 18  19  20  21 │
          │ 26  27  28  29 │
          │ 34  35  36  37 │
          │ 42  43  44  45 │
          └────────────────┘
      Only this 4×4 block transferred!
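
The diagram above can be reproduced with a short program. This sketch assumes the 8×8 matrix is numbered row by row exactly as drawn, so the element in row i and column j holds (i-1)*8 + j; the name sub is hypothetical.

```fortran
program block_section
  implicit none
  integer :: matrix(8, 8), sub(4, 4)
  integer :: i, j

  ! Number the elements row by row, as in the diagram
  do j = 1, 8
    do i = 1, 8
      matrix(i, j) = (i - 1) * 8 + j
    end do
  end do

  ! Rows 3 to 6, columns 2 to 5
  sub = matrix(3:6, 2:5)

  ! Print the block one row at a time
  do i = 1, 4
    print '(4(I4))', sub(i, :)
  end do

  ! Corners of the block in the diagram are 18 and 45
  if (sub(1, 1) /= 18 .or. sub(4, 4) /= 45) error stop 'unexpected block contents'
end program block_section
```

The printed rows are 18 to 21, 26 to 29, 34 to 37, and 42 to 45, matching the block shown in the diagram.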

OpenACC with Array Sections

Basic Example

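program boundary_processing
  implicit none
  integer, parameter :: n = 10000
  real :: temperature(n), boundary_flux(200)
  integer :: i

  temperature = 300.0   ! Placeholder initial values

  ! Process only the first 100 elements
  !$acc parallel loop copyin(temperature(1:100)) &
  !$acc              copyout(boundary_flux(1:100))
  do i = 1, 100
    boundary_flux(i) = calculate_flux(temperature(i))
  end do
  !$acc end parallel loop

  ! Process only the last 100 elements
  !$acc parallel loop copyin(temperature(n-99:n)) &
  !$acc              copyout(boundary_flux(101:200))
  do i = 1, 100
    boundary_flux(100+i) = calculate_flux(temperature(n-100+i))
  end do
  !$acc end parallel loop

contains

  ! Placeholder flux model; replace with the real physics
  real function calculate_flux(t)
    !$acc routine seq
    real, intent(in) :: t
    calculate_flux = -0.5 * t
  end function calculate_flux
end program boundary_processing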

Matrix Block Processing

program matrix_blocks
  implicit none
  integer, parameter :: n = 1000, block = 100
  ! First row/column of a 100×100 block near the center (indices 450 to 549)
  integer, parameter :: center = n/2 - block/2
  real :: large_matrix(n, n), result_block(block, block)
  integer :: i, j

  large_matrix = 1.0   ! Placeholder input values

  !$acc parallel loop collapse(2) &
  !$acc copyin(large_matrix(center:center+block-1, center:center+block-1)) &
  !$acc copyout(result_block)
  do j = 1, block
    do i = 1, block
      result_block(i,j) = large_matrix(center-1+i, center-1+j) * 2.0
    end do
  end do
  !$acc end parallel loop
end program matrix_blocks

Common Array Section Patterns

Pattern 1: Boundary Conditions

! Ghost cell updates for finite difference methods
!$acc parallel loop copyin(grid(:, 1)) copyout(left_boundary)
!$acc parallel loop copyin(grid(:, n)) copyout(right_boundary)  
!$acc parallel loop copyin(grid(1, :)) copyout(top_boundary)
!$acc parallel loop copyin(grid(n, :)) copyout(bottom_boundary)

Pattern 2: Downsampling

! Take every 4th element for coarse grid
real :: fine_grid(10000), coarse_grid(2500)

! Data clauses take no stride, so copy the contiguous array
! and apply the stride in the loop index
!$acc parallel loop copyin(fine_grid) copyout(coarse_grid)
do i = 1, 2500
  coarse_grid(i) = fine_grid(1 + 4*(i-1))
end do
!$acc end parallel loop

Pattern 3: Sliding Window

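! Process overlapping windows
integer, parameter :: window_size = 100, overlap = 20
integer :: window, start_pos, i

do window = 1, num_windows
  ! Window 1 starts at element 1; each later window advances by (window_size - overlap)
  start_pos = 1 + (window - 1) * (window_size - overlap)

  !$acc parallel loop copyin(data(start_pos:start_pos+window_size-1))
  do i = start_pos, start_pos + window_size - 1
    ! Process data(i) for this window
  end do
  !$acc end parallel loop
end do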
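
The sliding-window pattern depends on getting the window start positions and the window count right. The sketch below is one possible arrangement, not the only one: it assumes only full windows are processed, uses a hypothetical array named signal filled with ones, and sums each window so the result is easy to check.

```fortran
program sliding_windows
  implicit none
  integer, parameter :: n = 1000, window_size = 100, overlap = 20
  integer, parameter :: step = window_size - overlap
  ! Number of full windows that fit in n elements
  integer, parameter :: num_windows = (n - window_size) / step + 1
  real :: signal(n), window_sum(num_windows)
  real :: total
  integer :: window, start_pos, i

  signal = 1.0   ! illustrative data: every window should sum to window_size

  do window = 1, num_windows
    ! The first window starts at element 1
    start_pos = 1 + (window - 1) * step
    total = 0.0

    ! Only the current window is copied to the device
    !$acc parallel loop copyin(signal(start_pos:start_pos+window_size-1)) reduction(+:total)
    do i = start_pos, start_pos + window_size - 1
      total = total + signal(i)
    end do

    window_sum(window) = total
  end do

  print '(A,I0)', 'number of windows:     ', num_windows                  ! 12
  print '(A,I0)', 'last window starts at: ', 1 + (num_windows - 1) * step  ! 881

  if (any(abs(window_sum - real(window_size)) > 1.0e-3)) then
    error stop 'unexpected window sum'
  end if
end program sliding_windows
```

With these values the windows advance by 80 elements, so consecutive windows share 20 elements, and the trailing elements 981 to 1000 are not covered because they cannot fill a complete window.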

Memory Layout and Performance

Fortran Column-Major Impact

real :: matrix(1000, 1000)

! Efficient (column-wise, contiguous in memory):
matrix(:, j)          ! Column j
matrix(1:500, j)      ! First 500 elements of column j

! Less efficient (row-wise, strided access):
matrix(i, :)          ! Row i
matrix(i, 1:500)      ! First 500 elements of row i

Visual: Memory Layout

Fortran Column-Major Storage:
matrix(1,1) matrix(2,1) matrix(3,1) ... matrix(1,2) matrix(2,2) ...
[     Column 1 contiguous     ] [     Column 2 contiguous     ]

Column section matrix(:,2): ✓ Fast (sequential memory)
Row section matrix(2,:):    ⚠ Slower (scattered memory)
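
Whether a section is contiguous can be queried with the Fortran 2008 intrinsic is_contiguous. The following is a small sketch under the assumption that the compiler supports that intrinsic; the section bounds are illustrative.

```fortran
program contiguity_check
  implicit none
  integer, parameter :: n = 1000
  real :: matrix(n, n)

  matrix = 0.0

  ! A whole column is one unbroken run of memory in column-major storage
  print *, 'column matrix(:, 2)       contiguous? ', is_contiguous(matrix(:, 2))

  ! A row touches one element out of every n
  print *, 'row    matrix(2, :)       contiguous? ', is_contiguous(matrix(2, :))

  ! A block that covers only part of each column has gaps between columns
  print *, 'block  matrix(10:20, 30:40) contiguous? ', is_contiguous(matrix(10:20, 30:40))

  ! Several complete, adjacent columns form one unbroken run again
  print *, 'slab   matrix(:, 30:40)   contiguous? ', is_contiguous(matrix(:, 30:40))
end program contiguity_check
```

The column and the slab of complete columns would be expected to report true, and the row and the partial block false. This is a reasonable first check when deciding how to slice an array for transfer.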

Performance Comparison

Transfer Size Benefits

Array Size	Section Size	Transfer Reduction
1,000,000	1,000	99.9%
100×100	10×10	99%
1,000	100	90%
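
The reduction figures in the table follow from (1 - section size / array size) × 100. A minimal sketch of that arithmetic, with a hypothetical helper named reduction_percent:

```fortran
program transfer_reduction
  implicit none

  print '(A,F6.1,A)', '1,000,000 -> 1,000 : ', reduction_percent(1000000, 1000), '%'  !  99.9%
  print '(A,F6.1,A)', '100x100   -> 10x10 : ', reduction_percent(100*100, 10*10), '%' !  99.0%
  print '(A,F6.1,A)', '1,000     -> 100   : ', reduction_percent(1000, 100), '%'      !  90.0%

contains

  ! Percentage of elements that no longer need to be transferred
  pure function reduction_percent(total_elems, section_elems) result(pct)
    integer, intent(in) :: total_elems, section_elems
    real :: pct
    pct = (1.0 - real(section_elems) / real(total_elems)) * 100.0
  end function reduction_percent
end program transfer_reduction
```

These numbers count elements only. Actual time savings also depend on transfer latency and on how the compiler moves non-contiguous sections.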

Memory Access Patterns

Contiguous sections:    [████████] Sequential → Fast
Strided sections:       [█ █ █ █ ] Scattered  → Slower
Large strides:          [█   █   ] Very scattered → Slowest

Error Prevention

❌ Common Mistakes

! Wrong: 0-based thinking
real :: array(10)       ! Default lower bound is 1
array(0:9)              ! Out of bounds: valid indices are 1..10

! Wrong: Out of bounds
real :: data(1:100)
data(95:105)            ! Error: goes beyond array bounds

! Wrong: Start greater than end with a positive stride
data(10:1:1)            ! No error, but a zero-sized section: nothing is selected

✅ Correct Usage

! Right: 1-based indexing
array(1:10)             ! Elements 1 through 10

! Right: Check bounds
real :: data(1:100)
data(95:100)           ! Safe: within array bounds

! Right: Negative stride for reverse
data(10:1:-1)          ! Correct: reverse order with negative stride
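
Two of these pitfalls are easy to guard against at run time. In standard Fortran, a section whose start exceeds its end with a positive stride is a zero-sized array, not an error, so it can pass unnoticed. The sketch below shows that behaviour and one possible way to clamp requested bounds to the declared bounds; the variable names are illustrative.

```fortran
program section_pitfalls
  implicit none
  real :: vals(100)
  integer :: lo, hi

  vals = 0.0

  ! Start > end with a positive stride selects nothing
  print '(A,I0)', 'size(vals(10:1:1))  = ', size(vals(10:1:1))    ! 0

  ! The same bounds with a negative stride select ten elements
  print '(A,I0)', 'size(vals(10:1:-1)) = ', size(vals(10:1:-1))   ! 10

  ! Clamp a requested range of 95:105 to the declared bounds of the array
  lo = max(95, lbound(vals, 1))
  hi = min(105, ubound(vals, 1))
  print '(A,I0,A,I0)', 'clamped range: ', lo, ':', hi               ! 95:100
  print '(A,I0)', 'size(vals(lo:hi))   = ', size(vals(lo:hi))      ! 6
end program section_pitfalls
```

Compiling with run-time bounds checking during development (for example -fcheck=bounds with gfortran or -Mbounds with nvfortran) catches out-of-range sections on the host that would otherwise go undetected.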

Advanced Techniques

Dynamic Array Sections

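program dynamic_sections
  implicit none
  integer :: n, start_idx, end_idx, i
  real, allocatable :: data(:), section_result(:)
  ! Placeholders for user-supplied routines; process() must also be
  ! compiled for the device (for example with !$acc routine seq)
  integer, external :: get_problem_size
  real, external :: process

  n = get_problem_size()
  allocate(data(n))
  data = 0.0   ! Placeholder input values

  ! Runtime-determined section
  start_idx = n/4
  end_idx = 3*n/4
  allocate(section_result(start_idx:end_idx))

  !$acc parallel loop copyin(data(start_idx:end_idx)) &
  !$acc              copyout(section_result)
  do i = start_idx, end_idx
    section_result(i) = process(data(i))
  end do
  !$acc end parallel loop
end program dynamic_sections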

Conditional Sections

! Process different sections based on conditions
if (boundary_condition) then
  !$acc parallel loop copyin(grid(1:ghost_width, :))
else
  !$acc parallel loop copyin(grid(:, :))
end if

Best Practices

✅ DO:

Use contiguous sections when possible
Prefer column-wise sections in Fortran
Match section sizes to actual usage
Check array bounds carefully
Use 1-based indexing consistently

❌ DON’T:

Transfer entire arrays when only parts are needed
Use large strides unnecessarily
Forget about column-major memory layout
Mix up 0-based and 1-based indexing
Ignore memory access patterns

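Important Note: OpenACC data clauses accept sections of the form start:end only, not start:end:stride, and the OpenACC specification requires any array or subarray in a data clause to be a contiguous block of memory. For strided access, transfer the enclosing contiguous range and apply the stride inside the loop. A row such as matrix(i, :) or a partial block such as matrix(10:20, 30:40) is not contiguous in column-major storage, so check the compiler feedback (for example -Minfo=accel with nvfortran) to see what is actually transferred.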
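
One way to work around the missing stride support is to gather the strided elements into a contiguous buffer on the host and transfer only that buffer. This is a sketch of the idea rather than a recommended implementation: the buffer name packed and the doubling kernel are illustrative, and the host-side copy has its own cost.

```fortran
program packed_downsample
  implicit none
  integer, parameter :: n = 10000, factor = 4, m = n / factor
  real :: fine_grid(n), packed(m), coarse_grid(m)
  integer :: i

  fine_grid = [(real(i), i = 1, n)]

  ! Host-side gather: the strided section is legal in ordinary Fortran,
  ! and the assignment stores it in a contiguous array of m elements
  packed = fine_grid(1:n:factor)

  ! Only the m packed elements are transferred, not all n
  !$acc parallel loop copyin(packed) copyout(coarse_grid)
  do i = 1, m
    coarse_grid(i) = 2.0 * packed(i)
  end do

  print '(A,I0,A,I0)', 'transferred ', m, ' elements instead of ', n
  print '(A,F8.1)', 'coarse_grid(2) = ', coarse_grid(2)   ! 2 * fine_grid(5) = 10.0

  if (coarse_grid(2) /= 2.0 * fine_grid(5)) error stop 'unexpected result'
end program packed_downsample
```

Whether the host-side gather pays off depends on the stride and on how often the data is reused on the device; for a small stride, transferring the full contiguous array may be just as fast.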

Quick Summary

Fortran Array Section Syntax Reference:
┌─────────────────────────┬─────────────────────────┬─────────────────────┐
│ Syntax                  │ Description             │ Example             │
├─────────────────────────┼─────────────────────────┼─────────────────────┤
│ array(start:end)        │ Contiguous section      │ data(10:50)         │
│ array(start:end:stride) │ Strided section         │ data(1:100:5)       │
│ array(:end)             │ From beginning          │ data(:25)           │
│ array(start:)           │ To end                  │ data(75:)           │
│ array(:)                │ Entire array            │ data(:)             │
└─────────────────────────┴─────────────────────────┴─────────────────────┘

Multi-dimensional Array Sections:
• matrix(row_start:row_end, col_start:col_end)  - Rectangular block
• matrix(:, column_number)                      - Entire column
• matrix(row_number, :)                         - Entire row
• matrix(1:50, :)                               - First 50 rows
• matrix(:, 101:200)                            - Columns 101-200

Memory Transfer Efficiency:
┌─────────────────┬─────────────────┬─────────────────┬─────────────────┐
│ Transfer Type   │ Data Movement   │ Memory Usage    │ Performance     │
├─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ Full Array      │ 100% of array   │ High            │ Slower          │
│ Array Section   │ Only needed     │ Optimal         │ Faster          │
│ Boundary Only   │ <10% typical    │ Very low        │ Fastest         │
└─────────────────┴─────────────────┴─────────────────┴─────────────────┘

Performance Characteristics:
┌─────────────────────┬─────────────────────┬─────────────────────┐
│ Access Pattern      │ Memory Layout       │ Efficiency          │
├─────────────────────┼─────────────────────┼─────────────────────┤
│ Contiguous sections │ Sequential          │ Excellent           │
│ Column sections     │ Fortran column-major│ Very Good           │
│ Row sections        │ Strided access      │ Good                │
│ Large strides       │ Scattered memory    │ Moderate            │
└─────────────────────┴─────────────────────┴─────────────────────┘

Common Applications:
• Boundary condition processing (domain edges)
• Domain decomposition (parallel computing)
• Data filtering and downsampling
• Ghost cell updates in finite difference methods
• Iterative solver boundary updates
• Memory-constrained GPU computations

Best Practices:
• Use contiguous sections for best performance
• Prefer column-wise access in Fortran (column-major order)
• Match section sizes to actual computational needs
• Check array bounds carefully with 1-based indexing
• Consider memory layout when designing algorithms

Example Code

Let us consider the following OpenACC code –

program array_sections_demo
  ! Demonstrates Fortran array section syntax and partial array transfers
  
  implicit none
  
  integer, parameter :: n = 8000, boundary_size = 100
  integer, parameter :: downsample_factor = 4
  integer, parameter :: downsampled_size = n / downsample_factor
  integer, parameter :: block_start_row = 50, block_end_row = 99
  integer, parameter :: block_start_col = 100, block_end_col = 159
  integer, parameter :: target_column = 150
  real :: large_array(n), matrix(200, 300)
  real :: boundary_left(boundary_size), boundary_right(boundary_size)
  real :: matrix_block(50, 60), column_section(200)
  real :: downsampled_data(downsampled_size)
  integer :: i, j
  
  write(*,*) 'Array Sections and Partial Array Transfers'
  write(*,*) '========================================='
  write(*,*) ''
  
  ! Initialize test data
  write(*,*) 'Initializing test arrays...'
  do i = 1, n
    large_array(i) = sin(real(i) * 0.01) * 100.0
  end do
  
  do j = 1, 300
    do i = 1, 200
      matrix(i, j) = real(i * j) * 0.001
    end do
  end do
  
  write(*,'(A,I0,A)') 'Large array size: ', n, ' elements'
  write(*,'(A,I0,A,I0,A)') 'Matrix size: ', 200, ' x ', 300, ' elements'
  write(*,*) ''
  
  write(*,*) 'Demonstration 1: Boundary Processing'
  write(*,*) '(Processing only array boundaries with sections)'
  write(*,*) ''
  
  ! Demo 1: Process only boundaries (first and last 100 elements)
  write(*,'(A,I0,A)') '   Processing left boundary: first ', boundary_size, ' elements'
  !$acc parallel loop copyin(large_array(1:boundary_size)) copyout(boundary_left)
  do i = 1, boundary_size
    boundary_left(i) = large_array(i) * 2.0 + 1.0
  end do
  !$acc end parallel loop
  
  write(*,'(A,I0,A)') '   Processing right boundary: last ', boundary_size, ' elements'
  !$acc parallel loop copyin(large_array(n-boundary_size+1:n)) copyout(boundary_right)
  do i = 1, boundary_size
    boundary_right(i) = large_array(n - boundary_size + i) * 3.0 - 0.5
  end do
  !$acc end parallel loop
  
  write(*,'(A,F8.4)') '   Left boundary sample: ', boundary_left(50)
  write(*,'(A,F8.4)') '   Right boundary sample: ', boundary_right(50)
  write(*,'(A,F5.1,A)') '   Memory transfer reduction: ', &
          (1.0 - real(2*boundary_size)/real(n)) * 100.0, '%'
  write(*,*) ''
  
  write(*,*) 'Demonstration 2: Matrix Block Processing'
  write(*,*) '(Working with rectangular matrix sections)'
  write(*,*) ''
  
  ! Demo 2: Process a block from the matrix
  
  write(*,'(A,I0,A,I0,A,I0,A,I0,A)') '   Processing block: rows ', &
         block_start_row, '-', block_end_row, ', cols ', block_start_col, '-', block_end_col
  
  !$acc parallel loop collapse(2) &
  !$acc copyin(matrix(block_start_row:block_end_row, block_start_col:block_end_col)) &
  !$acc copyout(matrix_block)
  do j = 1, 60
    do i = 1, 50
      matrix_block(i, j) = matrix(block_start_row - 1 + i, block_start_col - 1 + j) + 10.0
    end do
  end do
  !$acc end parallel loop
  
  write(*,'(A,F8.4)') '   Matrix block sample: ', matrix_block(25, 30)
  write(*,'(A,F5.1,A)') '   Block size reduction: ', &
          (1.0 - real(50*60)/real(200*300)) * 100.0, '%'
  write(*,*) ''
  
  write(*,*) 'Demonstration 3: Column-wise Processing'
  write(*,*) '(Efficient column-major access pattern)'
  write(*,*) ''
  
  ! Demo 3: Process a single column (efficient for Fortran)
  
  write(*,'(A,I0)') '   Processing column: ', target_column
  !$acc parallel loop copyin(matrix(:, target_column)) copyout(column_section)
  do i = 1, 200
    column_section(i) = matrix(i, target_column) * 0.5 + sin(matrix(i, target_column))
  end do
  !$acc end parallel loop
  
  write(*,'(A,F8.4)') '   Column processing sample: ', column_section(100)
  write(*,*) '   ✓ Column access is cache-friendly in Fortran'
  write(*,*) '   ✓ Sequential memory access pattern'
  write(*,*) ''
  
  write(*,*) 'Demonstration 4: Strided Array Access'
  write(*,*) '(Downsampling with stride patterns)'
  write(*,*) ''
  
  ! Demo 4: Downsampling - process every 4th element
  
  write(*,'(A,I0)') '   Downsampling by factor of: ', downsample_factor
  write(*,'(A,I0,A,I0,A)') '   Reduced from ', n, ' to ', downsampled_size, ' elements'
  
  ! Note: We transfer the full array here because OpenACC doesn't directly support
  ! strided array sections in data clauses, but we only process every 4th element
  !$acc parallel loop copyin(large_array) copyout(downsampled_data)
  do i = 1, downsampled_size
    downsampled_data(i) = large_array(1 + (i-1) * downsample_factor)
  end do
  !$acc end parallel loop
  
  write(*,'(A,F8.4)') '   Downsampled sample: ', downsampled_data(500)
  write(*,*) '   ✓ Conceptual stride: large_array(1::4)'
  write(*,*) ''
  
  write(*,*) 'Array Section Syntax Summary:'
  write(*,*) '============================'
  write(*,*) 'Basic section forms:'
  write(*,*) '• array(start:end)         - Contiguous elements'
  write(*,*) '• array(start:end:stride)  - Every stride-th element'
  write(*,*) '• array(:end)              - From beginning to end'
  write(*,*) '• array(start:)            - From start to array end'
  write(*,*) '• array(:)                 - Entire array'
  write(*,*) ''
  write(*,*) 'Multi-dimensional sections:'
  write(*,*) '• matrix(row_range, col_range) - Rectangular block'
  write(*,*) '• matrix(:, col)               - Entire column (efficient)'
  write(*,*) '• matrix(row, :)               - Entire row (less efficient)'
  write(*,*) ''
  write(*,*) 'Performance Benefits:'
  write(*,*) '• Reduced memory transfers'
  write(*,*) '• Lower GPU memory usage'
  write(*,*) '• Better cache utilization'
  write(*,*) '• Faster host-device communication'
  write(*,*) ''
  write(*,*) 'Common Applications:'
  write(*,*) '• Boundary condition processing'
  write(*,*) '• Domain decomposition'
  write(*,*) '• Data filtering and downsampling'
  write(*,*) '• Iterative solver boundary updates'
  
end program array_sections_demo

To compile this code –

nvfortran -acc -Minfo=accel -O2 array_sections_demo.f90 -o array_sections_demo

To execute this code –

./array_sections_demo

Sample output –

 Array Sections and Partial Array Transfers
 =========================================
 
 Initializing test arrays...
Large array size: 8000 elements
Matrix size: 200 x 300 elements
 
 Demonstration 1: Boundary Processing
 (Processing only array boundaries with sections)
 
   Processing left boundary: first 100 elements
   Processing right boundary: last 100 elements
   Left boundary sample:  96.8851
   Right boundary sample: ********
   Memory transfer reduction:  97.5%
 
 Demonstration 2: Matrix Block Processing
 (Working with rectangular matrix sections)
 
   Processing block: rows 50-99, cols 100-159
   Matrix block sample:  19.5460
   Block size reduction:  95.0%
 
 Demonstration 3: Column-wise Processing
 (Efficient column-major access pattern)
 
   Processing column: 150
   Column processing sample:   8.1503
    ✓ Column access is cache-friendly in Fortran
    ✓ Sequential memory access pattern
 
 Demonstration 4: Strided Array Access
 (Downsampling with stride patterns)
 
   Downsampling by factor of: 4
   Reduced from 8000 to 2000 elements
   Downsampled sample:  90.0294
    ✓ Conceptual stride: large_array(1::4)
 
 Array Section Syntax Summary:
 ============================
 Basic section forms:
 • array(start:end)         - Contiguous elements
 • array(start:end:stride)  - Every stride-th element
 • array(:end)              - From beginning to end
 • array(start:)            - From start to array end
 • array(:)                 - Entire array
 
 Multi-dimensional sections:
 • matrix(row_range, col_range) - Rectangular block
 • matrix(:, col)               - Entire column (efficient)
 • matrix(row, :)               - Entire row (less efficient)
 
 Performance Benefits:
 • Reduced memory transfers
 • Lower GPU memory usage
 • Better cache utilization
 • Faster host-device communication
 
 Common Applications:
 • Boundary condition processing
 • Domain decomposition
 • Data filtering and downsampling
 • Iterative solver boundary updates

Note: the asterisks after “Right boundary sample” are Fortran’s field-overflow marker. The value is about -246.3, which needs nine characters and does not fit the F8.4 edit descriptor; a wider descriptor such as F10.4 would print it.


References

OpenACC Specification : https://www.openacc.org/specification
