Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 3/31/2023
Public

loop_coalesce Attribute

Use the loop_coalesce attribute to direct the Intel® oneAPI DPC++/C++ Compiler to coalesce nested loops into a single loop without affecting the loop functionality. Coalescing loops can help reduce your kernel area usage by directing the compiler to reduce the overhead needed for loop control.

NOTE:

If you want to use the ivdep attribute to ignore loop-carried dependencies, apply it to the loop that causes the dependencies, not to any of the loops that result from applying the [[intel::loop_coalesce(N)]] attribute.

Syntax

[[intel::loop_coalesce(N)]]

where the integer argument N specifies the number of nested loop levels that you want the compiler to attempt to coalesce.

For example, consider the following set of nested loops:

for (A)
  for (B)
    for (C)
      for (D)
    for (E)

If you place the loop_coalesce attribute before loop (A), then the loop nesting level for these loops is defined as:

  • Loop (A) has a loop nesting level of 1.
  • Loop (B) has a loop nesting level of 2.
  • Loop (C) has a loop nesting level of 3.
  • Loop (D) has a loop nesting level of 4.
  • Loop (E) has a loop nesting level of 3.

Depending on the loop nesting level that you specify, the compiler attempts to coalesce loops differently:
  • If you specify [[intel::loop_coalesce(1)]] on loop (A), the compiler does not attempt to coalesce any of the nested loops.
  • If you specify [[intel::loop_coalesce(2)]] on loop (A), the compiler attempts to coalesce loops (A) and (B).
  • If you specify [[intel::loop_coalesce(3)]] on loop (A), the compiler attempts to coalesce loops (A), (B), (C), and (E).
  • If you specify [[intel::loop_coalesce(4)]] on loop (A), the compiler attempts to coalesce all of the loops [loop (A) - loop (E)].
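The level rule described above can be modeled in a few lines of plain C++. This is only an illustrative sketch of the documented selection rule, not a description of how the compiler is implemented: the Loop struct, the example_nest table, and the coalesce_candidates function are hypothetical names introduced here, and the model says nothing about whether a coalescing attempt actually succeeds.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one loop in a nest: its label and its
// nesting level relative to the loop that carries the attribute.
struct Loop {
  std::string name;
  int level;
};

// The example nest from the text: (A) > (B) > { (C) > (D), (E) }.
const std::vector<Loop> example_nest = {
    {"A", 1}, {"B", 2}, {"C", 3}, {"D", 4}, {"E", 3}};

// Model of the documented rule: with argument n, every loop whose nesting
// level is at most n is a coalescing candidate. With n == 1 only the
// annotated loop qualifies, so there is nothing to coalesce.
std::vector<std::string> coalesce_candidates(const std::vector<Loop>& nest,
                                             int n) {
  std::vector<std::string> candidates;
  if (n < 2) {
    return candidates;  // a single level leaves nothing to merge
  }
  for (const Loop& loop : nest) {
    if (loop.level <= n) {
      candidates.push_back(loop.name);
    }
  }
  return candidates;
}
```

Under this model, coalesce_candidates(example_nest, 3) yields A, B, C, and E, which matches the third case in the list above.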

Coalescing nested loops also reduces the latency of the component, which could further reduce your kernel area usage. However, in some cases, coalescing loops might lengthen the critical loop initiation interval path, so coalescing loops might not be suitable for all kernels.

For parallel_for kernels, the compiler automatically attempts to coalesce loops even if they are not annotated with the [[intel::loop_coalesce(N)]] attribute. Coalescing loops in parallel_for kernels usually improves throughput and reduces kernel area usage. You can use the [[intel::loop_coalesce(N)]] attribute to prevent the automatic coalescing of loops in parallel_for kernels.

NOTE:
If you specify [[intel::loop_coalesce(1)]] for a loop in a parallel_for kernel, you prevent automatic loop coalescing for that loop.

Example

The following simple example shows how the compiler coalesces two loops into a single loop.

Consider a simple nested loop written as follows:

[[intel::loop_coalesce(2)]]
for (int i = 0; i < N; i++)
  for (int j = 0; j < M; j++)
    sum[i][j] += i+j;

The compiler coalesces the two loops together so that they run as if they were a single loop written as follows:

int i = 0;
int j = 0;
while (i < N) {
  sum[i][j] += i+j;
  j++;

  if (j == M) {
    j = 0;
    i++;
  }
}

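As a quick sanity check, the two forms can be compared on the host in standard C++. The sketch below is an assumption-laden illustration rather than FPGA kernel code: nested_sum, coalesced_sum, and the Grid alias are hypothetical names, the attribute is omitted because a standard host compiler would not act on it, and the single-loop form is only equivalent when the inner trip count M is greater than zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Reference version: the original two-level loop nest.
Grid nested_sum(int N, int M) {
  assert(N >= 0 && M >= 0);
  Grid sum(static_cast<std::size_t>(N),
           std::vector<int>(static_cast<std::size_t>(M), 0));
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      sum[i][j] += i + j;
  return sum;
}

// Hand-written single-loop version that mirrors the coalesced form:
// one loop counter pair (i, j) advanced manually, with j wrapping at M.
Grid coalesced_sum(int N, int M) {
  assert(N >= 0);
  assert(M > 0 && "the single-loop form assumes a non-empty inner loop");
  Grid sum(static_cast<std::size_t>(N),
           std::vector<int>(static_cast<std::size_t>(M), 0));
  int i = 0;
  int j = 0;
  while (i < N) {
    sum[i][j] += i + j;
    j++;
    if (j == M) {  // end of one inner pass: move to the next outer iteration
      j = 0;
      i++;
    }
  }
  return sum;
}
```

Calling both functions with the same N and M (for example, 3 and 4) and comparing the returned grids with == confirms that the rewritten loop visits the same (i, j) pairs in the same order.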
NOTE:

For additional information, refer to the FPGA tutorial sample loop_coalesce listed in the Intel® oneAPI Samples Browser on Linux* or Windows*, or access the code sample in GitHub.