7.9. Loop Fusion

Intel® High Level Synthesis Compiler Pro Edition: Reference Manual

ID 683349

Date 1/23/2025

Public

Loop fusion is a compiler transformation in which two adjacent loops are merged into a single loop over the same index range. This transformation is typically applied to reduce loop overhead and improve run-time performance.

The following example shows the effects of fusing loops in a simple case:

Unfused Loops

for (j = 0; j < 300; j++)
    a[j] = a[j] + 3;
for (k = 0; k < 300; k++)
    b[k] = b[k] + 4;

Fused Loop

for (f = 0; f < 300; f++)
{
    a[f] = a[f] + 3;
    b[f] = b[f] + 4;
}
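As a minimal host-side sketch (plain C++, not synthesizable component code), the simple case above can be checked by running both forms on identical inputs. The names update_unfused, update_fused, and check_fused_matches_unfused are hypothetical and exist only for this illustration; the trip count of 300 comes from the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 300;  // trip count taken from the example above
using Buffer = std::array<int, N>;

// Unfused form: two adjacent loops, each with its own loop control.
void update_unfused(Buffer &a, Buffer &b) {
  for (std::size_t j = 0; j < N; j++) a[j] = a[j] + 3;
  for (std::size_t k = 0; k < N; k++) b[k] = b[k] + 4;
}

// Fused form: one loop whose body spans both original loop bodies.
void update_fused(Buffer &a, Buffer &b) {
  for (std::size_t f = 0; f < N; f++) {
    a[f] = a[f] + 3;
    b[f] = b[f] + 4;
  }
}

// Hypothetical check: both forms must leave the arrays in the same state.
void check_fused_matches_unfused() {
  Buffer a1{}, b1{}, a2{}, b2{};
  for (std::size_t i = 0; i < N; i++) {
    a1[i] = a2[i] = static_cast<int>(i);
    b1[i] = b2[i] = static_cast<int>(2 * i);
  }
  update_unfused(a1, b1);
  update_fused(a2, b2);
  assert(a1 == a2);
  assert(b1 == b2);
}
```

The two forms agree here because the loop bodies touch different arrays, so no iteration of one loop depends on the other loop.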

The following example shows the effects of fusing loops over a concatenated loop-index range:

Unfused Loops

for (j = 0; j < 300; j++)
    a[j] = a[j] + 3;
for (k = 0; k < 300; k++)
    b[k] = b[k] + 4;

Fused Loop

for (jk = 0; jk < 600; jk++)
{
    if (jk < 300)
    {
        int j = jk;
        a[j] = a[j] + 3;
    }
    else
    {
        int k = jk - 300;
        b[k] = b[k] + 4;
    }
}
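The concatenated form can be sketched on a host in the same way. The version below is an illustration only: it takes the two trip counts from the container sizes (an assumption made here so that the hand-written form can also be tried with unequal lengths), and it makes no claim about when the compiler chooses this form. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused reference: two adjacent loops.
void update_two_loops(std::vector<int> &a, std::vector<int> &b) {
  for (std::size_t j = 0; j < a.size(); j++) a[j] = a[j] + 3;
  for (std::size_t k = 0; k < b.size(); k++) b[k] = b[k] + 4;
}

// Concatenated form: one loop over [0, nj + nk). The first nj iterations
// do the work of the first loop, the remaining ones the work of the second.
void update_concatenated(std::vector<int> &a, std::vector<int> &b) {
  const std::size_t nj = a.size();
  const std::size_t total = nj + b.size();
  for (std::size_t jk = 0; jk < total; jk++) {
    if (jk < nj) {
      const std::size_t j = jk;
      a[j] = a[j] + 3;
    } else {
      const std::size_t k = jk - nj;
      b[k] = b[k] + 4;
    }
  }
}

// Hypothetical check: compare both forms for the given trip counts.
bool concatenated_matches_two_loops(std::size_t nj, std::size_t nk) {
  std::vector<int> a1(nj, 1), b1(nk, 2);
  std::vector<int> a2 = a1, b2 = b1;
  update_two_loops(a1, b1);
  update_concatenated(a2, b2);
  return a1 == a2 && b1 == b2;
}

// Example use with the trip counts from the text (300 and 300).
void check_concatenated_example() {
  assert(concatenated_matches_two_loops(300, 300));
}
```

Unlike the lockstep form, the concatenated loop still runs the original iterations in their original order, so it keeps one loop control structure without overlapping the two bodies.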

Loop control structures represent a significant overhead. Fusing two loops reduces the number of control structures needed from two to one, which reduces this overhead. The main goal of reducing the number of control structures is to save FPGA area for your design while maintaining (and ideally increasing) component throughput.

Fusing outer loops introduces parallelism where there was previously none. Combining the bodies of two adjacent loops (L_j and L_k) forms a single loop (L_f) with a loop body that spans the bodies of L_j and L_k. This combined loop body creates an opportunity for operations that were serialized across a given iteration of L_j and L_k to execute in parallel. In effect, the two loops now execute in lockstep as a single loop, which provides latency improvements.

If inner loops are fused, parallelism is already achieved by pipelined execution of the outer loop iteration. In these cases, the parallelism effect of loop fusion is diminished.

Fusion Criteria

The compiler considers the fusion of two loops (L_j and L_k) to be valid if the loops meet the following criteria:

The loops must be adjacent.
That is, there cannot be a statement S_i with side effects such that S_i executes after L_j and before L_k.

Each loop must have a single entry point and a single exit point.
For example, loops that contain break statements are not considered for fusion.

The loops must have no negative-distance dependencies.
That is, for loops L_j and L_k where L_j is defined before L_k, iteration m of loop L_k does not depend on values calculated in iteration m+n (where n>0) of loop L_j.
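The negative-distance criterion can be illustrated with a hand-fused loop that violates it. In this host-side sketch, iteration k of the second loop reads a[k + 1], which the first loop writes in iteration k + 1, so fusing the loops in lockstep reads a stale value. The function names and the trip count of 8 are hypothetical, chosen only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8;  // hypothetical small trip count

// Original order: L_j finishes before L_k starts.
std::array<int, N> run_unfused(std::array<int, N + 1> a) {
  std::array<int, N> b{};
  for (std::size_t j = 0; j < N; j++) a[j] = a[j] + 3;      // L_j
  for (std::size_t k = 0; k < N; k++) b[k] = a[k + 1] * 2;  // L_k reads a[k + 1]
  return b;
}

// Naive lockstep fusion: iteration f reads a[f + 1] before the body of
// L_j has updated it, so the result differs from the original program.
std::array<int, N> run_naively_fused(std::array<int, N + 1> a) {
  std::array<int, N> b{};
  for (std::size_t f = 0; f < N; f++) {
    a[f] = a[f] + 3;
    b[f] = a[f + 1] * 2;
  }
  return b;
}

// Hypothetical check: with a zero-filled input, the unfused loops give
// b[0] == 6, while the naive fusion gives b[0] == 0.
void check_negative_distance_breaks_fusion() {
  const std::array<int, N + 1> a{};
  const std::array<int, N> expected = run_unfused(a);
  const std::array<int, N> fused = run_naively_fused(a);
  assert(expected[0] == 6);
  assert(fused[0] == 0);
  assert(expected != fused);
}
```

A dependency in the other direction (for example, L_k reading a[k] or a[k - 1]) would not cause this problem, because the value is already computed by the time the fused iteration needs it.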

Automatic Loop Fusion

The Intel® HLS Compiler fuses adjacent loops with equal trip counts automatically if the compiler analysis of your component determines that fusing the loops is profitable.

Situations in which fusing loops is a valid transformation (based on the earlier criteria) but is not considered profitable by the compiler include the following:

One of the two loops, but not both, is annotated with the ivdep pragma.
One of the two loops, but not both, contains stall-free logic.

The Loop Analysis Report in the High-Level Design Reports indicates when loops were fused.

In addition to automatic loop fusion, the Intel® HLS Compiler provides two pragmas to help you control when loops are fused:

nofusion pragma
Annotate a loop with this pragma to request that the compiler not fuse that loop.
loop_fuse pragma
Use this pragma to override the compiler profitability analysis and fuse adjacent loops, provided that fusing them is safe.

You can also use the loop_fuse pragma to tell the compiler to consider fusing adjacent loops with different trip counts.
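The sketch below shows how the two pragmas might be placed. It assumes that nofusion is written directly before the loop it annotates and that loop_fuse is written before a block that encloses the adjacent loops; confirm the exact syntax and any optional clauses in the pragma reference for your compiler version before relying on it. The function names and trip counts are hypothetical, and a standard C++ compiler simply ignores both pragmas, so the host-side check only confirms that the two variants compute the same values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical trip counts; they differ on purpose.
constexpr std::size_t NJ = 300;
constexpr std::size_t NK = 200;

// Request that the annotated loop is left unfused (assumed placement).
void update_keep_separate(int *a, int *b) {
#pragma nofusion
  for (std::size_t j = 0; j < NJ; j++) a[j] = a[j] + 3;
  for (std::size_t k = 0; k < NK; k++) b[k] = b[k] + 4;
}

// Request fusion of the adjacent loops in the block (assumed placement),
// even though their trip counts differ.
void update_request_fusion(int *a, int *b) {
#pragma loop_fuse
  {
    for (std::size_t j = 0; j < NJ; j++) a[j] = a[j] + 3;
    for (std::size_t k = 0; k < NK; k++) b[k] = b[k] + 4;
  }
}

// Hypothetical host-side check: the pragmas must not change the results.
void check_pragma_variants_agree() {
  std::vector<int> a1(NJ, 1), b1(NK, 2);
  std::vector<int> a2 = a1, b2 = b1;
  update_keep_separate(a1.data(), b1.data());
  update_request_fusion(a2.data(), b2.data());
  assert(a1 == a2);
  assert(b1 == b2);
}
```

Whether the loops were actually fused is a property of the generated hardware, not of the computed values, so the Loop Analysis Report remains the place to confirm the outcome.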
