Intel® High Level Synthesis Compiler Pro Edition: Reference Manual

ID 683349
Date 4/01/2024
Public

6.9.1. Loop Fusion Control (loop_fuse Pragma)

Use the loop_fuse pragma to tell the compiler to try to fuse two adjacent loops without affecting the functionality of either loop. The pragma overrides the compiler's profitability analysis of fusing the loops.

Fusing adjacent loops reduces the amount of loop control overhead in your component, which helps reduce the FPGA area used and can increase performance because both loops execute as one (fused) loop.

To indicate that the loops in a block of code should be considered for fusing, apply the loop_fuse pragma to the block as follows:
#pragma loop_fuse [depth(N)] [independent]
{
  ...
}
By default, only adjacent top-level (not nested) loops are considered for fusing. Use the depth(N) clause of the pragma to indicate the number of nesting depths the compiler should consider when fusing adjacent loops. Specifying depth(1) is equivalent to indicating that only adjacent top-level loops should be considered for fusing.
#pragma loop_fuse
// can also be written as:
// #pragma loop_fuse depth(1)
{
  L1: for(...) {}
  L2: for(...) {
    L3: for(...) {}
    L4: for(...) {
      L5: for(...) {}
      L6: for(...) {}
    }
  } 
}

By default, or with depth(1), only loops L1 and L2 are initially considered for fusing.

#pragma loop_fuse depth(2)
{
  L1: for(...) {}
  L2: for(...) {
    L3: for(...) {}
    L4: for(...) {
      L5: for(...) {}
      L6: for(...) {}
    }
  }
}
With depth(2), the following loop pairs are initially considered for fusing:
  • L1 and L2
  • L3 and L4
#pragma loop_fuse depth(3)
{
  L1: for(...) {}
  L2: for(...) {
    L3: for(...) {}
    L4: for(...) {
      L5: for(...) {}
      L6: for(...) {}
    }
  }
}
With depth(3), the following loop pairs are initially considered for fusing:
  • L1 and L2
  • L3 and L4
  • L5 and L6

The compiler automatically considers fusing adjacent loops with equal trip counts when the loops meet the criteria. You can also use the loop_fuse pragma to tell the compiler to consider fusing adjacent loops with different trip counts.

With the loop_fuse pragma applied to a block of code, the compiler always tries to fuse adjacent loops (with equal or different trip counts) in the block whenever the compiler determines that it is safe to fuse the loops. Two loops are considered safe to fuse if they meet the fusion criteria described in the Fusion Criteria section of Loop Fusion.

The following example shows the effects of fusing loops with unequal trip counts:
Unfused Loops:
#pragma loop_fuse
{
  for (int i = 0; i < N; i++) {
    // Loop Body 1
  }
  for (int j = 0; j < M; j++) {
    // Loop Body 2
  }
}
Fused Loop:
for (int f = 0; f < max(M,N); f++) {
  if (f < N) {
    // Loop Body 1
  }
  if (f < M) {
    // Loop Body 2
  }
}
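The equivalence of the two forms can be checked on an ordinary host compiler. The following is a minimal sketch, not compiler output: the function names run_unfused and run_fused and the two loop bodies are hypothetical, chosen only so that each loop writes to its own region of an output array.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Unfused form: two adjacent loops with trip counts n and m.
// The loop bodies are hypothetical placeholders.
std::vector<int> run_unfused(int n, int m) {
  assert(n >= 0 && m >= 0);
  std::vector<int> out(n + m, 0);
  for (int i = 0; i < n; i++) {
    out[i] = 2 * i;            // Loop Body 1
  }
  for (int j = 0; j < m; j++) {
    out[n + j] = j + 100;      // Loop Body 2
  }
  return out;
}

// Hand-written fused form: one loop that runs max(m, n) times and
// guards each body with its original trip count.
std::vector<int> run_fused(int n, int m) {
  assert(n >= 0 && m >= 0);
  std::vector<int> out(n + m, 0);
  for (int f = 0; f < std::max(m, n); f++) {
    if (f < n) {
      out[f] = 2 * f;          // Loop Body 1
    }
    if (f < m) {
      out[n + f] = f + 100;    // Loop Body 2
    }
  }
  return out;
}
```

Because the two bodies in this sketch touch disjoint elements, both functions return the same vector for any non-negative n and m, for example run_unfused(3, 5) == run_fused(3, 5).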
A fused loop can itself be considered for fusing with other loops. For example, in the following code, L1 and L2 are initially considered for fusing. The resulting fused loop can then be considered for fusing with L4.
#pragma loop_fuse
{
  L1: for(...) {}
  L2: for(...) {
    L3: for(...) {}
  }
  L4: for(...) {}
}
}
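As an illustration of this chaining, the following sketch writes out by hand what a single loop covering L1, L2 (with its nested L3), and L4 could look like. It assumes that the chained result behaves like the two-loop fused form with one more guarded body. The function names, trip counts, and loop bodies are hypothetical, and the actual structure the compiler generates might differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Three adjacent top-level loops; the second contains a nested loop.
std::vector<int> chain_unfused(int n1, int n2, int n4) {
  assert(n1 >= 0 && n2 >= 0 && n4 >= 0);
  std::vector<int> out(n1 + n2 + n4, 0);
  for (int i = 0; i < n1; i++) {         // L1
    out[i] = i;
  }
  for (int j = 0; j < n2; j++) {         // L2
    int acc = 0;
    for (int k = 0; k <= j; k++) {       // L3 (nested, left as is)
      acc += k;
    }
    out[n1 + j] = acc;
  }
  for (int m = 0; m < n4; m++) {         // L4
    out[n1 + n2 + m] = -m;
  }
  return out;
}

// Hand-written single loop: each original body is guarded by its own
// trip count, and the nested loop L3 stays inside the L2 body.
std::vector<int> chain_fused(int n1, int n2, int n4) {
  assert(n1 >= 0 && n2 >= 0 && n4 >= 0);
  std::vector<int> out(n1 + n2 + n4, 0);
  const int trips = std::max({n1, n2, n4});
  for (int f = 0; f < trips; f++) {
    if (f < n1) {
      out[f] = f;                        // L1 body
    }
    if (f < n2) {
      int acc = 0;
      for (int k = 0; k <= f; k++) {     // L3
        acc += k;
      }
      out[n1 + f] = acc;                 // L2 body
    }
    if (f < n4) {
      out[n1 + n2 + f] = -f;             // L4 body
    }
  }
  return out;
}
```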

Use the independent option to override the dependency safety checks. If you specify the independent option, you are guaranteeing to the compiler that fusing pairs of loops affected by the loop_fuse pragma is safe. That is, there are no negative-distance dependencies in the fused loop. If fusing is not safe, you might get functional errors in your component.
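To see why an incorrect guarantee can cause functional errors, consider the following host-side sketch. It assumes a simple case of a negative-distance dependency: the second loop reads a[j + 1], a value that the first loop produces one iteration later than the fused iteration that consumes it. The function names and loop bodies are hypothetical, and the fused form is written by hand rather than produced by the compiler.

```cpp
#include <cassert>
#include <vector>

// Unfused: the first loop finishes writing a[] before the second loop
// reads a[j + 1], so every read sees the final value.
std::vector<int> shift_unfused(int n) {
  assert(n >= 1);
  std::vector<int> a(n, 0), b(n - 1, 0);
  for (int i = 0; i < n; i++) {
    a[i] = i * i;
  }
  for (int j = 0; j < n - 1; j++) {
    b[j] = a[j + 1];
  }
  return b;
}

// Hand-fused: iteration f reads a[f + 1] before iteration f + 1 has
// written it, so b[] receives the stale initial value 0 instead.
std::vector<int> shift_fused(int n) {
  assert(n >= 1);
  std::vector<int> a(n, 0), b(n - 1, 0);
  for (int f = 0; f < n; f++) {
    a[f] = f * f;
    if (f < n - 1) {
      b[f] = a[f + 1];
    }
  }
  return b;
}
```

For n = 4, shift_unfused returns {1, 4, 9} while shift_fused returns {0, 0, 0}. A loop pair like this one is a case where specifying independent would be an incorrect guarantee.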

Function Calls In loop_fuse Code Blocks

If a function call occurs in a code block annotated with the loop_fuse pragma and the called function contains a loop, the loop that results from inlining the function call can be a candidate for loop fusion.
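The following sketch shows the kind of code this applies to. The helper scale_by_two and the function process are hypothetical names, and process is written as a plain function rather than as an HLS component. The sketch assumes that the call is inlined, so that the loop inside the helper sits next to the second loop in the annotated block. Compilers that do not recognize the loop_fuse pragma ignore it (possibly with a warning), so the code also builds and runs on a host compiler.

```cpp
#include <cassert>

// Hypothetical helper that contains a loop.
static void scale_by_two(int *data, int n) {
  for (int i = 0; i < n; i++) {
    data[i] *= 2;
  }
}

void process(int *x, int *y, int n) {
  assert(n >= 0);
#pragma loop_fuse
  {
    // Once this call is inlined, its loop is adjacent to the loop below
    // and can be considered for fusing with it.
    scale_by_two(x, n);
    for (int j = 0; j < n; j++) {
      y[j] = j + 1;
    }
  }
}
```

The two loops here operate on different arrays, so the functional result is the same whether or not the loops are fused.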

Nested depth(N) Clauses

When you nest loop_fuse pragmas, you might create overlapping sets of candidate loops.

Consider the following example:
#pragma loop_fuse depth(2) independent
{
  L1: for(...) {}
  L2: for(...) {
    #pragma loop_fuse depth(2)
    {
      L3: for(...) {}
      L4: for(...) {
        L5: for(...) {}
        L6: for(...) {}
      }
    }
  }
}
In this example, the compiler considers the following loop pairs for fusion: L1/L2, L3/L4, and L5/L6. In addition, the independent option on the outer pragma overrides the compiler's negative-distance dependency analysis for the following loop pairs: L1/L2 and L3/L4.
For another example of the effects of using the loop_fuse pragma, refer to the following tutorial:
<quartus_installdir>/hls/examples/tutorials/best_practices/loop_fusion