3.3.3.1.2.1. Pipelining Loops Within A Component

Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide


ID 683152

Date 6/02/2023



Within a component, loops are the primary source of pipeline parallelism.

When the Intel® HLS Compiler pipelines a loop, it attempts to schedule the loop execution such that the next iteration of the loop enters the pipeline before the previous iteration has completed. This pipelining of loop iterations can lead to higher throughput.

The number of clock cycles between iterations of the loop is called the Initiation Interval (II).

For the highest performance, a loop iteration would start every clock cycle, which corresponds to an II of 1.
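The effect of II on throughput can be estimated with a simple cycle count. The sketch below is only an idealized model, not something the compiler reports: it assumes a fixed per-iteration latency and no stalls, and the function names and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Idealized cycle count for a pipelined loop: the first iteration needs
// `latency` cycles to complete, and each later iteration starts `ii`
// cycles after the one before it. Stalls are ignored (assumption).
std::uint64_t pipelined_cycles(std::uint64_t iterations,
                               std::uint64_t latency,
                               std::uint64_t ii) {
    assert(ii >= 1);
    if (iterations == 0) {
        return 0;
    }
    return latency + ii * (iterations - 1);
}

// Cycle count when iterations do not overlap at all: each iteration
// waits for the previous one to finish.
std::uint64_t sequential_cycles(std::uint64_t iterations,
                                std::uint64_t latency) {
    return iterations * latency;
}
```

Under this model, 1000 iterations with a 5-cycle latency take 1004 cycles at an II of 1, 2003 cycles at an II of 2, and 5000 cycles with no overlap.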

Data dependencies that are carried from one loop iteration to another can affect the ability to achieve II of 1. These dependencies are called loop-carried dependencies.

The II of a loop must be high enough to accommodate all loop-carried dependencies.

Tip: The II required to satisfy this constraint depends on the f_MAX of the design. At a lower f_MAX, each clock cycle is longer, so more of the dependency logic fits within one cycle and the II might also be lower. Conversely, at a higher f_MAX, a higher II might be required.
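The relationship between f_MAX and II can be illustrated with a rough calculation. The sketch below assumes a deliberately simplified model in which the loop-carried dependency has a fixed combinational delay and the II is the number of clock periods needed to cover it. The compiler's actual scheduling is more involved, and the function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (assumption): the minimum II is the number of clock
// periods needed to cover the delay of the loop-carried dependency.
// The delay is in picoseconds; f_MAX is in MHz.
std::uint64_t min_ii(std::uint64_t dependency_delay_ps,
                     std::uint64_t fmax_mhz) {
    assert(fmax_mhz > 0);
    // Clock period in picoseconds (integer division is enough here).
    const std::uint64_t period_ps = 1000000 / fmax_mhz;
    assert(period_ps > 0);
    // Ceiling division: a partially used period still counts as a cycle.
    const std::uint64_t ii =
        (dependency_delay_ps + period_ps - 1) / period_ps;
    return ii == 0 ? 1 : ii;
}
```

For a dependency with a 6000 ps delay, this model gives an II of 1 at 100 MHz, 2 at 250 MHz, and 3 at 500 MHz.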

The Intel® HLS Compiler automatically identifies these dependencies and tries to build hardware to resolve them while minimizing the II, subject to the target f_MAX.

Naively generating hardware for the code in Figure 17 (Pipelining a Datapath with Loop Iteration) results in two loads: one from memory b and one from memory c. Because the compiler knows that c[i-1] was written in the previous iteration, it can optimize away the load from c[i-1] and reuse the stored value directly.

Figure 17. Pipelining a Datapath with Loop Iteration

The dependency on the value stored to c in the previous iteration is resolved in a single clock cycle, so an II of 1 is achieved for the loop even though the iterations are not independent.
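The code in the figure is not reproduced in this text, so the following is a hypothetical loop that matches the description: each iteration loads from b and depends on the value written to c in the previous iteration. The second function spells out, in software terms, the kind of forwarding described above, where the previous result is kept in a local value instead of being reloaded. In an Intel® HLS Compiler design the function would be a component; that is omitted here so the sketch compiles with any standard C++ compiler.

```cpp
#include <cassert>

// Naive form: every iteration reads c[i-1] back from memory,
// in addition to reading b[i].
void running_sum_naive(const int* b, int* c, int n) {
    assert(b != nullptr && c != nullptr);
    for (int i = 1; i < n; ++i) {
        c[i] = c[i - 1] + b[i];
    }
}

// Forwarded form: the value written in the previous iteration is kept
// in `prev`, so only b[i] has to be loaded in each iteration.
void running_sum_forwarded(const int* b, int* c, int n) {
    assert(b != nullptr && c != nullptr);
    if (n <= 0) {
        return;
    }
    int prev = c[0];
    for (int i = 1; i < n; ++i) {
        prev = prev + b[i];
        c[i] = prev;
    }
}
```

Both functions produce the same contents in c. The difference is only in how the loop-carried value reaches the next iteration.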

For additional information about pipelining loops, refer to Pipeline Loops.

When the Intel® HLS Compiler cannot initially achieve II of 1, it chooses from several optimization strategies:

Interleaving

Speculative Execution

The Intel® HLS Compiler applies these optimizations automatically. You can also control them through pragma statements in the design.
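As a rough illustration of pragma control, the sketch below places max_interleaving and speculated_iterations pragmas on two hypothetical loops. The function names, the argument values, and the use of the __INTELFPGA_COMPILER__ guard macro are assumptions made for this example. Check the exact pragma semantics and accepted values against the compiler reference manual before relying on them.

```cpp
#include <cassert>

// Sums a rows x cols matrix stored row by row. The inner loop carries a
// dependency on `sum`; the pragma asks the compiler to restrict
// interleaving of that inner loop (value chosen for illustration only).
int matrix_total(const int* data, int rows, int cols) {
    assert(data != nullptr && rows >= 0 && cols >= 0);
    int total = 0;
    for (int r = 0; r < rows; ++r) {
        int sum = 0;
#ifdef __INTELFPGA_COMPILER__  // assumed guard so host compilers skip it
#pragma max_interleaving 1
#endif
        for (int c = 0; c < cols; ++c) {
            sum += data[r * cols + c];
        }
        total += sum;
    }
    return total;
}

// Returns the index of the first element >= threshold, or n if none.
// The exit condition depends on loaded data, which is the kind of loop
// where speculation applies; the count of 1 is illustrative.
int first_index_at_least(const int* data, int n, int threshold) {
    assert(data != nullptr && n >= 0);
    int i = 0;
#ifdef __INTELFPGA_COMPILER__
#pragma speculated_iterations 1
#endif
    while (i < n && data[i] < threshold) {
        ++i;
    }
    return i;
}
```

The pragmas do not change the functional result of either loop. They only influence how the compiler schedules the iterations in hardware.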
