Pipelining Loops Within A Component
Within a component, loops are the primary source of pipeline parallelism.
When the Intel® HLS Compiler pipelines a loop, it attempts to schedule the loop execution such that the next iteration of the loop enters the pipeline before the previous iteration has completed. This pipelining of loop iterations can lead to higher throughput.
The number of clock cycles between iterations of the loop is called the Initiation Interval (II).
For the highest performance, a loop iteration would start every clock cycle, which corresponds to an II of 1.
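As a rough illustration of why II matters, the sketch below uses a simplified cycle-count model: a loop with `n` iterations, a per-iteration latency of `latency` cycles, and a given II completes in about `latency + (n - 1) * II` cycles. The function names and the model itself are assumptions made for illustration; the model ignores stalls and does not describe how the compiler estimates or reports performance.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (an assumption, not compiler output): the first iteration
// takes `latency` cycles, and each later iteration starts `ii` cycles after
// the one before it.
std::uint64_t pipelined_cycles(std::uint64_t n, std::uint64_t latency, std::uint64_t ii) {
    assert(ii >= 1 && latency >= 1);
    if (n == 0) return 0;
    return latency + (n - 1) * ii;
}

// Without pipelining, each iteration waits for the previous one to finish.
std::uint64_t unpipelined_cycles(std::uint64_t n, std::uint64_t latency) {
    return n * latency;
}
```

Under this model, 100 iterations with a latency of 5 cycles take 104 cycles at an II of 1, 203 cycles at an II of 2, and 500 cycles without pipelining.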
Data dependencies that are carried from one loop iteration to another, called loop-carried dependencies, can prevent a loop from achieving an II of 1.
The II of a loop must be high enough to accommodate all loop-carried dependencies.
The Intel® HLS Compiler automatically identifies these dependencies and tries to build hardware to resolve them while minimizing the II, subject to the target fMAX.
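One common way to reason about this lower bound comes from general modulo-scheduling theory rather than from the Intel® HLS Compiler documentation: a dependency whose value takes `latency` cycles to compute and is consumed `distance` iterations later requires an II of at least ceil(latency / distance). The helper below is a hypothetical sketch of that rule for a single dependency; the compiler's actual scheduling is more involved.

```cpp
#include <cassert>

// Hypothetical helper: smallest II that satisfies a single loop-carried
// dependency, using the textbook bound ceil(latency / distance).
//   latency:  cycles needed to produce the carried value
//   distance: number of iterations between producer and consumer
int min_ii_for_dependency(int latency, int distance) {
    assert(latency >= 1 && distance >= 1);
    return (latency + distance - 1) / distance;  // integer ceiling
}
```

For example, a 3-cycle computation consumed by the very next iteration needs an II of at least 3, whereas the same computation consumed two iterations later needs an II of at least 2.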
Consider a loop in which each iteration reads from b and from c[i-1]. Naively generating hardware for such code results in two loads: one from memory b and one from memory c. Because the compiler knows that c[i-1] was written in the previous iteration, it can optimize away the load from c[i-1] and reuse the value that was just computed.
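A minimal sketch of this optimization in plain C++, assuming a hypothetical recurrence `c[i] = c[i-1] + b[i]` (the exact loop body, the function names, and the use of `std::vector` are assumptions for illustration). The first function is the naive form with two loads per iteration. The second shows by hand the effect of the optimization, which the compiler applies automatically: the previously written value is carried in a local variable instead of being reloaded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive form: every iteration loads c[i - 1] and b[i] from memory.
void accumulate_naive(std::vector<int>& c, const std::vector<int>& b) {
    assert(c.size() == b.size());
    for (std::size_t i = 1; i < c.size(); ++i) {
        c[i] = c[i - 1] + b[i];
    }
}

// Forwarded form: the value written in the previous iteration is kept in a
// local variable (a register in hardware), so only b[i] is loaded per iteration.
void accumulate_forwarded(std::vector<int>& c, const std::vector<int>& b) {
    assert(c.size() == b.size());
    if (c.empty()) return;
    int prev = c[0];  // single load before the loop
    for (std::size_t i = 1; i < c.size(); ++i) {
        prev = prev + b[i];
        c[i] = prev;
    }
}
```

Both functions produce the same contents in `c`; only the number of memory reads inside the loop differs.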
When the Intel® HLS Compiler cannot initially achieve an II of 1, it chooses from several optimization strategies.
The Intel® HLS Compiler applies these optimizations automatically, and they can also be controlled through pragma statements in the design.
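As a hedged sketch of pragma-based control, the example below places `#pragma ii 1` immediately before a loop to request an II of 1. The `ii` pragma is one of the loop pragmas described in the Intel® HLS Compiler reference material; its exact syntax and behavior should be confirmed there. The function name `sum_array` is hypothetical, and the component marking used in a real HLS design is omitted so that the sketch compiles with any C++ compiler.

```cpp
#include <cassert>

// Plain C++ stand-in for a component function (hypothetical example).
int sum_array(const int* data, int n) {
    assert(n >= 0);
    int sum = 0;
    // Request an II of 1 for the loop that follows. Compilers that do not
    // recognize this pragma typically ignore it, possibly with a warning.
    #pragma ii 1
    for (int i = 0; i < n; ++i) {
        sum += data[i];  // loop-carried dependency on `sum`
    }
    return sum;
}
```

The accumulation into `sum` is itself a loop-carried dependency, so whether the requested II is achievable depends on the target device and fMAX.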