Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 12/13/2021


Executing Independent Operations Simultaneously

As described in Mapping Source Code Instructions to Hardware, the compiler can automatically identify independent operations and execute them simultaneously in hardware.

This simultaneous execution of independent operations, combined with pipelining, is how a design achieves performance through data parallelism on an FPGA.

The following image illustrates an example of an adder and a multiplier, which are scheduled to execute simultaneously while operating on separate inputs:

Figure 11. Automatic Vectorization in the Generated Hardware Datapath

This automatic vectorization is analogous to how a superscalar processor takes advantage of instruction-level parallelism, but the vectorization happens statically at compile time instead of dynamically at runtime.

Because determining instruction-level parallelism occurs at compile time, there is no hardware or runtime cost of dependency checking for the generated hardware datapath. Additionally, the flexible logic and routing of an FPGA means that only the available resources (like ALMs and DSPs) of the FPGA restrict the number of independent operations that can occur simultaneously.
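The adder-and-multiplier case described above can be sketched in source code as follows. This is a minimal illustration, not code from the guide: the names add_and_multiply and SumAndProduct are hypothetical, and the function is written as plain C++ (without the component marking an HLS design would normally use) so that it builds with any C++ compiler.

```cpp
#include <cstdint>

// Hypothetical result type that carries both outputs.
struct SumAndProduct {
  std::int32_t sum;
  std::int32_t product;
};

// The add and the multiply read separate inputs, and neither consumes the
// other's result, so there is no data dependency between them. A compiler
// that schedules independent operations together is free to place both in
// the same stage of the datapath.
SumAndProduct add_and_multiply(std::int32_t a, std::int32_t b,
                               std::int32_t c, std::int32_t d) {
  SumAndProduct r;
  r.sum = a + b;      // independent operation 1
  r.product = c * d;  // independent operation 2
  return r;
}
```

If the multiply instead used r.sum as an operand, the two operations would form a dependency chain and could no longer be scheduled side by side.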

Unrolling Loops

You can unroll loops in the design by using loop pragmas. Loop unrolling reduces the number of loop iterations that execute, at the cost of the additional hardware resources needed to execute multiple iterations of the loop simultaneously.

Once unrolled, the hardware resources are scheduled as described in Scheduling.

The Intel® HLS Compiler never attempts to unroll any loops in your source code automatically. You must always control loop unrolling by using the corresponding pragma. For details, refer to Loop Unrolling (unroll Pragma) in the Intel® High Level Synthesis Compiler Reference Manual.
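As a rough sketch of what requesting unrolling looks like, the loop below carries an unroll pragma with a factor of 4. The function name dot_product, the trip count kN, and the choice of factor are assumptions made for illustration. A compiler that does not recognize the pragma may ignore it (possibly with a warning), and the function still computes the same result.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kN = 16;  // hypothetical fixed trip count

// Dot product over two fixed-size arrays.
std::int32_t dot_product(const std::int32_t (&x)[kN],
                         const std::int32_t (&y)[kN]) {
  std::int32_t acc = 0;
  // Request an unroll factor of 4: the loop body is replicated four times,
  // so the loop runs 16 / 4 = 4 iterations. The four multiplies in each
  // unrolled iteration are independent of one another; the additions into
  // acc still depend on each other.
  #pragma unroll 4
  for (std::size_t i = 0; i < kN; ++i) {
    acc += x[i] * y[i];
  }
  return acc;
}
```

A larger factor replicates more multipliers and adders, which is the resource cost that the text above describes.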

Conditional Statements

The Intel® HLS Compiler attempts to eliminate conditional or branch statements as much as possible.

Conditionally executed code becomes predicated in the hardware. Predication increases the possibilities for executing operations simultaneously and achieving better performance. Additionally, removing branches allows the compiler to apply other optimizations to the design.

Figure 12. Conditional Statements

In this example, the function foo can be run unconditionally. The code that cannot be run unconditionally, like the memory assignments, retains a condition.
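The figure is not reproduced here, so the following is only a conceptual sketch of the idea, under the assumption that foo has no side effects. The function bodies and the names update_branching and update_predicated are hypothetical. The second function shows, in source form, roughly what predication means; it is not the compiler's actual output.

```cpp
#include <cstdint>

// Hypothetical side-effect-free function, so computing it speculatively
// changes nothing observable.
std::int32_t foo(std::int32_t x) { return x * 3 + 1; }

// As a programmer might write it: both the call and the store sit inside
// the branch.
void update_branching(std::int32_t* mem, int idx, std::int32_t x, bool cond) {
  if (cond) {
    mem[idx] = foo(x);
  }
}

// Conceptual predicated form: foo runs unconditionally, and only the memory
// assignment keeps its condition.
void update_predicated(std::int32_t* mem, int idx, std::int32_t x, bool cond) {
  const std::int32_t result = foo(x);  // always computed
  if (cond) {                          // the store remains conditional
    mem[idx] = result;
  }
}
```

Both functions leave memory in the same state for any input. The difference is that the second form has no branch around the computation, which is what gives a scheduler room to overlap it with other operations.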