Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 4/01/2024

3.3.1.1. Mapping Source Code Instructions to Hardware

For fixed architectures, such as CPUs and GPUs, a compiler translates code into a set of instructions that run on functional units with fixed functionality. Because these architectures must be useful in a broad range of applications, they include functional units that not every program uses. Unused functional units mean that your program does not fully occupy the fixed architecture hardware.

FPGAs are not subject to these restrictions of fixed functional units. On an FPGA, you can synthesize a specialized hardware datapath that can be fully occupied for an arbitrary set of instructions, which means you can use the silicon area of your chip more efficiently.

By implementing your algorithm in hardware, you can fill your chip with custom hardware that is always (or almost always) working on your problem instead of having idle functional units.

The Intel® HLS Compiler maps statements from the source code to individual specialized hardware operations, as shown in the example in the following image:



In general, each instruction maps to its own instance of a hardware operation. However, a single statement can map to more than one hardware operation, and multiple statements can combine into a single hardware operation when the compiler finds that doing so generates more efficient hardware.
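As a minimal sketch of the two cases above, consider the following plain C++ functions. The function names and the operation breakdown in the comments are illustrative assumptions, not output from the compiler. In an Intel® HLS Compiler project, such functions would typically be declared as components; that is omitted here so the sketch builds with any C++ compiler.

```cpp
#include <cstddef>

// Hypothetical example: one source statement that conceptually maps to
// several hardware operations.
void add_element(const int *a, const int *b, int *c, std::size_t i) {
  // Conceptual breakdown (an assumption for illustration):
  //   load a[i], load b[i], add, store c[i]
  c[i] = a[i] + b[i];
}

// Hypothetical example: two source statements that a compiler might
// combine into one hardware operation, such as a fused multiply-add,
// if it decides that is more efficient. Whether this happens depends
// on the compiler and the target device.
int multiply_add(int x, int y, int z) {
  int product = x * y;  // statement 1: multiply
  return product + z;   // statement 2: add
}
```

The actual mapping for a given design is reported by the compiler, so treat the comments above as a way of reading the source rather than a prediction of the generated hardware.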

The latency of a hardware operation depends on the complexity of the operation and the target fMAX.

The compiler takes these hardware operations and connects them into a graph based on their dependencies. When operations are independent, the compiler automatically infers parallelism by executing those operations simultaneously.

The following figure shows a dependency graph created for the hardware datapath. The dependency graph shows how the instruction is mapped to hardware operations and how the hardware operations are connected based on their dependencies. The loads in this example instruction are independent of each other and can therefore run simultaneously.
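To make the idea of a dependency graph concrete, the following sketch models one in plain C++ and assigns each operation the earliest stage at which its inputs are ready. It assumes the example statement is `c[i] = a[i] + b[i]` and that every operation takes one stage; both are simplifying assumptions, and the `Op` type and function names are hypothetical. It is not the scheduling algorithm that the compiler uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one hardware operation in a dependency graph.
struct Op {
  std::string name;
  std::vector<std::size_t> deps;  // indices of operations this one waits for
};

// Returns, for each operation, the earliest stage at which it can start.
// Operations must be listed so that every dependency index is smaller
// than the index of the operation that depends on it.
// Assumption: each operation occupies exactly one stage.
std::vector<int> earliest_stage(const std::vector<Op> &ops) {
  std::vector<int> stage(ops.size(), 0);
  for (std::size_t i = 0; i < ops.size(); ++i) {
    for (std::size_t d : ops[i].deps) {
      stage[i] = std::max(stage[i], stage[d] + 1);
    }
  }
  return stage;
}

// Assumed graph for the statement c[i] = a[i] + b[i].
std::vector<Op> example_graph() {
  return {
      {"load a[i]", {}},   // no dependencies
      {"load b[i]", {}},   // no dependencies, independent of the first load
      {"add", {0, 1}},     // needs both loaded values
      {"store c[i]", {2}}, // needs the sum
  };
}
```

Calling `earliest_stage(example_graph())` places both loads in stage 0, the add in stage 1, and the store in stage 2. The two loads share a stage because neither depends on the other, which is the parallelism described above.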