Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 3/28/2022
Public

3.3.2.2. Clustering the Datapath

Dynamically scheduling all operations adds overhead in the form of additional FPGA area needed to implement the required handshaking control logic.

To reduce this overhead, the compiler groups fixed-latency operations into clusters. A cluster of fixed-latency operations, such as arithmetic operations, needs fewer handshaking interfaces, which reduces the area overhead.

Figure 5. Clustered Logic


If A, B, and C from Figure 4 do not contain variable-latency operations, the compiler can cluster them together, as illustrated in Figure 5.

Clustering the logic reduces area because it removes the need for stall signals and other handshaking logic between the operations within the cluster.
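
As an illustration, the following minimal sketch shows a component whose body contains only arithmetic operations, which are the kind of fixed-latency operations that the compiler can group into a cluster. The component name scale_offset_square and its arguments are hypothetical, and the comments that map statements to A, B, and C are only an analogy to the figures. Whether the compiler actually places these operations in a single cluster is the compiler's decision. The fallback macro is an assumption that lets the sketch build with a regular C++17 compiler when the HLS/hls.h header is not available.

```cpp
// Use the Intel HLS header when it is available; otherwise fall back to an
// empty macro so that the sketch still builds as ordinary C++.
#ifdef __has_include
#  if __has_include("HLS/hls.h")
#    include "HLS/hls.h"
#    define SKETCH_HAVE_HLS_HEADER 1
#  endif
#endif
#ifndef SKETCH_HAVE_HLS_HEADER
#  define component  // assumption: stand-in for the HLS `component` keyword
#endif

// Hypothetical component with three dependent arithmetic steps.
// None of the steps accesses memory or a stream, so the body contains
// only arithmetic operations.
component int scale_offset_square(int x, int gain, int offset) {
  int a = x * gain;    // plays the role of A
  int b = a + offset;  // plays the role of B
  int c = b * b;       // plays the role of C
  return c;
}
```

For example, scale_offset_square(2, 3, 4) computes (2 * 3 + 4) squared.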

Cluster Types

The Intel® HLS Compiler can create the following types of clusters:

  • Stall-Enable Cluster (SEC): This cluster type passes the handshaking logic to every pipeline stage in the cluster in parallel. If the cluster is stalled by logic from further down in the datapath, all logic in the SEC stalls at the same time.
    Figure 6. Stall-Enable Cluster


  • Stall-Free Cluster (SFC): This cluster type adds a first in, first out (FIFO) buffer to the end of the cluster that can accommodate at least the entire latency of the pipeline in the cluster. This FIFO is often called an exit FIFO because it is attached to the exit of the cluster datapath.

    Because of this FIFO, the pipeline stages in the cluster do not require any handshaking logic. The stages can run freely and drain into the exit FIFO, even if the cluster is stalled by logic further down in the datapath.

Figure 7. Stall-Free Cluster
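
The behavior of a stall-free cluster can be approximated with a small software model. The following C++17 sketch is a toy, cycle-level illustration and not the hardware that the compiler generates. The names StallFreeClusterModel, can_accept, tick, and run_demo are hypothetical, and the admission rule in can_accept (accept new data only while the exit FIFO can still absorb everything in flight) is an assumption made to keep the model simple.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Toy model of a stall-free cluster: a free-running pipeline that drains
// into an exit FIFO. Illustrative only; not generated hardware.
struct StallFreeClusterModel {
  std::vector<std::optional<int>> stages;  // pipeline stages, no handshaking
  std::deque<int> exit_fifo;               // holds valid data only
  std::size_t fifo_capacity;

  StallFreeClusterModel(std::size_t latency, std::size_t capacity)
      : stages(latency), fifo_capacity(capacity) {
    assert(latency >= 1);
    assert(capacity >= latency);  // FIFO must cover the pipeline latency
  }

  // Number of valid items currently inside the pipeline stages.
  std::size_t in_flight() const {
    std::size_t n = 0;
    for (const auto& s : stages) {
      if (s) ++n;
    }
    return n;
  }

  // Assumed admission rule: accept input only if the exit FIFO could still
  // hold every in-flight item plus the new one.
  bool can_accept() const {
    return in_flight() + exit_fifo.size() < fifo_capacity;
  }

  // Advance one clock cycle. `input` is valid data or a bubble (nullopt).
  // Returns the item handed downstream in this cycle, if any.
  std::optional<int> tick(std::optional<int> input, bool downstream_stall) {
    std::optional<int> out;
    if (!downstream_stall && !exit_fifo.empty()) {
      out = exit_fifo.front();
      exit_fifo.pop_front();
    }
    // The last stage always drains into the FIFO; bubbles are dropped here.
    if (stages.back()) exit_fifo.push_back(*stages.back());
    // The pipeline never stalls: every stage advances each cycle.
    for (std::size_t i = stages.size() - 1; i > 0; --i) {
      stages[i] = stages[i - 1];
    }
    stages[0] = input;
    return out;
  }
};

// Feed eight values through a 3-stage cluster with a 4-entry exit FIFO
// while the downstream logic stalls during cycles 2 through 11.
std::vector<int> run_demo() {
  StallFreeClusterModel sfc(3, 4);
  std::vector<int> received;
  int next = 0;
  for (int cycle = 0; cycle < 40; ++cycle) {
    bool stall = (cycle >= 2 && cycle < 12);
    std::optional<int> in;
    if (next < 8 && sfc.can_accept()) in = next++;
    if (auto out = sfc.tick(in, stall)) received.push_back(*out);
  }
  return received;
}
```

In this model, the pipeline stages keep advancing during the downstream stall, and the cluster still accepts two more values after the stall begins because the exit FIFO has room for them. Calling run_demo() returns the eight values in their original order, which shows that no data is lost even though no stage carries handshaking logic.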


Cluster Characteristics

The exit FIFO of a stall-free cluster results in the following tradeoffs between SECs and SFCs:
  • Area: Because an SEC does not use an exit FIFO, it can save FPGA area compared to an SFC.

    If you have a design with many small, low-latency clusters, you can save a substantial amount of area by asking the compiler to use SECs instead of SFCs with the hls_use_stall_enable_clusters component attribute. For details, refer to hls_use_stall_enable_clusters Component Attribute in the Intel® HLS Compiler Reference Manual.

  • Latency: Logic that uses SFCs might have a larger latency than logic that uses SECs because of the write-read latency of the exit FIFO.
  • fMAX: In an SFC, the oStall signal has less fanout than in an SEC.

    For a cluster with many pipeline stages, you can improve your design fMAX by using an SFC.

  • Handshaking: The exit FIFO in an SFC allows it to take advantage of hyper-optimized handshaking between clusters. For more information, refer to Hyper Optimized Handshaking.

    SECs do not support this capability.

  • Bubble Handling: SECs remove only leading bubbles. A leading bubble is a bubble that arrives before the first piece of valid data arrives in the cluster. SECs do not remove any bubbles that arrive afterwards.

    SFCs can use the exit FIFO to remove all bubbles from the pipeline if the SFC gets a downstream stall signal.

  • Stall Behavior: When an SEC receives a downstream stall, it stalls any logic upstream of it within one clock cycle.

    When an SFC receives a downstream stall, the exit FIFO allows it to consume additional valid data depending on how deep the exit FIFO is and how many bubbles are in the cluster datapath.
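
For the area tradeoff described above, the following minimal sketch shows how the hls_use_stall_enable_clusters component attribute might be applied to a small, low-latency component. The component name add_offset is hypothetical, and the placement of the attribute before the component keyword is an assumption based on how other component attributes are commonly written; confirm the exact syntax in the Intel® HLS Compiler Reference Manual. The attribute is a request, so the compiler might still choose a different cluster type. The fallback macros are an assumption that lets the sketch build with a regular C++17 compiler when the HLS/hls.h header is not available.

```cpp
// Use the Intel HLS header when it is available; otherwise fall back to
// empty macros so that the sketch still builds as ordinary C++.
#ifdef __has_include
#  if __has_include("HLS/hls.h")
#    include "HLS/hls.h"
#    define SKETCH_HLS_HEADER_FOUND 1
#  endif
#endif
#ifndef SKETCH_HLS_HEADER_FOUND
#  define component                      // assumption: stand-in for the HLS keyword
#  define hls_use_stall_enable_clusters  // assumption: stand-in for the attribute
#endif

// Hypothetical small component. The attribute asks the compiler to prefer
// stall-enable clusters (no exit FIFO) for the logic in this component.
hls_use_stall_enable_clusters
component int add_offset(int x, int offset) {
  return x + offset;
}
```

A design that gains more from a higher fMAX or from bubble removal than from the area savings can leave the attribute off and keep the compiler's default choice.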