Developer Guide

Clustering the Datapath

Dynamically scheduling all operations adds overhead in the form of additional FPGA area required to implement the handshaking control logic.
To reduce this overhead, the compiler groups fixed-latency operations into clusters. A cluster of fixed-latency operations, such as arithmetic operations, requires fewer handshaking interfaces, thereby reducing the area overhead.
Figure: Clustered Logic
If A, B, and C from Figure 1 do not contain variable-latency operations, the compiler can cluster them together, as illustrated in Figure 1. Clustering reduces area because the logic inside the cluster no longer needs stall signals or the other handshaking logic that would otherwise sit between its operations.
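
The grouping idea can be sketched in a few lines of Python. This is a toy model, not the compiler's actual algorithm: the `load` and `store` operations and the rule "one handshaking interface per group" are assumptions made for illustration.

```python
from itertools import groupby

def cluster_ops(ops):
    """Group consecutive fixed-latency operations into clusters.

    ops is a list of (name, is_fixed_latency) pairs in datapath order.
    Returns a list of groups; each group is a list of operation names.
    """
    groups = []
    for is_fixed, run in groupby(ops, key=lambda op: op[1]):
        names = [name for name, _ in run]
        if is_fixed:
            groups.append(names)               # one cluster for the whole run
        else:
            groups.extend([n] for n in names)  # variable-latency ops stay alone
    return groups

# Hypothetical datapath: A, B, and C are fixed latency, load and store are not.
datapath = [("load", False), ("A", True), ("B", True), ("C", True), ("store", False)]
clusters = cluster_ops(datapath)
print(len(datapath), "handshaking interfaces unclustered,", len(clusters), "clustered")
print(clusters)
```

In this model, A, B, and C share a single handshaking interface once they are clustered, while each variable-latency operation keeps its own.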

Cluster Types

The Intel® oneAPI DPC++/C++ Compiler can create the following types of clusters:
  • Stall-Enable Cluster (SEC): This cluster type passes the handshaking logic to every pipeline stage in the cluster in parallel. This means that if the cluster is stalled by logic further down in the datapath, all logic in the SEC stalls simultaneously.
    Figure: Stall-Enable Cluster
  • Stall-Free Cluster (SFC): This cluster type adds a first in, first out (FIFO) buffer to the end of the cluster that can accommodate the entire latency of the pipeline in the cluster. This FIFO is often called an exit FIFO because it is attached to the exit of the cluster datapath.
    Because of this FIFO, the pipeline stages in the cluster do not require any handshaking logic. The stages can run freely and drain into the exit FIFO, even if the cluster is stalled by logic further down in the datapath.
    Figure: Stall-Free Cluster
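
The difference between the two cluster types can be illustrated with a small cycle-by-cycle model. This is a simplified sketch, not a description of the generated hardware. It assumes that `None` represents a bubble, that the exit FIFO depth equals the pipeline latency, that the FIFO has zero write-read latency, and that the SFC accepts new valid data only while the exit FIFO can still absorb everything already in flight. The function names are hypothetical.

```python
from collections import deque

def run_sec(inputs, stalls, latency):
    """Stall-enable cluster: every stage freezes while downstream stalls."""
    stages = [None] * latency             # None marks a bubble
    pending = deque(inputs)
    delivered, accepted_in_stall = [], 0  # the counter stays 0 by construction
    for stalled in stalls:
        if stalled:
            continue                      # all stages and the upstream hold
        if stages[-1] is not None:
            delivered.append(stages[-1])
        nxt = pending.popleft() if pending else None
        stages = [nxt] + stages[:-1]
    return delivered, accepted_in_stall

def run_sfc(inputs, stalls, latency):
    """Stall-free cluster: stages run freely and drain into an exit FIFO."""
    stages = [None] * latency
    fifo = deque()                        # exit FIFO, depth == latency
    pending = deque(inputs)
    delivered, accepted_in_stall = [], 0
    for stalled in stalls:
        if stages[-1] is not None:
            fifo.append(stages[-1])       # valid data is kept, bubbles are dropped
        if not stalled and fifo:
            delivered.append(fifo.popleft())
        in_flight = sum(s is not None for s in stages[:-1])
        room = len(fifo) + in_flight < latency
        nxt = None
        if pending and (pending[0] is None or room):
            nxt = pending.popleft()
            if stalled and nxt is not None:
                accepted_in_stall += 1
        stages = [nxt] + stages[:-1]
    return delivered, accepted_in_stall

inputs = [1, None, 2, 3, 4, 5]            # one bubble in the input stream
stalls = [False, False, True, True, True, True] + [False] * 12
sec_result = run_sec(inputs, stalls, latency=3)
sfc_result = run_sfc(inputs, stalls, latency=3)
print("SEC:", sec_result)
print("SFC:", sfc_result)
```

Both models deliver the same data in the same order. In this scenario the SEC model accepts no new data during the four stalled cycles, while the SFC model accepts two more valid items before its exit FIFO capacity is fully reserved.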

Cluster Characteristics

The exit FIFO of a stall-free cluster results in the following tradeoffs:
  • Area: Because an SEC does not use an exit FIFO, it can save FPGA area compared to an SFC. If you have a design with many small, low-latency clusters, you can save a substantial amount of area by asking the compiler to use SECs instead of SFCs.
  • Latency: Logic that uses SFCs might have a larger latency than logic that uses SECs because of the write-read latency of the exit FIFO. If you use a zero-latency FIFO for the exit FIFO, you can mitigate the latency, but fMAX or FPGA area use might be negatively impacted.
  • fMAX: In an SFC, the oStall signal has less fanout than in an SEC. For a cluster with many pipeline stages, you can improve your design fMAX by using an SFC.
  • Handshaking: The exit FIFO in SFCs allows them to take advantage of hyper-optimized handshaking between clusters. For more information, refer to Hyper Optimized Handshaking. SECs do not support this capability.
  • Bubble Handling: Under limited circumstances, SECs can remove leading bubbles from the pipeline. A leading bubble is a bubble that arrives before the first piece of valid data arrives in the cluster. SECs do not remove any bubbles that arrive afterward.
    SFCs can use the exit FIFO to remove all bubbles from the pipeline if the SFC gets a downstream stall signal.
  • Stall Behavior: When an SEC receives a downstream stall, it stalls any logic upstream of it within one clock cycle. When an SFC receives a downstream stall, the exit FIFO allows it to consume additional valid data depending on how deep the exit FIFO is and how many bubbles are in the cluster datapath.
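
The dependence of stall behavior on FIFO depth and bubble count can be made concrete with a back-of-the-envelope calculation. The sketch below is an assumption-laden model rather than the compiler's actual rule: it supposes that an SFC keeps accepting valid data only while the exit FIFO can still hold everything already in the pipeline, and the helper name `sfc_stall_headroom` is hypothetical.

```python
def sfc_stall_headroom(stages, fifo_occupancy, fifo_depth=None):
    """Valid items an SFC could still accept while downstream stays stalled.

    stages is the pipeline content, with None marking a bubble.
    fifo_depth defaults to the pipeline latency (assumed sizing).
    """
    depth = len(stages) if fifo_depth is None else fifo_depth
    valid_in_flight = sum(slot is not None for slot in stages)
    return max(0, depth - fifo_occupancy - valid_in_flight)

full = ["a", "b", "c", "d"]        # no bubbles in the datapath
sparse = ["a", None, None, "d"]    # two bubbles in the datapath

print(sfc_stall_headroom(full, fifo_occupancy=0))
print(sfc_stall_headroom(sparse, fifo_occupancy=0))
print(sfc_stall_headroom(sparse, fifo_occupancy=0, fifo_depth=8))
```

Under these assumptions, a full pipeline leaves no headroom, each bubble in the datapath frees one slot, and a deeper exit FIFO adds headroom directly. An SEC has no equivalent headroom because it stalls its upstream logic as soon as it is stalled itself.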
