Intel® Hyperflex™ Architecture High-Performance Design Handbook

ID 683353
Date 12/08/2023
Public

3. Compiling Intel® Hyperflex™ Architecture Designs

The Intel® Quartus® Prime Pro Edition Compiler is optimized to take full advantage of the Intel® Hyperflex™ architecture. The Compiler supports the Hyper-Aware design flow, in which it automatically maximizes retiming of registers into Hyper-Registers.

Hyper-Aware Design Flow

Use the Hyper-Aware design flow to shorten design cycles and optimize performance. The Hyper-Aware design flow combines automated register retiming with implementation of targeted timing closure recommendations (Fast Forward compilation) to maximize use of Hyper-Registers and drive the highest performance for Intel® Hyperflex™ architecture FPGAs.

Figure 81. Hyper-Aware Design Flow

Register Retiming

A key innovation of the Intel® Hyperflex™ architecture is the addition of multiple Hyper-Registers in routing segments and block inputs. Maximizing the use of Hyper-Registers improves design performance. The prevalence of Hyper-Registers improves the balance of delays between registers and mitigates critical path delays. The Compiler's Retime stage moves registers out of ALMs and retimes them into Hyper-Registers wherever advantageous. Register retiming runs automatically during the Fitter, requires minimal effort, and can result in significant performance improvement. Following retiming, the Finalize stage corrects connections with hold violations.
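The effect of retiming on delay balance can be illustrated with a toy model. The sketch below is not the Compiler's algorithm; it only assumes a single path made of segments with known delays and a fixed number of registers that may sit between any two segments. The function names `critical_delay` and `best_retiming`, and the delay values, are hypothetical and chosen for illustration.

```python
from itertools import combinations

# Hypothetical combinational delays (in picoseconds) of the segments along
# one path; the values are illustrative, not device data.
SEGMENT_DELAYS_PS = [200, 300, 900, 400, 200]


def critical_delay(delays_ps, register_after):
    """Return the longest register-to-register delay on the path.

    register_after is a set of segment indices; a register sits directly
    after each listed segment.
    """
    worst = run = 0
    for index, delay in enumerate(delays_ps):
        run += delay
        if index in register_after:
            worst = max(worst, run)
            run = 0
    return max(worst, run)


def best_retiming(delays_ps, num_registers):
    """Exhaustively pick register positions that minimize the critical delay."""
    slots = range(len(delays_ps) - 1)  # candidate positions between segments
    return min(
        combinations(slots, num_registers),
        key=lambda cuts: critical_delay(delays_ps, set(cuts)),
    )


# Assumed starting point: the single register sits right after the first segment.
before = critical_delay(SEGMENT_DELAYS_PS, {0})
after_positions = best_retiming(SEGMENT_DELAYS_PS, 1)
after = critical_delay(SEGMENT_DELAYS_PS, set(after_positions))
print(f"critical delay before retiming: {before} ps")
print(f"critical delay after retiming:  {after} ps "
      f"(register after segment {after_positions[0]})")
```

With these sample values, moving the one register from its starting position to the slot after the third segment shortens the longest register-to-register delay from 1800 ps to 1400 ps without changing the path latency. The more candidate register positions the routing offers, the finer this balancing can be.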

Fast Forward Compilation

If you require optimization beyond simple register retiming, run Fast Forward compilation to generate timing closure recommendations that break the key performance bottlenecks preventing further movement into Hyper-Registers. For example, Fast Forward recommends removing specific retiming restrictions that prevent further retiming into Hyper-Registers. Fast Forward compilation shows precisely where RTL changes have the most impact, and reports the predicted performance benefits you can expect from removing restrictions and retiming into Hyper-Registers (Hyper-Retiming). The Fitter does not automatically retime registers across RAM and DSP blocks. However, Fast Forward analysis shows the potential performance benefit from this optimization.

Figure 82. Hyper-Register Architecture

Fast Forward compilation identifies the best location to add pipeline stages (Hyper-Pipelining), and the expected performance benefit in each case. After you modify the RTL to place pipeline stages at the boundaries of each clock domain, the Retime stage automatically places the registers within the clock domain at the optimal locations to maximize performance. Implement the recommendations in RTL to achieve similar results. After implementing any changes, re-run the Retime stage until the results meet performance and timing requirements. Fast Forward compilation does not run automatically as part of a full compilation. Enable or run Fast Forward compilation in the Compilation Dashboard.
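The idea of estimating the benefit of each added pipeline stage can be sketched with a simple model. This is not how Fast Forward compilation computes its report; it is a minimal illustration that assumes one path with known segment delays and assumes added registers can be placed between any two segments. The function names and delay values are hypothetical.

```python
# Hypothetical segment delays (in picoseconds) along one path; illustrative only.
PATH_DELAYS_PS = [200, 300, 900, 400, 200]


def stages_needed(delays_ps, limit_ps):
    """Fewest registers so that no register-to-register span exceeds limit_ps."""
    count = run = 0
    for delay in delays_ps:
        if delay > limit_ps:
            raise ValueError("a single segment already exceeds the limit")
        if run + delay > limit_ps:
            count += 1  # place a register before this segment
            run = 0
        run += delay
    return count


def best_critical_delay(delays_ps, added_stages):
    """Smallest achievable critical delay with the given number of added registers."""
    lo, hi = max(delays_ps), sum(delays_ps)
    while lo < hi:  # binary search on the delay limit
        mid = (lo + hi) // 2
        if stages_needed(delays_ps, mid) <= added_stages:
            hi = mid
        else:
            lo = mid + 1
    return lo


for stages in range(4):
    delay_ps = best_critical_delay(PATH_DELAYS_PS, stages)
    fmax_mhz = 1_000_000 / delay_ps  # 1e12 ps per second, scaled to MHz
    print(f"{stages} added stage(s): critical delay {delay_ps} ps, "
          f"about {fmax_mhz:.0f} MHz")
```

With these sample values, the critical delay falls from 2000 ps to 1400 ps with one added stage and to 900 ps with two, and a third stage gives no further gain because a single 900 ps segment becomes the limit. In this model, going beyond that point requires changing the logic itself rather than adding registers.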

Table 7. Optimization Steps

| Optimization Step | Technique | Description |
|---|---|---|
| Step 1 | Register Retiming | The Retime stage performs register retiming and moves existing registers into Hyper-Registers to increase performance by removing retiming restrictions and eliminating critical paths. |
| Step 2 | Fast Forward Compile | Compiler generates design-specific timing closure recommendations and predicts performance improvement with removal of all barriers to Hyper-Registers (Hyper-Retiming). |
| Step 3 | Hyper-Pipelining | Use Fast Forward compilation to identify where to add new registers and pipeline stages in RTL. |
| Step 4 | Hyper-Optimization | Design optimization beyond Hyper-Retiming and Hyper-Pipelining, such as restructuring loops, removing control logic limits, and reducing the delay along long paths. |

Verifying Design RTL

The Intel® Quartus® Prime software includes the Design Assistant design rule checking tool to verify the suitability of your design RTL for the Intel® Hyperflex™ architecture. These rules include Hyper-Retimer Readiness Rules (HRR) that specifically target Intel® Hyperflex™ FPGA architecture designs.