Hyperflex® Architecture High-Performance Design Handbook
ID: 683353
Date: 7/07/2025
Public
2.4.1.4.2. Loop Pipelining Demonstration
The following demonstrates proper loop pipelining to optimize an accumulator in an example design. In the original implementation, the accumulator multiplies the data input in by x, then adds the result to the previous output value out multiplied by y. This demonstration improves performance using these techniques:
- Implement separation of forward logic
- Retime the loop register
- Create the feedback loop equivalence with cascade logic
Figure 58. Original Loop Structure
Original Loop Structure Example Verilog HDL Code
module orig_loop_strct (rstn, clk, in, x, y, out);
  input  clk, rstn, in, x, y;
  output out;
  reg    out;
  reg    in_reg;

  // Input register stage
  always @ ( posedge clk )
    if ( !rstn ) begin
      in_reg <= 1'b0;
    end else begin
      in_reg <= in;
    end

  // Accumulator: both multiplies and the add sit inside the feedback loop
  always @ ( posedge clk )
    if ( !rstn ) begin
      out <= 1'b0;
    end else begin
      out <= y*out + x*in_reg;
    end
endmodule //orig_loop_strct
The first stage of optimization is rewriting the logic to move as much work as possible out of the feedback loop and into a forward logic block. This matters because the Compiler cannot automatically optimize any logic in a feedback loop. Consider the following recommendations when removing logic from the loop:
- Ahead of the loop, evaluate as many decisions and perform as many calculations as possible that do not directly rely on the loop value.
- Where possible, pass that logic through a register stage before it enters the loop.
After you rewrite the logic, the Compiler can freely retime the logic that you moved to the forward path.
Figure 59. Separation of Forward Logic from the Loop
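As a rough illustration of the separation described above, the sketch below moves the x multiply out of the feedback loop and into its own register stage. This is one possible reading of the technique, not the handbook's own code: the module name fwd_sep_loop_sketch and the register fwd_reg are hypothetical, and the extra register adds one cycle of latency compared with the original module.

```verilog
// Hypothetical sketch: fwd_sep_loop_sketch and fwd_reg are illustrative
// names that do not appear in the handbook.
module fwd_sep_loop_sketch (rstn, clk, in, x, y, out);
  input  clk, rstn, in, x, y;
  output out;
  reg    out;
  reg    in_reg;
  reg    fwd_reg;   // registered forward-path product x*in_reg

  // Forward path: nothing here depends on out, so this logic is
  // outside the feedback loop and is a candidate for retiming.
  always @ ( posedge clk )
    if ( !rstn ) begin
      in_reg  <= 1'b0;
      fwd_reg <= 1'b0;
    end else begin
      in_reg  <= in;
      fwd_reg <= x*in_reg;
    end

  // Feedback loop: only the multiply by y and the add remain.
  always @ ( posedge clk )
    if ( !rstn ) begin
      out <= 1'b0;
    end else begin
      out <= y*out + fwd_reg;
    end
endmodule //fwd_sep_loop_sketch
```

The loop still contains one multiply and one add, so this step alone does not shorten the loop as far as the later cascade stage does. It only ensures that the work that does not need out is no longer constrained by the loop.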
In the next optimization stage, retime the loop register, ensuring that the design still functions the same as the original loop circuitry.
Figure 60. Retime Loop Register
Finally, further optimize the loop by repeating the first optimization steps on the logic inside the highlighted boundary.
Figure 61. Results of Cascade Loop Logic, Hyper-Retimer, and Synthesis Optimizations (Four Level Optimization)
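The equivalence that the cascade structure relies on can be checked by unrolling the accumulator recurrence. The following is a sketch, not text from the handbook: the symbols u_n, a_n, and s_n are notation introduced here, and the derivation assumes that y holds constant from cycle to cycle and that every register resets to zero. In the four-level code, a_n corresponds to out_add1 and s_n to out_add4.

```latex
\begin{align*}
u_n &= x \cdot \mathit{in\_reg}_n \\
\text{Original loop:}\quad \mathit{out}_{n+1} &= y\,\mathit{out}_n + u_n \\
\text{Unrolled four times:}\quad \mathit{out}_{n+1} &= u_n + y\,u_{n-1} + y^2 u_{n-2} + y^3 u_{n-3} + y^4\,\mathit{out}_{n-3} \\[4pt]
\text{Cascade loop:}\quad a_n &= u_n + y^4\,a_{n-4} \\
\text{Cascade output:}\quad s_n &= a_n + y\,a_{n-1} + y^2 a_{n-2} + y^3 a_{n-3} \\
s_n - y\,s_{n-1} &= a_n - y^4\,a_{n-4} = u_n \\
\Rightarrow\quad s_n &= y\,s_{n-1} + u_n
\end{align*}
```

Because s_n satisfies the same recurrence as the original out, registering s_n reproduces the original output under these assumptions. The feedback path now contains four registers instead of one, and the remaining terms are forward logic.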
Four Level Optimization Example Verilog HDL Code
module cll_hypr_rtm_synopt ( rstn, clk, x, y, in, out);
  input  rstn, clk, x, y, in;
  output out;
  reg    out;
  reg    in_reg;
  wire   out_add1;
  wire   out_add2;
  wire   out_add3;
  wire   out_add4;
  reg    out_add1_reg1;
  reg    out_add1_reg2;
  reg    out_add1_reg3;
  reg    out_add1_reg4;

  // Input register stage
  always @ ( posedge clk )
    if ( !rstn ) begin
      in_reg <= 0;
    end else begin
      in_reg <= in;
    end

  // Four-deep delay line on out_add1; the feedback loop closes through out_add1_reg4
  always @ ( posedge clk )
    if ( !rstn ) begin
      out_add1_reg1 <= 0;
      out_add1_reg2 <= 0;
      out_add1_reg3 <= 0;
      out_add1_reg4 <= 0;
    end else begin
      out_add1_reg1 <= out_add1;
      out_add1_reg2 <= out_add1_reg1;
      out_add1_reg3 <= out_add1_reg2;
      out_add1_reg4 <= out_add1_reg3;
    end

  // Loop term: feedback from four cycles earlier, scaled by y four times
  assign out_add1 = x*in_reg + ((((y*out_add1_reg4)*y)*y)*y);
  // Cascade terms: forward logic outside the feedback loop
  assign out_add2 = out_add1 + (y*out_add1_reg1);
  assign out_add3 = out_add2 + ((y*out_add1_reg2)*y);
  assign out_add4 = out_add3 + (((y*out_add1_reg3)*y)*y);

  // Output register
  always @ ( posedge clk ) begin
    if ( !rstn )
      out <= 0;
    else
      out <= out_add4;
  end
endmodule //cll_hypr_rtm_synopt
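To check that the four-level version functions the same as the original loop, a small simulation testbench can drive both modules with the same stimulus and compare their outputs every cycle. The sketch below is an assumption-laden illustration and not part of the handbook: the testbench name tb_loop_equiv_sketch, the instance names, and the stimulus scheme are all hypothetical. It holds y constant after reset and expects orig_loop_strct and cll_hypr_rtm_synopt from this section to be compiled alongside it.

```verilog
// Simulation-only sketch. tb_loop_equiv_sketch and its signal and instance
// names are hypothetical; compile it together with orig_loop_strct and
// cll_hypr_rtm_synopt from this section.
`timescale 1ns/1ps
module tb_loop_equiv_sketch;
  reg     clk, rstn, in, x, y;
  wire    out_orig, out_opt;
  integer i, mismatches;

  orig_loop_strct     dut_orig (.rstn(rstn), .clk(clk), .in(in), .x(x), .y(y), .out(out_orig));
  cll_hypr_rtm_synopt dut_opt  (.rstn(rstn), .clk(clk), .x(x), .y(y), .in(in), .out(out_opt));

  // Free-running clock with a 10 ns period
  always #5 clk = ~clk;

  initial begin
    clk = 1'b0; rstn = 1'b0; in = 1'b0; x = 1'b0;
    y = 1'b1;            // assumption: y stays constant for the whole run
    mismatches = 0;

    // Hold the synchronous reset long enough to clear every register
    repeat (8) @(negedge clk);
    rstn = 1'b1;

    for (i = 0; i < 200; i = i + 1) begin
      // Change inputs on the falling edge to avoid races with the rising edge
      in = $random;
      x  = $random;
      @(negedge clk);
      if (out_orig !== out_opt) begin
        mismatches = mismatches + 1;
        $display("Mismatch at time %0t: orig=%b opt=%b", $time, out_orig, out_opt);
      end
    end

    $display("Done: %0d mismatches", mismatches);
    $finish;
  end
endmodule //tb_loop_equiv_sketch
```

The comparison only holds while y is constant, because the cascade applies several multiplies by y to values captured in different cycles. If y changes during operation, the two modules can diverge, so a real design would need to account for that separately.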