Hyperflex® Architecture High-Performance Design Handbook
ID: 683353
Date: 12/06/2024
Public
A newer version of this document is available.
4.1.2. Step 2: Add Pipeline Stages and Remove Asynchronous Resets
This first optimization step adds five levels of pipeline registers at the design locations that Fast Forward suggests, and removes the asynchronous resets from a design module. Adding pipeline stages in the interconnect between ALMs eliminates some of the long routing delays. This optimization step increases fMAX performance to the level that Fast Forward estimates.
To add pipeline stages and remove asynchronous resets from the design:
- Open the Median_filter_<version>/Step_1/rtl/hyper_pipe.sv file. This file defines a parameterizable hyper_pipe pipeline component that you can use in any design. The following shows the component's code, with parameterizable width (WIDTH) and depth (NUM_PIPES):
```verilog
module hyper_pipe #(
    parameter WIDTH = 1,
    parameter NUM_PIPES = 1)
(
    input clk,
    input [WIDTH-1:0] din,
    output [WIDTH-1:0] dout);

    reg [WIDTH-1:0] hp [NUM_PIPES-1:0];

    genvar i;
    generate
        if (NUM_PIPES == 0) begin
            assign dout = din;
        end
        else begin
            always @ (posedge clk)
                hp[0] <= din;

            for (i=1;i < NUM_PIPES;i++) begin : hregs
                always @ ( posedge clk) begin
                    hp[i] <= hp[i-1];
                end
            end

            assign dout = hp[NUM_PIPES-1];
        end
    endgenerate
endmodule
```

- Use the parameterizable module to add pipeline stages at the locations that Fast Forward recommends. The following example shows how to add latency before the q output of the dff_3_pipe module:
```verilog
. . .
hyper_pipe #(
    .WIDTH (DATA_WIDTH),
    .NUM_PIPES(4)
) hp_d0 (
    .clk(clk),
    .din(d0),
    .dout(q0_int)
);
. . .
always @(posedge clk)
begin : register_bank_3u
    if(~rst_n) begin
        q0 <= {DATA_WIDTH{1'b0}};
        q1 <= {DATA_WIDTH{1'b0}};
        q2 <= {DATA_WIDTH{1'b0}};
    end
    else begin
        q0 <= q0_int;
        q1 <= q1_int;
        q2 <= q2_int;
    end
end
```

- Remove the asynchronous resets inside the dff_3_pipe module by converting its registers to use a synchronous reset, as shown below. Refer to Reset Strategies for general examples of efficient reset implementations.
Before (asynchronous reset):

```verilog
always @(posedge clk or negedge rst_n) // Asynchronous reset
begin : register_bank_3u
    if(~rst_n) begin
        q0 <= {DATA_WIDTH{1'b0}};
        q1 <= {DATA_WIDTH{1'b0}};
        q2 <= {DATA_WIDTH{1'b0}};
    end
    else begin
        q0_reg <= d0;
        q1_reg <= d1;
        q2_reg <= d2;
        q0 <= q0_reg;
        q1 <= q1_reg;
        q2 <= q2_reg;
    end
end
```

After (synchronous reset):

```verilog
always @(posedge clk)
begin : register_bank_3u
    if(~rst_n_int) begin // Synchronous reset
        q0 <= {DATA_WIDTH{1'b0}};
        q1 <= {DATA_WIDTH{1'b0}};
        q2 <= {DATA_WIDTH{1'b0}};
    end
    else begin
        q0 <= q0_int;
        q1 <= q1_int;
        q2 <= q2_int;
    end
end
```

These RTL changes add five levels of pipeline to the inputs of the median_wrapper design (the word0, word1, and word2 buses) and five levels of pipeline into the dff_3_pipe module. The following steps show the results of these changes.
- To implement the changes, save all design files and click Compile Design on the Compilation Dashboard.
- Following compilation, once again view the compilation results for the Clk clock domain in the Fast Forward Details report.
The report shows the effect of the RTL changes on the Base Performance fMAX of the design. The design performance now increases to 495 MHz.
The report indicates that you can achieve further performance improvement by removing more asynchronous resets, adding more pipeline registers, and addressing the short path/long path optimization limits. The following steps describe how to implement these recommendations in the design RTL.
Note: As an alternative to completing the preceding steps, you can open and compile the Median_filter_<version>/Step_1/median.qpf project file that already includes these changes, and then observe the results.
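The synchronous-reset version of register_bank_3u in dff_3_pipe tests rst_n_int, but this step does not show where that signal comes from. The following is a minimal sketch of one possible source, assuming rst_n is already synchronous to clk (for example, driven by a reset synchronizer elsewhere in the design): a short chain of reset-free registers that delays rst_n by NUM_PIPES clock cycles. The module name rst_pipe and its NUM_PIPES default are hypothetical and are not part of the Median_filter design files.

```verilog
// Hypothetical sketch: one possible source for rst_n_int in dff_3_pipe.
// Assumes rst_n is already synchronous to clk (for example, produced by a
// reset synchronizer elsewhere); this module does NOT synchronize it.
module rst_pipe #(
    parameter NUM_PIPES = 2   // register stages on the reset path; must be >= 1
) (
    input  clk,
    input  rst_n,             // active-low reset, synchronous to clk
    output rst_n_int          // rst_n delayed by NUM_PIPES clock cycles
);
    // These registers have no reset of their own; they only delay the
    // reset signal, like any other reset-free pipeline stage.
    reg [NUM_PIPES-1:0] pipe;
    integer k;

    always @(posedge clk) begin
        pipe[0] <= rst_n;
        for (k = 1; k < NUM_PIPES; k = k + 1)
            pipe[k] <= pipe[k-1];
    end

    assign rst_n_int = pipe[NUM_PIPES-1];
endmodule
```

Because the stages only delay rst_n, the registers in register_bank_3u leave reset NUM_PIPES cycles after rst_n deasserts, and in simulation rst_n_int is unknown (X) until NUM_PIPES clock edges have passed. The hyper_pipe component shown earlier, instantiated with WIDTH set to 1 and the same NUM_PIPES, produces the same delay.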