4.1.4. Step 4: Optimize Short Path and Long Path Conditions
After removing asynchronous registers and adding pipeline stages, the Fast Forward Details report suggests that short path and long path conditions limit further optimization. In this example, the longest path limits the fMAX for this specific clock domain. To increase the performance, follow these steps to reduce the length of the longest path for this clock domain.
- To view the long path information, click the Critical Chain Details tab in the Fast Forward Details report. Review the structure of the logic around this path, and consider the associated RTL code. This path involves the node module of the node.v file. The critical path relates to the computation of registers data_hi and data_lo, which are part of several comparators.
The following shows the original RTL for this path:
    always @(*) begin : comparator
        if (data_a < data_b) begin
            sel0 = 1'b0; // data_a : lo / data_b : hi
        end
        else begin
            sel0 = 1'b1; // data_b : lo / data_a : hi
        end
    end

    always @(*) begin : mux_lo_hi
        case (sel0)
            1'b0 : begin
                if (LOW_MUX == 1) data_lo = data_a;
                if (HI_MUX == 1)  data_hi = data_b;
            end
            1'b1 : begin
                if (LOW_MUX == 1) data_lo = data_b;
                if (HI_MUX == 1)  data_hi = data_a;
            end
            default : begin
                data_lo = {DATA_WIDTH{1'b0}};
                data_hi = {DATA_WIDTH{1'b0}};
            end
        endcase
    end
The Compiler infers the following logic from this RTL:
- A comparator that creates the sel0 signal
- A pair of muxes that create the data_hi and data_lo signals, as the following figure shows:
Figure 103. Node Component Connections
- Review the pixel_network.v file that instantiates the node module. The node module's outputs are left unconnected when you do not use them, so the LOW_MUX and HI_MUX conditions in the code are not needed. Rather than inferring muxes, use bitwise logic operations to compute the values of the data_hi and data_lo signals, as the following example shows:
    reg [DATA_WIDTH-1:0] sel0; // select bit replicated across the data width

    always @(*) begin : comparator
        if (data_a < data_b) begin
            sel0 = {DATA_WIDTH{1'b0}}; // data_a : lo / data_b : hi
        end
        else begin
            sel0 = {DATA_WIDTH{1'b1}}; // data_b : lo / data_a : hi
        end
        // AND-OR selection: one operand is masked by sel0, the other by ~sel0
        data_lo = (data_b & sel0) | (data_a & ~sel0);
        data_hi = (data_a & sel0) | (data_b & ~sel0);
    end
- Once again, compile the design and view the Fast Forward Details report. The performance increase is similar to the estimates, and short path and long path combinations no longer limit further performance. After this step, only a logical loop limits further performance.
Figure 104. Short Path and Long Path Conditions Optimized
Note: As an alternative to completing the preceding steps, you can open and compile the Median_filter_<version>/Final/median.qpf project file that already includes these changes, and then observe the results.
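Before recompiling, it can help to confirm in simulation that a bitwise AND-OR computation of data_lo and data_hi matches plain mux behavior. The following is a minimal self-checking sketch, not part of the design example. The testbench name tb_sort2, the reference signals ref_lo and ref_hi, and the DATA_WIDTH value of 4 are assumptions chosen so that every input pair can be tested. The sketch masks one operand with sel0 and the other with its complement.

```verilog
// Hypothetical self-checking testbench. Names other than data_a, data_b,
// sel0, data_lo, data_hi, and DATA_WIDTH are illustrative assumptions.
`timescale 1ns/1ps
module tb_sort2;
    localparam DATA_WIDTH = 4; // assumed small width for exhaustive testing

    reg [DATA_WIDTH-1:0] data_a, data_b;
    reg [DATA_WIDTH-1:0] sel0;
    reg [DATA_WIDTH-1:0] data_lo, data_hi; // AND-OR formulation
    reg [DATA_WIDTH-1:0] ref_lo, ref_hi;   // reference mux behavior
    integer i, j, errors;

    // AND-OR formulation: sel0 is replicated across the data width
    always @(*) begin : and_or
        if (data_a < data_b)
            sel0 = {DATA_WIDTH{1'b0}}; // data_a : lo / data_b : hi
        else
            sel0 = {DATA_WIDTH{1'b1}}; // data_b : lo / data_a : hi
        data_lo = (data_b & sel0) | (data_a & ~sel0);
        data_hi = (data_a & sel0) | (data_b & ~sel0);
    end

    // Reference: select the smaller and larger value directly
    always @(*) begin : reference
        ref_lo = (data_a < data_b) ? data_a : data_b;
        ref_hi = (data_a < data_b) ? data_b : data_a;
    end

    // Apply every combination of data_a and data_b and compare the results
    initial begin
        errors = 0;
        for (i = 0; i < (1 << DATA_WIDTH); i = i + 1) begin
            for (j = 0; j < (1 << DATA_WIDTH); j = j + 1) begin
                data_a = i;
                data_b = j;
                #1; // allow the combinational blocks to settle
                if (data_lo !== ref_lo || data_hi !== ref_hi) begin
                    errors = errors + 1;
                    $display("MISMATCH a=%0d b=%0d lo=%0d hi=%0d",
                             data_a, data_b, data_lo, data_hi);
                end
            end
        end
        if (errors == 0)
            $display("PASS");
        else
            $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Any Verilog-2001 simulator should be able to run this sketch. If either mask term uses sel0 without the complement, the comparison reports mismatches.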