4.1.4. Step 4: Optimize Short Path and Long Path Conditions

Intel® Hyperflex™ Architecture High-Performance Design Handbook

ID 683353

Date 10/04/2021

After removing asynchronous registers and adding pipeline stages, the Fast Forward Details report suggests that short path and long path conditions limit further optimization. In this example, the longest path limits the f_MAX for this specific clock domain. To increase the performance, follow these steps to reduce the length of the longest path for this clock domain.

To view the long path information, click the Critical Chain Details tab in the Fast Forward Details report. Review the structure of the logic around this path, and consider the associated RTL code. This path involves the node module of the node.v file. The critical path relates to the computation of registers data_hi and data_lo, which are part of several comparators.

The following shows the original RTL for this path:

```
always @(*)
  begin : comparator
    if(data_a < data_b) begin
      sel0 = 1'b0; // data_a : lo / data_b : hi
    end else begin
      sel0 = 1'b1; // data_b : lo / data_a : hi
    end
  end

always @(*)
  begin : mux_lo_hi
    case (sel0)
      1'b0 :
      begin
        if(LOW_MUX == 1)
          data_lo = data_a;
        if(HI_MUX == 1)
          data_hi = data_b;
      end
      1'b1 :
      begin
        if(LOW_MUX == 1)
          data_lo = data_b;
        if(HI_MUX == 1)
          data_hi = data_a;
      end
      default :
      begin
        data_lo = {DATA_WIDTH{1'b0}};
        data_hi = {DATA_WIDTH{1'b0}};
      end
    endcase
  end
```

The Compiler infers the following logic from this RTL:

A comparator that creates the sel0 signal
A pair of muxes that create the data_hi and data_lo signals, as the following figure shows:

Figure 99. Node Component Connections

Review the pixel_network.v file that instantiates the node module. Wherever the design does not use one of the node module's outputs, pixel_network.v leaves that output unconnected, so the LOW_MUX and HI_MUX conditions in the node code are unnecessary. Rather than inferring muxes, use bitwise logic operations to compute the values of the data_hi and data_lo signals. Widen sel0 to DATA_WIDTH bits so that it acts as an all-zeros or all-ones mask, as the following example shows:
```
reg [DATA_WIDTH-1:0] sel0;

always @(*)
  begin : comparator
    // Widen sel0 to a DATA_WIDTH-bit mask: all zeros or all ones
    if(data_a < data_b) begin
      sel0 = {DATA_WIDTH{1'b0}}; // data_a : lo / data_b : hi
    end else begin
      sel0 = {DATA_WIDTH{1'b1}}; // data_b : lo / data_a : hi
    end

    // AND-OR selection: sel0 passes one operand, ~sel0 passes the other
    data_lo = (data_b & sel0) | (data_a & ~sel0);
    data_hi = (data_a & sel0) | (data_b & ~sel0);
  end
```
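
Before recompiling, you can confirm in simulation that the bitwise form still sorts the two inputs into a low and a high value. The following is a minimal sketch of a self-checking testbench, not part of the Median filter design files. The module names sort2_bitwise and sort2_bitwise_tb, the port list, and the DATA_WIDTH value of 8 are assumptions made for illustration.

```verilog
// Illustrative sketch only. Module names, ports, and DATA_WIDTH = 8
// are hypothetical and do not come from the node.v file.
module sort2_bitwise #(parameter DATA_WIDTH = 8) (
    input  wire [DATA_WIDTH-1:0] data_a,
    input  wire [DATA_WIDTH-1:0] data_b,
    output reg  [DATA_WIDTH-1:0] data_lo,
    output reg  [DATA_WIDTH-1:0] data_hi
);
    reg [DATA_WIDTH-1:0] sel0;

    always @(*) begin : comparator
        // Broadcast the comparison result across every bit of the mask.
        if (data_a < data_b)
            sel0 = {DATA_WIDTH{1'b0}}; // data_a : lo / data_b : hi
        else
            sel0 = {DATA_WIDTH{1'b1}}; // data_b : lo / data_a : hi

        // AND-OR selection in place of an inferred mux.
        data_lo = (data_b & sel0) | (data_a & ~sel0);
        data_hi = (data_a & sel0) | (data_b & ~sel0);
    end
endmodule

module sort2_bitwise_tb;
    localparam DATA_WIDTH = 8;

    reg  [DATA_WIDTH-1:0] data_a, data_b;
    wire [DATA_WIDTH-1:0] data_lo, data_hi;
    reg  [DATA_WIDTH-1:0] exp_lo, exp_hi;
    integer a, b, errors;

    sort2_bitwise #(.DATA_WIDTH(DATA_WIDTH)) dut (
        .data_a(data_a), .data_b(data_b),
        .data_lo(data_lo), .data_hi(data_hi)
    );

    initial begin
        errors = 0;
        // Exhaustively apply every pair of 8-bit operands.
        for (a = 0; a < (1 << DATA_WIDTH); a = a + 1) begin
            for (b = 0; b < (1 << DATA_WIDTH); b = b + 1) begin
                data_a = a;
                data_b = b;
                #1; // let the combinational logic settle
                // Reference model: plain minimum and maximum.
                exp_lo = (data_a < data_b) ? data_a : data_b;
                exp_hi = (data_a < data_b) ? data_b : data_a;
                if (data_lo !== exp_lo || data_hi !== exp_hi) begin
                    errors = errors + 1;
                    $display("MISMATCH a=%0d b=%0d lo=%0d hi=%0d",
                             data_a, data_b, data_lo, data_hi);
                end
            end
        end
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

The testbench compares data_lo and data_hi against the minimum and maximum of the two operands for every input pair. Using the same mask without inversion in both AND terms would make the testbench report mismatches, which is why the second term of each expression uses ~sel0.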
Once again, compile the design and view the Fast Forward Details report. The performance increase is similar to the estimates, and short path and long path combinations no longer limit further performance. After this step, only a logical loop limits further performance.

Figure 100. Short Path and Long Path Conditions Optimized

Note: As an alternative to completing the preceding steps, you can open and compile the Median_filter_<version>/Final/median.qpf project file that already includes these changes, and then observe the results.
