Intel® Hyperflex™ Architecture High-Performance Design Handbook

ID 683353
Date 10/04/2021
Public

4.1.2. Step 2: Add Pipeline Stages and Remove Asynchronous Resets

This first optimization step adds five levels of pipeline registers at the design locations that Fast Forward suggests, and removes the asynchronous resets present in a design module. Adding pipeline stages on the interconnect between ALMs eliminates some of the long routing delays. This step raises fMAX performance to the level that Fast Forward estimates.

To add pipeline stages and remove asynchronous resets from the design:

  1. Open Median_filter_<version>/Step_1/rtl/hyper_pipe.sv. This file defines a parameterizable hyper_pipe pipeline component that you can reuse in any design. The following shows this component's code with parameterizable width (WIDTH) and depth (NUM_PIPES):
    module hyper_pipe #(
        parameter WIDTH = 1,
        parameter NUM_PIPES = 1)
    (
    input clk,
    input [WIDTH-1:0] din,
    output [WIDTH-1:0] dout);
    
    reg [WIDTH-1:0] hp [NUM_PIPES-1:0];
    
    genvar i;
    generate
      if (NUM_PIPES == 0) begin
        assign dout = din;
      end
      else begin
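        // First stage registers the input
        always @ (posedge clk)
          hp[0] <= din;
        // Remaining stages form a shift chain, one register per generate iteration
        for (i=1;i < NUM_PIPES;i++) begin : hregs
          always @ ( posedge clk) begin
            hp[i] <= hp[i-1];
          end
        end
        assign dout = hp[NUM_PIPES-1];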
      end
    endgenerate
    endmodule
  2. Use the parameterizable module to add pipeline stages at the locations that Fast Forward recommends. The following example shows how to add latency before the q output of the dff_3_pipe module:
    . . .
    
    hyper_pipe #(
        .WIDTH (DATA_WIDTH),
        .NUM_PIPES(4)
    ) hp_d0 (
        .clk(clk),
        .din(d0),
        .dout(q0_int)
    );
    . . .
    always @(posedge clk)
    begin : register_bank_3u
        if(~rst_n) begin
            q0 <= {DATA_WIDTH{1'b0}};
            q1 <= {DATA_WIDTH{1'b0}};
            q2 <= {DATA_WIDTH{1'b0}};
        end else begin
            q0 <= q0_int;
            q1 <= q1_int;
            q2 <= q2_int;
        end
    end
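  3. Remove the asynchronous resets inside the dff_3_pipe module by changing the registers to use synchronous resets. In the listing below, the first always block is the original code with an asynchronous reset, and the second always block is its replacement with a synchronous reset. Refer to Reset Strategies for general examples of efficient reset implementations.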
    always @(posedge clk or negedge rst_n) // Asynchronous reset
    begin : register_bank_3u
        if(~rst_n) begin
            q0 <= {DATA_WIDTH{1'b0}};
            q1 <= {DATA_WIDTH{1'b0}};
            q2 <= {DATA_WIDTH{1'b0}};
        end else begin
            q0_reg <= d0;
            q1_reg <= d1;
            q2_reg <= d2;
            q0 <= q0_reg;
            q1 <= q1_reg;
            q2 <= q2_reg;
        end
    end
    
    always @(posedge clk)
    begin : register_bank_3u
        if(~rst_n_int) begin  // Synchronous reset
            q0 <= {DATA_WIDTH{1'b0}};
            q1 <= {DATA_WIDTH{1'b0}};
            q2 <= {DATA_WIDTH{1'b0}};
        end else begin
            q0 <= q0_int;
            q1 <= q1_int;
            q2 <= q2_int;
        end
    end
    These RTL changes add five levels of pipeline to the inputs of the median_wrapper design (word0, word1, and word2 buses), and five levels of pipeline into the dff_3_pipe module. The following steps show the results of these changes.
  4. To implement the changes, save all design changes and click Compile Design on the Compilation Dashboard.
  5. After compilation completes, view the results for the Clk clock domain in the Fast Forward Details report again.

    The report shows the effect of the RTL changes on the Base Performance fMAX of the design. The design performance now increases to 495 MHz.

    The report indicates that you can achieve further performance improvement by removing more asynchronous registers, adding more pipeline registers, and addressing the short path and long path optimization limits. The following steps describe how to implement these recommendations in the design RTL.

    Note: As an alternative to completing the preceding steps, you can open and compile the Median_filter_<version>/Step_1/median.qpf project file that already includes these changes, and then observe the results.
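
The pipelining and reset changes in steps 1 through 3 can be sketched for a single data channel as follows. This is a minimal illustration under stated assumptions, not the handbook's dff_3_pipe source: the module names dff_1_pipe_sketch and tb_dff_1_pipe_sketch, the LEVELS parameter, the 8-bit data width, and the assumption that rst_n_int is already synchronous to clk are all hypothetical. The hyper_pipe module is repeated from step 1 so that the sketch compiles on its own.

```systemverilog
// Copy of the hyper_pipe component from step 1, so this sketch compiles alone.
module hyper_pipe #(
  parameter WIDTH = 1,
  parameter NUM_PIPES = 1)
(
  input clk,
  input [WIDTH-1:0] din,
  output [WIDTH-1:0] dout);

  reg [WIDTH-1:0] hp [NUM_PIPES-1:0];

  genvar i;
  generate
    if (NUM_PIPES == 0) begin
      assign dout = din;
    end
    else begin
      always @ (posedge clk)
        hp[0] <= din;
      for (i = 1; i < NUM_PIPES; i++) begin : hregs
        always @ (posedge clk)
          hp[i] <= hp[i-1];
      end
      assign dout = hp[NUM_PIPES-1];
    end
  endgenerate
endmodule

// Hypothetical single-channel stand-in for dff_3_pipe (not the handbook's module):
// four hyper_pipe stages followed by one synchronously reset output register.
module dff_1_pipe_sketch #(parameter DATA_WIDTH = 8) (
  input                       clk,
  input                       rst_n_int,  // assumed already synchronous to clk
  input      [DATA_WIDTH-1:0] d0,
  output reg [DATA_WIDTH-1:0] q0);

  wire [DATA_WIDTH-1:0] q0_int;

  hyper_pipe #(.WIDTH(DATA_WIDTH), .NUM_PIPES(4)) hp_d0 (
    .clk(clk), .din(d0), .dout(q0_int));

  // No reset in the sensitivity list: the reset is sampled on the clock edge.
  always @(posedge clk) begin
    if (~rst_n_int) q0 <= {DATA_WIDTH{1'b0}};
    else            q0 <= q0_int;
  end
endmodule

// Hypothetical self-checking testbench for the sketch above.
module tb_dff_1_pipe_sketch;
  localparam DATA_WIDTH = 8;
  localparam LEVELS = 5;  // 4 hyper_pipe stages + 1 output register

  logic clk = 0;
  logic rst_n_int = 0;
  logic [DATA_WIDTH-1:0] d0 = '0;
  wire  [DATA_WIDTH-1:0] q0;
  logic [DATA_WIDTH-1:0] expected;
  int errors = 0;

  dff_1_pipe_sketch #(.DATA_WIDTH(DATA_WIDTH)) dut (
    .clk(clk), .rst_n_int(rst_n_int), .d0(d0), .q0(q0));

  always #5 clk = ~clk;

  initial begin
    // Hold reset across one rising edge, then confirm q0 was cleared.
    @(posedge clk);
    @(negedge clk);
    if (q0 !== {DATA_WIDTH{1'b0}}) errors++;
    rst_n_int = 1;

    // Drive a counting pattern; the value m is sampled on rising edge m.
    for (int m = 1; m <= 20; m++) begin
      d0 = m;
      @(posedge clk);
      @(negedge clk);  // check between edges to avoid races
      if (m >= LEVELS) begin
        expected = m - LEVELS + 1;  // value sampled LEVELS-1 edges earlier
        if (q0 !== expected) errors++;
      end
    end

    if (errors == 0)
      $display("PASS: q0 trails d0 by %0d clock cycles", LEVELS);
    else
      $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

The testbench counts a mismatch whenever q0 does not trail d0 by five clock cycles, which is the four hyper_pipe stages plus the output register. Only the output register is reset in this sketch. The hyper_pipe stages carry no reset, so in simulation q0 is unknown for the first few cycles after reset is released, until driven data has flushed through the pipeline.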