Intel® Hyperflex™ Architecture High-Performance Design Handbook

ID 683353
Date 12/08/2023
Public

2.4.1.1.1. Shannon’s Decomposition Example

The sample circuit adds an input-derived value to the internal_total register, or subtracts that value from it, depending on whether internal_total exceeds a target value. The core of the circuit is the target_loop module, shown in Source Code before Shannon's Decomposition.

Source Code before Shannon's Decomposition

module target_loop (clk, sclr, data, target, running_total);
parameter WIDTH = 32;

input clk;
input sclr;
input [WIDTH-1:0] data; 
input [WIDTH-1:0] target; 
output [WIDTH-1:0] running_total; 

reg [WIDTH-1:0] internal_total; 

always @(posedge clk) begin
  if (sclr)
  begin
    internal_total <= 0;
  end
  else begin
    // Compare, negate or pass data, multiply, and add, all inside the feedback loop
    internal_total <= internal_total + (((internal_total > target) ? -data : data) * (target / 4));
  end
end

assign running_total = internal_total;
endmodule

The module uses a synchronous clear, based on the recommendations to enable Hyper-Retiming.
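
The difference between the two clear styles can be sketched as follows. This is an illustrative example, not code from the handbook: the module names total_reg_sclr and total_reg_aclr and the port names d, q, and aclr are hypothetical. The rationale usually given for the recommendation is that a register with an asynchronous clear is harder for the retimer to move, so the first form is the one that matches the target_loop module.

```verilog
// Hypothetical register with a synchronous clear (the style used by target_loop).
// The clear is sampled only on the clock edge, so it is ordinary data-path logic.
module total_reg_sclr #(parameter WIDTH = 32) (
  input                  clk,
  input                  sclr,
  input      [WIDTH-1:0] d,
  output reg [WIDTH-1:0] q
);
  always @(posedge clk) begin
    if (sclr)
      q <= {WIDTH{1'b0}};
    else
      q <= d;
  end
endmodule

// Hypothetical register with an asynchronous clear, shown only for contrast.
// The clear appears in the sensitivity list and takes effect without a clock edge.
module total_reg_aclr #(parameter WIDTH = 32) (
  input                  clk,
  input                  aclr,
  input      [WIDTH-1:0] d,
  output reg [WIDTH-1:0] q
);
  always @(posedge clk or posedge aclr) begin
    if (aclr)
      q <= {WIDTH{1'b0}};
    else
      q <= d;
  end
endmodule
```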

Fast Forward Compile Report before Shannon’s Decomposition shows the Fast Forward Compile report for the target_loop module instantiated in a register ring.

Figure 46. Fast Forward Compile Report before Shannon’s Decomposition

Hyper-Retiming reports about 302 MHz by adding a pipeline stage in the Fast Forward Compile. The last Fast Forward Limit row indicates that the critical chain is a loop. Examining the critical chain report reveals that there is a repeated structure in the chain segments. The repeated structure is shown as an example in the Optimizing Loops section.

Elements of a Critical Chain Sub-Loop shows a structure that implements the expression in the previous example code. The functional blocks correspond to the comparison, addition, and multiplication operations. The zero in each arithmetic block’s name is part of the name that synthesis assigns in the netlist: each block is the first (zero-indexed) instance of its operator that synthesis creates.

Figure 47. Elements of a Critical Chain Sub-Loop

This expression is a candidate for Shannon’s decomposition. Instead of performing only one addition with the positive or negative value of data, you can perform the following two calculations simultaneously:

  • internal_total - (data * (target/4))
  • internal_total + (data * (target/4))

You can then use the result of the comparison internal_total > target to select which calculation result to use. The modified version of the code that uses Shannon’s decomposition to implement the internal_total calculation is shown in Source Code after Shannon's Decomposition.

Source Code after Shannon's Decomposition

module target_loop_shannon (clk, sclr, data, target, running_total);
  parameter WIDTH = 32;

input clk;
input sclr;
input [WIDTH-1:0] data;
input [WIDTH-1:0] target;
output [WIDTH-1:0] running_total;

reg [WIDTH-1:0] internal_total;
wire [WIDTH-1:0] total_minus;
wire [WIDTH-1:0] total_plus;

assign total_minus = internal_total - (data * (target / 4));
assign total_plus = internal_total + (data * (target / 4));

always @(posedge clk) begin
  if (sclr)
  begin 
     internal_total <= 0;
  end
  else begin
     internal_total <= (internal_total > target) ? total_minus : total_plus;
  end
end

assign running_total = internal_total;
endmodule
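
One way to gain confidence that the decomposed code computes the same next value as the original is to compare the two next-state expressions in simulation. The following is a minimal sketch and not part of the handbook: the testbench name shannon_equiv_tb, the NUM_TRIALS count, and the use of $random stimulus are assumptions made for illustration. The two expressions are copied from the listings above, so the testbench does not depend on either module being compiled with it.

```verilog
`timescale 1ns/1ps
// Hypothetical self-checking testbench (not from the handbook).
module shannon_equiv_tb;
  localparam WIDTH      = 32;
  localparam NUM_TRIALS = 1000;  // arbitrary trial count chosen for this sketch

  reg [WIDTH-1:0] internal_total;
  reg [WIDTH-1:0] data;
  reg [WIDTH-1:0] target;

  // Next-state expression as written in target_loop.
  wire [WIDTH-1:0] next_original =
      internal_total + (((internal_total > target) ? -data : data) * (target / 4));

  // Next-state expression as written in target_loop_shannon:
  // both candidates are computed in parallel, then the comparison selects one.
  wire [WIDTH-1:0] total_minus  = internal_total - (data * (target / 4));
  wire [WIDTH-1:0] total_plus   = internal_total + (data * (target / 4));
  wire [WIDTH-1:0] next_shannon = (internal_total > target) ? total_minus : total_plus;

  integer i;
  integer errors;

  initial begin
    errors = 0;
    for (i = 0; i < NUM_TRIALS; i = i + 1) begin
      internal_total = $random;
      data           = $random;
      target         = $random;
      #1;  // let the continuous assignments settle
      if (next_original !== next_shannon) begin
        errors = errors + 1;
        $display("MISMATCH: total=%h data=%h target=%h original=%h shannon=%h",
                 internal_total, data, target, next_original, next_shannon);
      end
    end
    if (errors == 0)
      $display("PASS: %0d random trials matched", NUM_TRIALS);
    else
      $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

The two forms agree because all operands are WIDTH bits wide, so the arithmetic wraps modulo 2^WIDTH and multiplying by -data gives the same result as subtracting the product. Random stimulus only samples the input space; a formal equivalence check would be needed for a proof.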

Fast Forward Summary Report after Shannon's Decomposition shows that performance almost doubles after you recompile the design with the code change.

Figure 48. Fast Forward Summary Report after Shannon's Decomposition