Intel® Quartus® Prime Standard Edition User Guide: Design Recommendations
ID
683323
Date
9/24/2018
Public
2.1. Using Provided HDL Templates
2.2. Instantiating IP Cores in HDL
2.3. Inferring Multipliers and DSP Functions
2.4. Inferring Memory Functions from HDL Code
2.5. Register and Latch Coding Guidelines
2.6. General Coding Guidelines
2.7. Designing with Low-Level Primitives
2.8. Recommended HDL Coding Styles Revision History
2.4.1.1. Use Synchronous Memory Blocks
2.4.1.2. Avoid Unsupported Reset and Control Conditions
2.4.1.3. Check Read-During-Write Behavior
2.4.1.4. Controlling RAM Inference and Implementation
2.4.1.5. Single-Clock Synchronous RAM with Old Data Read-During-Write Behavior
2.4.1.6. Single-Clock Synchronous RAM with New Data Read-During-Write Behavior
2.4.1.7. Simple Dual-Port, Dual-Clock Synchronous RAM
2.4.1.8. True Dual-Port Synchronous RAM
2.4.1.9. Mixed-Width Dual-Port RAM
2.4.1.10. RAM with Byte-Enable Signals
2.4.1.11. Specifying Initial Memory Contents at Power-Up
2.6.6.1. If Performance is Important, Optimize for Speed
2.6.6.2. Use Separate CRC Blocks Instead of Cascaded Stages
2.6.6.3. Use Separate CRC Blocks Instead of Allowing Blocks to Merge
2.6.6.4. Take Advantage of Latency if Available
2.6.6.5. Save Power by Disabling CRC Blocks When Not in Use
2.6.6.6. Initialize the Device with the Synchronous Load (sload) Signal
3.4.1. Apply Complete System-Centric Timing Constraints for the Timing Analyzer
3.4.2. Force the Identification of Synchronization Registers
3.4.3. Set the Synchronizer Data Toggle Rate
3.4.4. Optimize Metastability During Fitting
3.4.5. Increase the Length of Synchronizers to Protect and Optimize
3.4.6. Set Fitter Effort to Standard Fit instead of Auto Fit
3.4.7. Increase the Number of Stages Used in Synchronizers
3.4.8. Select a Faster Speed Grade Device
2.6.3.1. Architectures with 4-Input LUTs in Logic Elements
Architectures such as Stratix devices and the Cyclone series of devices contain 4-input LUTs as the standard combinational structure in the LE.
If your design can tolerate pipelining, the fastest way to add three numbers A, B, and C in devices that use 4-input lookup tables is to addA + B, register the output, and then add the registered output to C. Adding A + B takes one level of logic (one bit is added in one LE), so this runs at full clock speed. This can be extended to as many numbers as desired.
Adding five numbers in devices that use 4-input lookup tables requires four adders and three levels of registers for a total of 64 LEs (for 16-bit numbers).
Verilog HDL Pipelined Binary Tree
module binary_adder_tree (a, b, c, d, e, clk, out); parameter width = 16; input [width-1:0] a, b, c, d, e; input clk; output [width-1:0] out; wire [width-1:0] sum1, sum2, sum3, sum4; reg [width-1:0] sumreg1, sumreg2, sumreg3, sumreg4; // Registers always @ (posedge clk) begin sumreg1 <= sum1; sumreg2 <= sum2; sumreg3 <= sum3; sumreg4 <= sum4; end // 2-bit additions assign sum1 = A + B; assign sum2 = C + D; assign sum3 = sumreg1 + sumreg2; assign sum4 = sumreg3 + E; assign out = sumreg4; endmodule