Quartus® Prime Pro Edition User Guide: Design Recommendations
ID
683082
Date
12/11/2024
Public
A newer version of this document is available. Customers should click here to go to the newest version.
1.1. Using Provided HDL Templates
1.2. Instantiating IP Cores in HDL
1.3. Inferring Multipliers and DSP Functions
1.4. Inferring Memory Functions from HDL Code
1.5. Register and Latch Coding Guidelines
1.6. General Coding Guidelines
1.7. Designing with Low-Level Primitives
1.8. Cross-Module Referencing (XMR) in HDL Code
1.9. Using force Statements in HDL Code
1.10. Recommended HDL Coding Styles Revision History
1.4.1.1. Use Synchronous Memory Blocks
1.4.1.2. Avoid Unsupported Reset and Control Conditions
1.4.1.3. Check Read-During-Write Behavior
1.4.1.4. Controlling RAM Inference and Implementation
1.4.1.5. Single-Clock Synchronous RAM with Old Data Read-During-Write Behavior
1.4.1.6. Single-Clock Synchronous RAM with New Data Read-During-Write Behavior
1.4.1.7. Simple Dual-Port, Dual-Clock Synchronous RAM
1.4.1.8. True Dual-Port Synchronous RAM
1.4.1.9. Mixed-Width Dual-Port RAM
1.4.1.10. RAM with Byte-Enable Signals
1.4.1.11. Specifying Initial Memory Contents at Power-Up
1.6.6.1. If Performance is Important, Optimize for Speed
1.6.6.2. Use Separate CRC Blocks Instead of Cascaded Stages
1.6.6.3. Use Separate CRC Blocks Instead of Allowing Blocks to Merge
1.6.6.4. Take Advantage of Latency if Available
1.6.6.5. Save Power by Disabling CRC Blocks When Not in Use
1.6.6.6. Initialize the Device with the Synchronous Load (sload) Signal
3.4.1. Apply Complete System-Centric Timing Constraints for the Timing Analyzer
3.4.2. Force the Identification of Synchronization Registers
3.4.3. Set the Synchronizer Data Toggle Rate
3.4.4. Optimize Metastability During Fitting
3.4.5. Increase the Length of Synchronizers to Protect and Optimize
3.4.6. Increase the Number of Stages Used in Synchronizers
3.4.7. Select a Faster Speed Grade Device
1.6.3.1. Architectures with 6-Input LUTs in Adaptive Logic Modules
In Intel FPGA device families with 6-input LUT in their basic logic structure, ALMs can simultaneously add three bits. Take advantage of this feature by restructuring your code for better performance.
Although code targeting 4-input LUT architectures compiles successfully for 6-input LUT devices, the implementation can be inefficient. For example, to take advantage of the 6-input adaptive ALUT, you must rewrite large pipelined binary adder trees designed for 4-input LUT architectures. By restructuring the tree as a ternary tree, the design becomes much more efficient, significantly improving density utilization.
Verilog HDL Pipelined Ternary Tree
The example shows a pipelined adder, but partitioning your addition operations can help you achieve better results in non-pipelined adders as well. If your design is not pipelined, a ternary tree provides much better performance than a binary tree. For example, depending on your synthesis tool, the HDL code sum = (A + B + C) + (D + E) is more likely to create the optimal implementation of a 3-input adder for A + B + C followed by a 3-input adder for sum1 + D + E than the code without the parentheses. If you do not add the parentheses, the synthesis tool may partition the addition in a way that is not optimal for the architecture.
module ternary_adder_tree (a, b, c, d, e, clk, out); parameter width = 16; input [width-1:0] a, b, c, d, e; input clk; output [width-1:0] out; wire [width-1:0] sum1, sum2; reg [width-1:0] sumreg1, sumreg2; // registers always @ (posedge clk) begin sumreg1 <= sum1; sumreg2 <= sum2; end // 3-bit additions assign sum1 = a + b + c; assign sum2 = sumreg1 + d + e; assign out = sumreg2; endmodule