Architectures with 6-Input LUTs in Adaptive Logic Modules
In Intel FPGA device families whose basic logic structure uses 6-input LUTs, each adaptive logic module (ALM) can add three bits simultaneously. Restructure your code to take advantage of this feature and improve performance.
Although code targeting 4-input LUT architectures compiles successfully for 6-input LUT devices, the implementation can be inefficient. For example, to take advantage of the 6-input adaptive LUT (ALUT), you must rewrite large pipelined binary adder trees designed for 4-input LUT architectures. Restructuring the tree as a ternary tree makes the design much more efficient and significantly improves density utilization.
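For contrast, the kind of binary adder tree that the paragraph above recommends rewriting can be sketched as follows. This is an illustrative assumption, not code from the source: the module name `binary_adder_tree` and its internal signal names are hypothetical, and the port list simply mirrors the five-input ternary example in this section.

```verilog
// Hypothetical binary-tree counterpart, shown for comparison only.
// Summing five inputs with 2-input adders takes four adders in three levels.
module binary_adder_tree (a, b, c, d, e, clk, out);
    parameter width = 16;
    input  [width-1:0] a, b, c, d, e;
    input  clk;
    output [width-1:0] out;

    wire [width-1:0] sum1, sum2, sum3, sum4;
    reg  [width-1:0] sumreg1, sumreg2, sumreg3, sumreg4;

    // pipeline registers after each adder level
    always @ (posedge clk) begin
        sumreg1 <= sum1;
        sumreg2 <= sum2;
        sumreg3 <= sum3;
        sumreg4 <= sum4;
    end

    // 2-input additions
    assign sum1 = a + b;              // level 1
    assign sum2 = c + d;              // level 1
    assign sum3 = sumreg1 + sumreg2;  // level 2
    // level 3: e is added without delay matching, mirroring the
    // structure of the ternary example in this section
    assign sum4 = sumreg3 + e;
    assign out  = sumreg4;
endmodule
```

A ternary tree for the same five inputs needs only two 3-input adders in two levels, which is where the density and latency savings would come from on devices whose ALMs can add three bits at once.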
Verilog HDL Pipelined Ternary Tree

    module ternary_adder_tree (a, b, c, d, e, clk, out);
        parameter width = 16;
        input  [width-1:0] a, b, c, d, e;
        input  clk;
        output [width-1:0] out;

        wire [width-1:0] sum1, sum2;
        reg  [width-1:0] sumreg1, sumreg2;

        // pipeline registers
        always @ (posedge clk) begin
            sumreg1 <= sum1;
            sumreg2 <= sum2;
        end

        // 3-input additions
        assign sum1 = a + b + c;
        assign sum2 = sumreg1 + d + e;
        assign out  = sumreg2;
    endmodule

The example shows a pipelined adder, but partitioning your addition operations can help you achieve better results in non-pipelined adders as well. If your design is not pipelined, a ternary tree provides much better performance than a binary tree. For example, depending on your synthesis tool, the HDL code sum = (A + B + C) + (D + E) is more likely than the same code without parentheses to produce the optimal implementation: a 3-input adder for A + B + C, followed by a 3-input adder for sum1 + D + E, where sum1 is the result of the first adder. Without the parentheses, the synthesis tool may partition the addition in a way that is not optimal for the architecture.
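The parenthesized grouping described above can be written as a non-pipelined module along these lines. This is a minimal sketch: the module name `ternary_adder_comb` and the output name `sum` are hypothetical, and whether the synthesis tool honors the grouping depends on the tool.

```verilog
// Hypothetical non-pipelined five-input adder.
module ternary_adder_comb (a, b, c, d, e, sum);
    parameter width = 16;
    input  [width-1:0] a, b, c, d, e;
    output [width-1:0] sum;

    // The parentheses suggest the partitioning to the synthesis tool:
    // a 3-input adder for a + b + c, then a 3-input adder that
    // combines that partial sum with d and e.
    assign sum = (a + b + c) + (d + e);
endmodule
```

The grouping does not change the arithmetic result. It only hints at how the additions should be partitioned across adders.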