Architectures with 6-Input LUTs in Adaptive Logic Modules
In Intel FPGA device families with 6-input LUTs in their basic logic structure, ALMs can simultaneously add three bits. Take advantage of this feature by restructuring your code for better performance.
Although code targeting 4-input LUT architectures compiles successfully for 6-input LUT devices, the implementation can be inefficient. For example, to take advantage of the 6-input adaptive look-up table (ALUT), you must rewrite large pipelined binary adder trees designed for 4-input LUT architectures. Restructuring the tree as a ternary tree makes the design much more efficient and significantly improves density utilization.
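For contrast, a pipelined binary adder tree of the kind written for 4-input LUT architectures might look like the following sketch. The module name `binary_adder_tree` and the signal names are illustrative assumptions chosen to mirror the ternary example in this section; this is not a listing taken from a specific design.

```verilog
// Hypothetical binary-tree counterpart, shown for comparison only.
module binary_adder_tree (a, b, c, d, e, clk, out);
    parameter width = 16;
    input  [width-1:0] a, b, c, d, e;
    input              clk;
    output [width-1:0] out;

    wire [width-1:0] sum1, sum2, sum3, sum4;
    reg  [width-1:0] sumreg1, sumreg2, sumreg3, sumreg4;

    // Pipeline registers: three levels (sumreg1/sumreg2, sumreg3, sumreg4)
    always @(posedge clk) begin
        sumreg1 <= sum1;
        sumreg2 <= sum2;
        sumreg3 <= sum3;
        sumreg4 <= sum4;
    end

    // Two-operand additions: four adders are needed for five inputs.
    // As in the ternary example, the late operand (e) is not delay-matched.
    assign sum1 = a + b;
    assign sum2 = c + d;
    assign sum3 = sumreg1 + sumreg2;
    assign sum4 = sumreg3 + e;
    assign out  = sumreg4;
endmodule
```

Summing the same five inputs this way takes four two-operand adders and three register levels, whereas the ternary form needs two three-operand adders and two register levels.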
Verilog HDL Pipelined Ternary Tree

```verilog
module ternary_adder_tree (a, b, c, d, e, clk, out);
    parameter width = 16;
    input  [width-1:0] a, b, c, d, e;
    input              clk;
    output [width-1:0] out;

    wire [width-1:0] sum1, sum2;
    reg  [width-1:0] sumreg1, sumreg2;

    // Pipeline registers
    always @ (posedge clk) begin
        sumreg1 <= sum1;
        sumreg2 <= sum2;
    end

    // Three-operand additions
    assign sum1 = a + b + c;
    assign sum2 = sumreg1 + d + e;
    assign out  = sumreg2;
endmodule
```

The example above shows a pipelined adder, but partitioning your addition operations can help you achieve better results in non-pipelined adders as well. If your design is not pipelined, a ternary tree provides much better performance than a binary tree. For example, depending on your synthesis tool, the HDL code sum = (A + B + C) + (D + E) is more likely than the same code without parentheses to create the optimal implementation: a 3-input adder for A + B + C, followed by a 3-input adder for sum1 + D + E, where sum1 is the result of the first addition. If you do not add the parentheses, the synthesis tool may partition the addition in a way that is not optimal for the architecture.
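The non-pipelined case described above can be sketched as a purely combinational module. The module name `ternary_adder_comb` and the port names are assumptions made for illustration, and whether the parentheses actually steer the partitioning depends on your synthesis tool.

```verilog
// Hypothetical non-pipelined sketch; names are illustrative only.
module ternary_adder_comb (a, b, c, d, e, sum);
    parameter width = 16;
    input  [width-1:0] a, b, c, d, e;
    output [width-1:0] sum;

    // The parentheses group a + b + c as the first three-operand addition,
    // leaving that intermediate result plus d and e for the second one.
    assign sum = (a + b + c) + (d + e);
endmodule
```

Check the synthesis or fitter report to confirm that the tool produced two three-operand adders rather than a chain of two-operand adders.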