Article ID: 000076343 Content Type: Troubleshooting Last Reviewed: 04/12/2023

Why is my Native Fixed Point DSP block showing an unexpected latency in simulation?

Environment

  • Quartus® II Subscription Edition
  • DSP
  • Simulation
    Description

    For some combinations of parameters, simulators, and RTL coding styles, the latency of the Native Fixed Point DSP block in simulation deviates from the expected latency by +/- one clock cycle.  Actual hardware exhibits the expected latency.

    This behavior will be seen, for example, if the clock driving the DSP block is a delayed version of the clock generating the input data, thus introducing more simulation delay for the input clock than for the input data.

    Resolution

    To work around this problem, ensure that any delay between the clock that generates the input data and the input clock of the DSP block is balanced by a matching delay on the input data.  Alternatively, ensure that the input data arrives at a later absolute time, or in a later simulation delta cycle, than the input clock of the DSP block.

    Note that each signal assignment adds one simulation delta delay, so having more assignment statements on the clock path than on the data path causes a delta delay difference between those paths.
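
The delta delay effect can be reproduced without the DSP block. The following is a minimal sketch, not the Intel simulation model: the entity name delta_race_tb and the signals clk_dly, data, q_same, and q_late are hypothetical, and a plain register stands in for the DSP block's input register. It assumes a standard VHDL simulator with ieee.numeric_std.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench: shows how one extra assignment on the clock
-- path changes which data value a register captures in simulation.
entity delta_race_tb is
end entity delta_race_tb;

architecture sim of delta_race_tb is
    signal clk_orig : std_logic := '0';
    signal clk_dly  : std_logic := '0';  -- clk_orig plus one delta cycle
    signal data     : unsigned(3 downto 0) := (others => '0');
    signal q_same   : unsigned(3 downto 0) := (others => '0');
    signal q_late   : unsigned(3 downto 0) := (others => '0');
begin
    -- Bounded clock so the simulation ends on its own.
    clk_gen: process
    begin
        for i in 1 to 5 loop
            clk_orig <= '0';
            wait for 5 ns;
            clk_orig <= '1';
            wait for 5 ns;
        end loop;
        wait;
    end process clk_gen;

    -- One extra assignment on the clock path adds one delta delay.
    clk_dly <= clk_orig;

    -- Data generator, clocked by the original clock.
    stim: process (clk_orig)
    begin
        if rising_edge(clk_orig) then
            data <= data + 1;
        end if;
    end process stim;

    -- Same clock as the data: captures the value from before the edge.
    reg_same: process (clk_orig)
    begin
        if rising_edge(clk_orig) then
            q_same <= data;
        end if;
    end process reg_same;

    -- Delta-delayed clock: data has already updated, so the new value
    -- is captured and the register appears one clock early.
    reg_late: process (clk_dly)
    begin
        if rising_edge(clk_dly) then
            q_late <= data;
        end if;
    end process reg_late;

    -- Check both registers half a period after each rising edge.
    check: process (clk_orig)
    begin
        if falling_edge(clk_orig) then
            assert q_same = data - 1
                report "q_same should lag data by one clock" severity error;
            assert q_late = data
                report "q_late should already equal data" severity error;
        end if;
    end process check;
end architecture sim;
```

In this sketch, q_late runs one clock ahead of q_same even though both registers are described identically. That matches the one-clock latency difference described above.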

    To accomplish this, modify your testbench to:

    1. Ensure the clock generating inputs to the Native DSP block is exactly the same signal as the clock input to the Native DSP block.
    2. If #1 is not feasible, delay the input data relative to the clock.

    For example, consider the following original RTL code:

    Original RTL:

    clk_gen: process
        begin
            clk_orig <= '0';
            wait for 5 ns;
            clk_orig <= '1';
            wait for 5 ns;
        end process;

    ...

    if (rising_edge(clk_orig)) then
            ax <= ax + 1;
            ay <= ay - 1;
    end if;

    mac_test_bad_style: mult_acc 
        port map (
            ...
            ax         => std_logic_vector(ax),          -- [in]
            ay         => std_logic_vector(ay),          -- [in]
            clk        => ("00" & clk_orig),         -- [in]
            resulta    => resulta2,     -- [out]
            ...
            );

     

    resulta2 will show one clock less latency than expected.  Note that the concatenation ("00" & clk_orig) in the multiplier's clk port assignment adds a simulation delta delay relative to clk_orig, which generates the input data.

    Possible workarounds include:

    Example 1, Recommendation:  Use a 3-bit clock throughout

    You can generate the multiplier's 3-bit clock directly and use the active bit to clock the input data:

    clk_gen: process
        begin
            clk3bit <= "000";
            wait for 5 ns;
            clk3bit <= "001";
            wait for 5 ns;
        end process;

    ...

    if (rising_edge(clk3bit(0))) then
         ax <= ax + 1;
         ay <= ay - 1;
    end if;

    mac_test_bad_style: mult_acc
        port map (
            ...
            ax         => std_logic_vector(ax),          -- [in]
            ay         => std_logic_vector(ay),          -- [in]
            clk        => (clk3bit),         -- [in]
            resulta    => resulta2,     -- [out]
            ...
            );

     

    Example 2, Alternate Recommendation:  Add corresponding delay to the input data

    The 'clk => ("00" & clk_orig)' statement causes the 'clk' port to have an additional simulation delta delay relative to 'clk_orig', which drives the data.  To overcome this, you can keep the original clk_gen process and add a matching simulation delta delay to the data with assignment statements.

    -- clk_gen process: same as in the original RTL

    ax_del <= ax;
    ay_del <= ay;

    mac_test_bad_style: mult_acc
        port map (
            ...
            ax         => std_logic_vector(ax_del),          -- [in]
            ay         => std_logic_vector(ay_del),          -- [in]
            clk        => ("00" & clk_orig),         -- [in]
            resulta    => resulta2,     -- [out]
            ...
            );
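
The balancing in Example 2 can be checked in isolation. The following is a minimal sketch under stated assumptions: the entity name balanced_delay_tb and the signals clk3 and q are hypothetical, a plain register stands in for the mult_acc input register, and the concurrent assignment to clk3 stands in for the expression in the clk port map. The real DSP simulation model may add further internal delays that this sketch does not capture.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench: one delta delay on the clock path is matched
-- by one delta delay on the data path.
entity balanced_delay_tb is
end entity balanced_delay_tb;

architecture sim of balanced_delay_tb is
    signal clk_orig : std_logic := '0';
    signal clk3     : std_logic_vector(2 downto 0) := "000";
    signal ax       : unsigned(3 downto 0) := (others => '0');
    signal ax_del   : unsigned(3 downto 0) := (others => '0');
    signal q        : unsigned(3 downto 0) := (others => '0');
begin
    -- Bounded clock so the simulation ends on its own.
    clk_gen: process
    begin
        for i in 1 to 5 loop
            clk_orig <= '0';
            wait for 5 ns;
            clk_orig <= '1';
            wait for 5 ns;
        end loop;
        wait;
    end process clk_gen;

    -- Clock path: the concatenation adds one delta delay.
    clk3 <= "00" & clk_orig;

    -- Data path: one extra assignment adds the matching delta delay.
    ax_del <= ax;

    -- Data generator, clocked by the original clock.
    stim: process (clk_orig)
    begin
        if rising_edge(clk_orig) then
            ax <= ax + 1;
        end if;
    end process stim;

    -- Stand-in for the DSP input register, clocked by the delayed clock.
    reg: process (clk3)
    begin
        if rising_edge(clk3(0)) then
            q <= ax_del;  -- ax_del still holds the value from before the edge
        end if;
    end process reg;

    -- With balanced paths, q lags ax by one clock again.
    check: process (clk_orig)
    begin
        if falling_edge(clk_orig) then
            assert q = ax - 1
                report "q should lag ax by one clock" severity error;
        end if;
    end process check;
end architecture sim;
```

If the ax_del assignment is removed and the register reads ax directly, the assertion in this sketch fails, because the register then captures the updated value one clock early.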

    Related Products

    This article applies to 1 product

    Intel® Arria® 10 GX FPGA