For some combinations of parameters, simulators, and RTL coding styles, the latency of this block in simulation deviates from the expected latency by +/- one clock cycle. Actual hardware exhibits the expected latency.
This behavior will be seen, for example, if the clock driving the DSP block is a delayed version of the clock generating the input data, thus introducing more simulation delay for the input clock than for the input data.
To work around this problem, ensure that any delay between the clock that generates the input data to the DSP block and the input clock of the DSP block is balanced by an equal delay on the input data. Alternatively, ensure that the input data arrives at a later absolute time, or a later simulation delta delay time, than the input clock of the DSP block.
Note that such things as more assignment statements on the clock path vs. the data path will cause simulation delta delay differences between those paths.
To accomplish this, modify your testbench to:
- Ensure the clock generating inputs to the Native DSP block is exactly the same signal as the clock input to the Native DSP block.
- If the first option is not feasible, delay the input data relative to the clock.
For example, consider the following original RTL code:
Original RTL:
clk_gen: process
begin
  clk_orig <= '0';
  wait for 5 ns;
  clk_orig <= '1';
  wait for 5 ns;
end process;
...
if (rising_edge(clk_orig)) then
  ax <= ax + 1;
  ay <= ay - 1;
end if;
mac_test_bad_style: mult_acc
  port map (
    ...
    ax => std_logic_vector(ax), -- [in]
    ay => std_logic_vector(ay), -- [in]
    clk => ("00" & clk_orig), -- [in]
    resulta => resulta2, -- [out]
    ...
  );
resulta2 will show one clock less latency than expected. Note that the concatenation "00" & clk_orig in the multiplier's clk port assignment adds a simulation delta delay relative to clk_orig, which generates the input data.
Possible workarounds include:
Example 1, Recommendation: Use a 3-bit clock throughout
You can generate the multiplier's 3-bit clock directly and use the active bit to clock the input data:
clk_gen: process
begin
  clk3bit <= "000";
  wait for 5 ns;
  clk3bit <= "001";
  wait for 5 ns;
end process;
...
if (rising_edge(clk3bit(0))) then
  ax <= ax + 1;
  ay <= ay - 1;
end if;
mac_test_bad_style: mult_acc
  port map (
    ...
    ax => std_logic_vector(ax), -- [in]
    ay => std_logic_vector(ay), -- [in]
    clk => (clk3bit), -- [in]
    resulta => resulta2, -- [out]
    ...
  );
Example 2, Alternate Recommendation: Add a corresponding delay to the input data
The 'clk => ("00" & clk_orig)' statement causes the 'clk' port to lag 'clk_orig', which drives the data, by one additional simulation delta delay. To overcome this, you can keep the original clk_gen process and add a matching simulation delta delay to the data with assignment statements:
-- clk_gen process: same as original
-- Each concurrent assignment below adds one simulation delta delay to the data.
ax_del <= ax;
ay_del <= ay;
mac_test_bad_style: mult_acc
  port map (
    ...
    ax => std_logic_vector(ax_del), -- [in]
    ay => std_logic_vector(ay_del), -- [in]
    clk => ("00" & clk_orig), -- [in]
    resulta => resulta2, -- [out]
    ...
  );
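
The delta delay effect can be reproduced without the DSP block at all. The following is a minimal, self-checking sketch, not the DSP simulation model: a plain register stands in for the block's input register, and every name in it (delta_race_tb, clk_vec, reg_same, reg_early, reg_fixed, done) is hypothetical and chosen for illustration only. It assumes a simulator that follows the standard VHDL delta cycle semantics. It shows the register clocked through the "00" & clk_orig concatenation capturing data one cycle early, and the same register behaving as expected once the data is given a matching delta delay as in Example 2.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-alone testbench; plain registers stand in for the
-- DSP block's input register. No name here comes from the DSP model.
entity delta_race_tb is
end entity delta_race_tb;

architecture sim of delta_race_tb is
  signal done      : boolean := false;
  signal clk_orig  : std_logic := '0';
  signal clk_vec   : std_logic_vector(2 downto 0) := "000";
  signal ax        : unsigned(7 downto 0) := (others => '0');
  signal ax_del    : unsigned(7 downto 0) := (others => '0');
  signal reg_same  : unsigned(7 downto 0) := (others => '0');
  signal reg_early : unsigned(7 downto 0) := (others => '0');
  signal reg_fixed : unsigned(7 downto 0) := (others => '0');
begin
  -- 10 ns clock; stops toggling once the checker sets 'done'.
  clk_gen : process
  begin
    while not done loop
      clk_orig <= '0';
      wait for 5 ns;
      clk_orig <= '1';
      wait for 5 ns;
    end loop;
    wait;
  end process clk_gen;

  -- Each concurrent assignment costs one delta cycle.
  clk_vec <= "00" & clk_orig;  -- clock path: one delta after clk_orig
  ax_del  <= ax;               -- data path: one delta after ax

  -- Input data is generated on clk_orig, as in the original RTL.
  data_gen : process (clk_orig)
  begin
    if rising_edge(clk_orig) then
      ax <= ax + 1;
    end if;
  end process data_gen;

  -- Reference: data and register share exactly the same clock signal.
  ref_reg : process (clk_orig)
  begin
    if rising_edge(clk_orig) then
      reg_same <= ax;
    end if;
  end process ref_reg;

  -- Both registers below are clocked one delta late.
  late_regs : process (clk_vec)
  begin
    if rising_edge(clk_vec(0)) then
      reg_early <= ax;      -- ax has already updated: captured too early
      reg_fixed <= ax_del;  -- data delayed by one delta too: balanced
    end if;
  end process late_regs;

  -- Rising edges occur at 5, 15, 25, 35 and 45 ns; check after the fifth.
  check : process
  begin
    wait for 52 ns;
    assert reg_early = reg_same + 1
      report "expected the delta-delayed clock to capture data one cycle early"
      severity failure;
    assert reg_fixed = reg_same
      report "expected the delta-delayed data to restore the reference latency"
      severity failure;
    report "delta_race_tb finished";
    done <= true;
    wait;
  end process check;
end architecture sim;
```

In this sketch reg_early tracks ax with no register delay, which corresponds to the "one clock less latency" symptom described above, while reg_fixed matches reg_same. If your clock path contains more than one extra assignment, the data path would need the same number of intermediate assignments to stay balanced.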