DSP Builder for Intel® FPGAs (Advanced Blockset): Handbook

ID 683337
Date 3/23/2022

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

7.2.3. DSP Builder Loops, Clock Cycles, and Data Cycles

Never confuse clock cycles and data cycles in relation to feedback loops. For example, you may want to accumulate previous data from the same channel. The DSP Builder multichannel IIR filter design example (demo_iir) shows feedback accumulators processing multiple channels. In this example, consecutive data samples on any particular channel are 20 clock cycles apart. DSP Builder derives this number from clock rate and sample rate.

The folded IIR filter design example (demo_iir_fold2) demonstrates one channel, at a low data rate. This design example implements a single-channel infinite impulse response (IIR) filter with a subsystem built from Primitive blocks folded down to a serial implementation.

The design of the IIR is the same as the IIR in the multichannel example, demo_iir. As the channel count is one, the lumped delays in the feedback loops are all one. If you run the design at full speed, there is a scheduling problem. With new data arriving every clock cycle, the lumped delay of one cycle is not enough to allow for pipelining around the loops. However, the data arrives at a much slower rate than the clock rate, in this example 32 times slower (the clock rate in the design is 320 MHz, and the sample rate is 10 MHz), which gives 32 clock cycles between each sample.

You can set the lumped delays to 32 cycles long—the gap between successive data samples—which is inefficient both in terms of register use and in underused multipliers and adders. Instead, use folding to schedule the data through a minimum set of fully used hardware.

Set the SampleRate on both the ChannelIn and ChannelOut blocks to 10 MHz, to inform the synthesis for the Primitive subsystem of the schedule of data through the design. Even though the clock rate is 320 MHz, each data sample per channel is arriving only at 10 MHz. The RTL is folded down—in multiplier use—at the expense of extra logic for signal multiplexing and extra latency.