6. Variable-Precision DSP in Intel Agilex® 5 FPGAs and SoCs
For INT8 operations in a single DSP block, the Intel Agilex® 5 FPGAs and SoCs improve peak theoretical TOPS:
- D-Series: up to 2.5 times higher than Intel® Stratix® 10 FPGAs
- E-Series: up to 37 times higher than Cyclone® V FPGAs
Through a large increase in arithmetic density, the Intel Agilex® 5 FPGAs and SoCs fit more multipliers and accumulators in the same footprint as a standard DSP block.
The Intel® FPGA AI Suite (Intel® FPGA AI) supports the new AI features. It enables a push-button flow from industry-standard frameworks, such as Caffe, PyTorch*, and TensorFlow*, to an FPGA bitstream.
The Intel Agilex® 5 FPGAs and SoCs also carry over the variable-precision DSP architecture from previous Intel® FPGAs, with hard fixed point and IEEE 754-compliant floating point capabilities.
In fixed point mode, you can configure the DSP blocks to support signal processing with precisions from 9×9 up to 54×54:
- Increased 9×9 multiplier count, with three 9×9 multipliers for every 18×19 multiplier
- A pipeline register increases the maximum DSP block operating frequency and reduces the power consumption
- Dynamically switch multiplier inputs through scanin and chainout signals
- Compile each DSP block independently as six 9×9, dual 18×19, or single 27×27 multiply-accumulate
The variable-precision DSP supports floating point addition, multiplication, multiply-add, and multiply-accumulate:
- Single-precision 32-bit arithmetic FP32 floating point mode
- Half-precision 16-bit arithmetic FP16 and FP19 floating point modes, and BFLOAT16 floating point format
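To make the relationship between two of the listed formats concrete, the sketch below converts an FP32 value to BFLOAT16 in software. It relies only on the standard definition of BFLOAT16 as the upper 16 bits of an IEEE 754 single-precision value (1 sign bit, 8 exponent bits, 7 mantissa bits). The function names are hypothetical, and simple truncation is an assumption made for illustration; the rounding behavior of the DSP block itself is not described here.

```python
import struct


def fp32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def fp32_to_bfloat16_bits(x: float) -> int:
    """Truncate an FP32 value to BFLOAT16 by keeping its upper 16 bits.

    Assumption: plain truncation (round toward zero), chosen for clarity.
    """
    return fp32_bits(x) >> 16


def bfloat16_bits_to_float(bits: int) -> float:
    """Expand a 16-bit BFLOAT16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    value = 3.14159
    bits = fp32_to_bfloat16_bits(value)
    # Shows how much precision the 7-bit mantissa retains.
    print(hex(bits), bfloat16_bits_to_float(bits))
```

Because BFLOAT16 keeps the full 8-bit FP32 exponent, it preserves dynamic range and gives up mantissa precision, which is the usual reason it is offered alongside FP16.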
With a dedicated 64-bit cascade bus, you can cascade multiple variable-precision DSP blocks to efficiently implement even higher-precision DSP functions.
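The arithmetic idea behind building a wider multiplier from narrower ones can be sketched in a few lines. The example below splits two unsigned 36-bit operands into 18-bit halves and recombines four partial products with shifts and adds. This is an illustration only: the function name is hypothetical, the operands are assumed unsigned, and the actual partitioning across DSP blocks, the cascade bus, and the external adder is not specified in this section.

```python
MASK18 = (1 << 18) - 1  # selects the low 18 bits of an operand


def mul36_from_18bit_parts(a: int, b: int) -> int:
    """Multiply two unsigned 36-bit integers using four 18x18 partial products.

    Assumption: unsigned operands and an even 18/18 split, for illustration.
    """
    if not (0 <= a < (1 << 36) and 0 <= b < (1 << 36)):
        raise ValueError("operands must be unsigned 36-bit values")

    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18

    # Each product below fits a narrow multiplier; the shifts and adds
    # play the role of the adder that combines the partial results.
    high = (a_hi * b_hi) << 36
    middle = (a_hi * b_lo + a_lo * b_hi) << 18
    low = a_lo * b_lo
    return high + middle + low


if __name__ == "__main__":
    x, y = 0x9_8765_4321, 0xF_EDCB_A987
    print(mul36_from_18bit_parts(x, y) == x * y)
```

The same split-and-recombine pattern extends to 54×54 with more partial products, at the cost of more multipliers and a wider final addition.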
Multiplier | DSP Block Resource Usage | Expected Application |
---|---|---|
9×9 bits | One-sixth of a variable-precision DSP block (one DSP block can support six 9×9 multipliers) | Low-precision fixed point |
18×19 bits | Half of a variable-precision DSP block | Medium-precision fixed point |
27×27 bits | One variable-precision DSP block | High-precision fixed point |
19×36 bits | One variable-precision DSP block with external adder | Fixed point fast Fourier transform (FFT) |
36×36 bits | Two variable-precision DSP blocks with external adder | Very high-precision fixed point |
54×54 bits | Four variable-precision DSP blocks with external adder | Double-precision fixed point |
Half-precision floating point | One variable-precision DSP block (contains an adder for two FP16, FP19, or BFLOAT16 multipliers with one accumulator) | Half-precision floating point |
Single-precision floating point | One variable-precision DSP block (contains one FP32 multiplier with one accumulator) | Single-precision floating point |
AI tensor block | Two sums of ten INT8×INT8 multipliers (tensor fixed-point and floating-point computation) | Tensor dot-product computation on 10-element vectors |
Complex multiplication mode | One variable-precision DSP block (16×16 ± 16×16) | INT16 complex multiplication |
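The last two table rows map to simple arithmetic, sketched below as a software reference model. The first function computes one sum of ten INT8×INT8 products (a 10-element dot product). The second computes a complex product from the 16×16 ± 16×16 pattern: the real part subtracts two products and the imaginary part adds two. Function names, range checks, and the use of unbounded Python integers for the results are assumptions for illustration; accumulator widths and saturation behavior of the hardware are not covered here.

```python
from typing import Sequence, Tuple


def int8_dot10(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of ten INT8 x INT8 products, i.e. a 10-element dot product."""
    if len(a) != 10 or len(b) != 10:
        raise ValueError("both vectors must have exactly 10 elements")
    for v in (*a, *b):
        if not -128 <= v <= 127:
            raise ValueError("elements must fit in signed 8 bits")
    return sum(x * y for x, y in zip(a, b))


def int16_complex_mul(a_re: int, a_im: int, b_re: int, b_im: int) -> Tuple[int, int]:
    """Complex product (a_re + j*a_im) * (b_re + j*b_im) on INT16 parts."""
    for v in (a_re, a_im, b_re, b_im):
        if not -32768 <= v <= 32767:
            raise ValueError("components must fit in signed 16 bits")
    real = a_re * b_re - a_im * b_im  # 16x16 - 16x16
    imag = a_re * b_im + a_im * b_re  # 16x16 + 16x16
    return real, imag


if __name__ == "__main__":
    print(int8_dot10([1] * 10, [2] * 10))
    print(int16_complex_mul(1, 2, 3, 4))
```

A model like this can serve as a golden reference when checking the results of a hardware implementation against known inputs.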