Variable Precision DSP Blocks User Guide: Agilex™ 3 FPGAs and SoCs
ID: 849313
Date: 8/06/2025
Public
3.3.2. Side Input Feed Preloading Method
The side input feed preloading method preloads the ten 8-bit weights and the 8-bit shared exponent data into the ping-pong buffers through the side_in_1[7:0] and side_in_2[7:0] buses. Preloading one set of ping-pong buffers takes 12 cycles. The weight and shared exponent data are preloaded independently, even if the DSP blocks are cascaded. Because the buffers are organized as ping-pong pairs, tensor computation can continue using one set of buffers while the other set is being preloaded. The following figure shows the dataflow for the side input feed.
Figure 52. Dataflow for Side Input Feed Method
The feed paths are highlighted in red in this figure.
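The ping-pong arrangement described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not a description of the hardware: the class and method names (PingPongWeights, preload, active) are hypothetical, target index 0 is assumed to correspond to load_bb_one and index 1 to load_bb_two, and the side_in_1/side_in_2 byte pattern and number formats are not modeled. The model only shows that writes go to the idle set while computation keeps reading the set chosen by load_buf_sel.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BufferSet:
    # One hypothetical buffer set: ten 8-bit weights and one 8-bit shared
    # exponent. Bit ordering and number format are not modeled.
    weights: List[int] = field(default_factory=lambda: [0] * 10)
    shared_exp: int = 0


class PingPongWeights:
    """Behavioral sketch: computation reads one set while the other is preloaded."""

    def __init__(self) -> None:
        self.sets = [BufferSet(), BufferSet()]
        self.load_buf_sel = 0  # index of the set used for computation

    def preload(self, target: int, weights: List[int], shared_exp: int = 0) -> None:
        # ASSUMPTION: target 0 stands for load_bb_one, target 1 for load_bb_two.
        # Modeling choice: reject writes to the set currently in use so the
        # sketch enforces the ping-pong discipline explicitly.
        if target == self.load_buf_sel:
            raise ValueError("target set is currently selected for computation")
        if len(weights) != 10 or any(not 0 <= w <= 0xFF for w in weights):
            raise ValueError("expected ten 8-bit weight values")
        if not 0 <= shared_exp <= 0xFF:
            raise ValueError("expected an 8-bit shared exponent")
        self.sets[target] = BufferSet(list(weights), shared_exp)

    def active(self) -> BufferSet:
        # The set that computation currently reads, chosen by load_buf_sel.
        return self.sets[self.load_buf_sel]


if __name__ == "__main__":
    pp = PingPongWeights()
    pp.preload(1, list(range(10)), shared_exp=0x7F)  # fill the idle set
    print(pp.active().weights)  # still set 0: computation is unaffected
    pp.load_buf_sel = 1         # switch once preloading has completed
    print(pp.active().weights)  # now reads the newly loaded weights
```

Keeping the selector separate from the load target mirrors the split between load_buf_sel and load_bb_one/load_bb_two in the text: loading and computation never contend for the same set.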
Figure 53. Side Input Feed Method Timing Diagrams
- The update process starts in cycle 1 by setting load_bb_one or load_bb_two to 1'b1. During this first cycle, computation can continue using the buffer set selected by load_buf_sel.
- New shared exponents and weights are loaded during cycles 2 through 13.
- Data is loaded through side_in_1 and side_in_2 following the pattern shown in the previous diagram.
- Cycles 2 and 3 are still required even when the shared exponent is not used, as in tensor fixed-point mode.
- During side feed loading, computation can continue using the other set of buffers as determined by the load_buf_sel signal.
- load_bb_one or load_bb_two should become inactive in cycle 13, one cycle before the last of the data is fed in through side_in_1 and side_in_2.
- In cycle 14, the newly loaded buffer content is ready for computation. The load_buf_sel signal can then be switched to the new buffer set, or left unchanged so that computation continues with the previous set.
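The cycle schedule in the list above can be tabulated with a short Python sketch. It is an illustration under assumptions, not a timing model of the DSP block: the function name side_feed_schedule is hypothetical, and it assumes cycles 2 and 3 carry the shared-exponent data and cycles 4 through 13 carry the ten weights. The exact byte pattern on each bus is defined by the timing diagram and is not modeled here.

```python
from typing import Dict, List

LOAD_START = 1        # cycle in which load_bb_one/load_bb_two is set to 1'b1
DATA_FIRST = 2        # first cycle carrying data on side_in_1/side_in_2
DATA_LAST = 13        # last cycle carrying data
EXP_CYCLES = (2, 3)   # ASSUMPTION: shared-exponent cycles (kept even if unused)
READY = 14            # new buffer set usable; load_buf_sel may be switched


def side_feed_schedule(last_cycle: int = READY) -> List[Dict[str, object]]:
    """Return one row per cycle describing load_bb and the side_in contents."""
    rows: List[Dict[str, object]] = []
    for cycle in range(LOAD_START, last_cycle + 1):
        # load_bb is active for 12 cycles and inactive from cycle 13 onward.
        load_bb = 1 if LOAD_START <= cycle < DATA_LAST else 0
        if cycle in EXP_CYCLES:
            phase = "shared_exp"
        elif DATA_FIRST <= cycle <= DATA_LAST:
            phase = "weights"
        else:
            phase = "idle"
        rows.append({
            "cycle": cycle,
            "load_bb": load_bb,
            "side_in": phase,
            "may_switch_load_buf_sel": cycle >= READY,
        })
    return rows


if __name__ == "__main__":
    for row in side_feed_schedule():
        print(row)
```

Under these assumptions the table shows why the preload takes 12 cycles: the enable is active for cycles 1 through 12, the data occupies cycles 2 through 13, and the new set can first be selected in cycle 14.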