9.4.3.1. Performance Impact of Mixed Precision
- Area impact
Adding enable_mixed_precision:true to the Architecture Description File results in a small increase in the ALM utilization of the FPGA AI Suite IP.
- Throughput impact
The throughput impact of mixed precision depends on your specific graph and on which layers you configure to run at __fpga_precision=high or __fpga_precision=high-feature. The per-layer throughput impact is summarized in the following table:
Table 31. Throughput Impact of FPGA Precision Annotations

| Annotation | Relative Computation Required |
| --- | --- |
| __fpga_precision=default | N/A |
| __fpga_precision=high | Requires 4x the multiply-add operations (compared to a default precision layer). |
| __fpga_precision=high-feature | Requires 2x the multiply-add operations (compared to a default precision layer). |
| (no annotation used) | N/A |
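To get a rough sense of what a set of annotations implies before compiling, you can estimate the relative multiply-add (MAC) cost of a graph from the factors in Table 31. The sketch below is illustrative only and is not part of the FPGA AI Suite tooling: the layer names, MAC counts, and the estimate_relative_macs helper are hypothetical, while the 4x and 2x multipliers come directly from the table above.

```python
# Illustrative sketch only (not an FPGA AI Suite API): estimate the relative
# multiply-add (MAC) cost of a graph when some layers are annotated with
# __fpga_precision=high (4x MACs) or __fpga_precision=high-feature (2x MACs).
# Layer names and MAC counts below are hypothetical placeholders.

# Relative MAC multipliers taken from Table 31.
PRECISION_COST = {
    "default": 1.0,
    "high": 4.0,
    "high-feature": 2.0,
    None: 1.0,  # no annotation behaves like default precision
}

def estimate_relative_macs(layer_macs, annotations):
    """Return total MACs relative to running every layer at default precision.

    layer_macs  -- dict of layer name -> MAC count at default precision
    annotations -- dict of layer name -> __fpga_precision value (may be missing)
    """
    baseline = sum(layer_macs.values())
    annotated = sum(
        macs * PRECISION_COST[annotations.get(name)]
        for name, macs in layer_macs.items()
    )
    return annotated / baseline

# Hypothetical example: a small graph with two convolutions and one FC layer.
macs = {"conv1": 120e6, "conv2": 240e6, "fc1": 4e6}
ann = {"conv2": "high", "fc1": "high-feature"}
print(f"Relative compute: {estimate_relative_macs(macs, ann):.2f}x")
```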
For FPGA AI Suite Version 2025.3, the following usage constraints apply to the mixed precision feature:
- Some graph layers cannot be run at high precision or are not yet supported for mixed precision. For a complete list of layer types and constraints, refer to Graph Layer Mapping and Precision.
- Only certain arch_precision values are supported for use with mixed precision. For the list of supported values, refer to Block Floating Point Notation Convention.
The following known issues apply to the mixed precision feature:
- A rare corner case may arise if the feature input contains many severely subnormal fp16 inputs (for example, with only 1 or 2 significant mantissa digits). If this issue occurs, the dla_benchmark program triggers an assertion. Ensure that your input data is scaled appropriately to prevent this issue.
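One way to guard against the corner case above is to scan input tensors for severely subnormal fp16 values before running dla_benchmark. The check below is a minimal sketch, not an FPGA AI Suite utility; the severe_subnormal_fraction helper and its threshold (values representable with only 1 or 2 mantissa bits, i.e. at most 3 × 2⁻²⁴ in magnitude) are assumptions derived from the fp16 format.

```python
# Minimal sketch (not an FPGA AI Suite utility): flag inputs that would become
# severely subnormal when cast to fp16, i.e. values so small that only 1-2
# mantissa bits survive. The threshold 3 * 2**-24 is an assumption based on the
# fp16 format (the spacing of subnormal fp16 values is 2**-24).
import numpy as np

SEVERE_SUBNORMAL_MAX = 3 * 2.0 ** -24  # largest magnitude with <= 2 mantissa bits

def severe_subnormal_fraction(x):
    """Fraction of nonzero elements that are severely subnormal in fp16."""
    mag = np.abs(np.asarray(x, dtype=np.float16).astype(np.float64))
    nonzero = mag > 0
    severe = nonzero & (mag <= SEVERE_SUBNORMAL_MAX)
    return severe.sum() / max(int(nonzero.sum()), 1)

# Hypothetical usage: rescale the tensor if too many values are affected.
data = np.random.uniform(0.0, 1e-7, size=(1, 3, 224, 224)).astype(np.float32)
if severe_subnormal_fraction(data) > 0.01:
    data = data / np.abs(data).max()  # simple rescale into the normal fp16 range
```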
A critical aspect of optimizing your FPGA AI Suite IP is the ability to make precision versus performance trade-offs. This trade-off involves balancing the computational accuracy of your models with the desired inference speed and resource utilization on the FPGA. The FPGA AI Suite uses Block Floating Point (BFP) as its core numeric representation, enabling both high throughput and efficient use of FPGA DSP resources. For information about Block Floating Point, refer to Block Floating Point (BFP). The following precision options are available:
- FP16 (INT12-BFP) – Highest default precision supported
- FP13 (INT9-BFP) – Optimal for Agilex™ 7
- FP12 (INT8-BFP) – Optimal for Agilex™ 5
- FP11 (INT7-BFP) – Highest performance, lowest precision; may require quantization-aware training (QAT)
Trade-off considerations:
- FP11 offers significant performance boosts and lower memory usage but may reduce model accuracy.
- FP13 is often the best default for modern vision models and is supported by many pre-compiled design examples.
Recommended defaults by device:
- Agilex™ 5: Optimal at FP12 in DSP tensor mode
- Agilex™ 7: Optimal at FP13
Use the Model Converter and Calibration Tools provided with the suite to evaluate precision impact and adjust the model accordingly.
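If you want a quick, tool-independent feel for how the BFP mantissa width affects numeric fidelity, you can emulate block floating point in software and compare the quantization error at each width. The sketch below is an approximation for illustration only: the blocking scheme (32 values sharing one exponent), the rounding, and the bfp_quantize helper are assumptions rather than the exact arithmetic of the FPGA AI Suite IP; the mantissa widths follow the FP16/FP13/FP12/FP11 list above (INT12/INT9/INT8/INT7).

```python
# Rough software emulation of block floating point (BFP) quantization, for
# gauging precision impact only. This is NOT the exact arithmetic used by the
# FPGA AI Suite IP; block size and rounding here are simplifying assumptions.
import numpy as np

def bfp_quantize(x, mantissa_bits, block_size=32):
    """Quantize a 1-D array block-by-block with a shared per-block exponent."""
    x = np.asarray(x, dtype=np.float64).copy()
    for start in range(0, x.size, block_size):
        block = x[start:start + block_size]
        max_mag = np.abs(block).max()
        if max_mag == 0:
            continue
        # Shared exponent chosen from the largest magnitude in the block.
        shared_exp = np.floor(np.log2(max_mag))
        # Quantization step so each mantissa uses roughly mantissa_bits bits
        # (1 sign bit + magnitude bits) relative to the shared exponent.
        step = 2.0 ** (shared_exp - (mantissa_bits - 2))
        x[start:start + block_size] = np.round(block / step) * step
    return x

weights = np.random.randn(4096)
for name, bits in [("FP16 (INT12-BFP)", 12), ("FP13 (INT9-BFP)", 9),
                   ("FP12 (INT8-BFP)", 8), ("FP11 (INT7-BFP)", 7)]:
    err = np.abs(bfp_quantize(weights, bits) - weights).mean()
    print(f"{name}: mean abs quantization error = {err:.2e}")
```

Running a check like this on representative weights or activations gives a first-order view of the accuracy cost of dropping from INT12-BFP to INT7-BFP, before you commit to a full calibration run.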