9.4.3.1. Performance Impact of Mixed Precision
- Area impact
Adding enable_mixed_precision:true to the Architecture Description File results in a small increase in the ALM utilization of the FPGA AI Suite IP.
- Throughput impact
The throughput impact of mixed precision depends on your specific graph and on which layers you configure to run at __fpga_precision=high or __fpga_precision=high-feature. The per-layer throughput impact is summarized in the following table:
Table 31. Throughput Impact of FPGA Precision Annotations

| Annotation | Relative Computation Required |
| --- | --- |
| __fpga_precision=default | N/A |
| __fpga_precision=high | Requires 4x the multiply-add operations (compared to a default precision layer). |
| __fpga_precision=high-feature | Requires 2x the multiply-add operations (compared to a default precision layer). |
| (no annotation used) | N/A |
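To get a rough sense of what a set of annotations implies before compiling, you can estimate the relative multiply-add (MAC) cost of a graph from the factors in Table 31. The sketch below is illustrative only and is not part of the FPGA AI Suite tooling: the layer names, MAC counts, and the estimate_relative_macs helper are hypothetical, while the 4x and 2x multipliers come directly from the table above.

```python
# Illustrative sketch only (not an FPGA AI Suite API): estimate the relative
# multiply-add (MAC) cost of a graph when some layers are annotated with
# __fpga_precision=high (4x MACs) or __fpga_precision=high-feature (2x MACs).
# Layer names and MAC counts below are hypothetical placeholders.

# Relative MAC multipliers taken from Table 31.
PRECISION_COST = {
    "default": 1.0,
    "high": 4.0,
    "high-feature": 2.0,
    None: 1.0,  # no annotation behaves like default precision
}

def estimate_relative_macs(layer_macs, annotations):
    """Return total MACs relative to running every layer at default precision.

    layer_macs  -- dict of layer name -> MAC count at default precision
    annotations -- dict of layer name -> __fpga_precision value (may be missing)
    """
    baseline = sum(layer_macs.values())
    annotated = sum(
        macs * PRECISION_COST[annotations.get(name)]
        for name, macs in layer_macs.items()
    )
    return annotated / baseline

# Hypothetical example: a small graph with two convolutions and one FC layer.
macs = {"conv1": 120e6, "conv2": 240e6, "fc1": 4e6}
ann = {"conv2": "high", "fc1": "high-feature"}
print(f"Relative compute: {estimate_relative_macs(macs, ann):.2f}x")
```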
For FPGA AI Suite Version 2025.3, the following usage constraints apply to the mixed precision feature:
- Some graph layers cannot be run at high precision or are not yet supported for mixed precision. For a complete list of layer types and constraints, refer to Graph Layer Mapping and Precision.
- Only certain arch_precision values are supported for use with mixed precision. For the list of supported values, refer to Block Floating Point Notation Convention.
The following known issues apply to the mixed precision feature:
- A rare corner case may arise if the feature input contains many severely subnormal fp16 inputs (for example, with only 1 or 2 significant mantissa digits). If this issue occurs, the dla_benchmark program triggers an assertion. Ensure that your input data is scaled appropriately to prevent this issue.
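One way to guard against the corner case above is to scan input tensors for severely subnormal fp16 values before running dla_benchmark. The check below is a minimal sketch, not an FPGA AI Suite utility; the severe_subnormal_fraction helper and its threshold (values representable with only 1 or 2 mantissa bits, i.e. at most 3 × 2⁻²⁴ in magnitude) are assumptions derived from the fp16 format.

```python
# Minimal sketch (not an FPGA AI Suite utility): flag inputs that would become
# severely subnormal when cast to fp16, i.e. values so small that only 1-2
# mantissa bits survive. The threshold 3 * 2**-24 is an assumption based on the
# fp16 format (the spacing of subnormal fp16 values is 2**-24).
import numpy as np

SEVERE_SUBNORMAL_MAX = 3 * 2.0 ** -24  # largest magnitude with <= 2 mantissa bits

def severe_subnormal_fraction(x):
    """Fraction of nonzero elements that are severely subnormal in fp16."""
    mag = np.abs(np.asarray(x, dtype=np.float16).astype(np.float64))
    nonzero = mag > 0
    severe = nonzero & (mag <= SEVERE_SUBNORMAL_MAX)
    return severe.sum() / max(int(nonzero.sum()), 1)

# Hypothetical usage: rescale the tensor if too many values are affected.
data = np.random.uniform(0.0, 1e-7, size=(1, 3, 224, 224)).astype(np.float32)
if severe_subnormal_fraction(data) > 0.01:
    data = data / np.abs(data).max()  # simple rescale into the normal fp16 range
```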
A critical aspect of optimizing your FPGA AI Suite IP is the ability to make precision versus performance trade-offs. This trade-off involves balancing the computational accuracy of your models with the desired inference speed and resource utilization on the FPGA. The FPGA AI Suite uses Block Floating Point (BFP) as its core numeric representation, enabling both high throughput and efficient use of FPGA DSP resources. For information about Block Floating Point, refer to Block Floating Point (BFP). The following precision options are available:
- FP16 (INT12-BFP) – Highest default precision supported
- FP13 (INT9-BFP) – Optimal for Agilex™ 7
- FP12 (INT8-BFP) – Optimal for Agilex™ 5
- FP11 (INT7-BFP) – Highest performance, lowest precision; may require quantization-aware training (QAT)
Trade-off considerations:
- FP11 offers significant performance boosts and lower memory usage but may reduce model accuracy.
- FP13 is often the best default for modern vision models and is supported by many pre-compiled design examples.
Recommended defaults by device:
- Agilex™ 5: Optimal at FP12 in DSP tensor mode
- Agilex™ 7: Optimal at FP13
Use the Model Converter and Calibration Tools provided with the suite to evaluate precision impact and adjust the model accordingly.
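If you want a quick, tool-independent feel for how the BFP mantissa width affects numeric fidelity, you can emulate block floating point in software and compare the quantization error at each width. The sketch below is an approximation for illustration only: the blocking scheme (32 values sharing one exponent), the rounding, and the bfp_quantize helper are assumptions rather than the exact arithmetic of the FPGA AI Suite IP; the mantissa widths follow the FP16/FP13/FP12/FP11 list above (INT12/INT9/INT8/INT7).

```python
# Rough software emulation of block floating point (BFP) quantization, for
# gauging precision impact only. This is NOT the exact arithmetic used by the
# FPGA AI Suite IP; block size and rounding here are simplifying assumptions.
import numpy as np

def bfp_quantize(x, mantissa_bits, block_size=32):
    """Quantize a 1-D array block-by-block with a shared per-block exponent."""
    x = np.asarray(x, dtype=np.float64).copy()
    for start in range(0, x.size, block_size):
        block = x[start:start + block_size]
        max_mag = np.abs(block).max()
        if max_mag == 0:
            continue
        # Shared exponent chosen from the largest magnitude in the block.
        shared_exp = np.floor(np.log2(max_mag))
        # Quantization step so each mantissa uses roughly mantissa_bits bits
        # (1 sign bit + magnitude bits) relative to the shared exponent.
        step = 2.0 ** (shared_exp - (mantissa_bits - 2))
        x[start:start + block_size] = np.round(block / step) * step
    return x

weights = np.random.randn(4096)
for name, bits in [("FP16 (INT12-BFP)", 12), ("FP13 (INT9-BFP)", 9),
                   ("FP12 (INT8-BFP)", 8), ("FP11 (INT7-BFP)", 7)]:
    err = np.abs(bfp_quantize(weights, bits) - weights).mean()
    print(f"{name}: mean abs quantization error = {err:.2e}")
```

Running a check like this on representative weights or activations gives a first-order view of the accuracy cost of dropping from INT12-BFP to INT7-BFP, before you commit to a full calibration run.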