9.4.3. Using the Mixed Precision Feature
To use the mixed precision feature, first enable it in the Architecture Description File:
enable_mixed_precision:true
Specifying Graph Layers to Run at High Precision
Specify the desired precision for a graph layer by manually annotating it onto the OpenVINO™ Model Converter output, the OpenVINO™ graph (graph.xml).
If you rerun the OpenVINO™ Model Converter during development, the graph.xml file is regenerated and you must re-annotate this new file.
Annotating the OpenVINO™ Graph
Refer to the FPGA AI Suite documentation for instructions on running the OpenVINO™ Model Converter tool on your trained network.
The primary OpenVINO™ Model Converter outputs are the OpenVINO™ graph (graph.xml) that describes your network topology, and the weights and biases for your model (graph.bin). These OpenVINO™ outputs are the inputs of the FPGA AI Suite dla_compiler tool.
To annotate the precision of a layer onto the OpenVINO™ graph, modify that layer's name string directly in graph.xml. The precision annotation is appended to the end of the layer name, following a double underscore, as shown in the example below.
<layer id="2" name="model/conv1/Conv2D__fpga_precision=high" type="Convolution" version="opset1">
<data strides="1, 1" dilations="1, 1" pads_begin="0, 0" pads_end="0, 0" auto_pad="same_upper"/>
<input>
<port id="0" precision="FP32">
...
Do not modify the graph.bin file.
| Annotation | Description |
|---|---|
| __fpga_precision=default | Run this layer at the default BFP precision. The default BFP arch precision is indicated by the arch_precision field in the Architecture Description File. |
| __fpga_precision=high | Run this layer with both the feature and the filter at high BFP precision. |
| __fpga_precision=high-feature | Run this layer with the feature at high BFP precision. The filter is represented at the default arch_precision. |
| (no annotation used) | Equivalent to __fpga_precision=default. |
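If you prefer not to edit graph.xml by hand (for example, because you must re-annotate after every Model Converter rerun), a small script can append the annotation for you. The following is a minimal Python sketch, not part of the FPGA AI Suite tooling; the layer name, annotation, and file names are illustrative assumptions.

# annotate_layer.py: minimal sketch for appending a precision annotation to one layer.
# The layer name, annotation, and file names below are illustrative assumptions.
import xml.etree.ElementTree as ET

GRAPH_XML = "graph.xml"                      # OpenVINO Model Converter output
TARGET_LAYER = "model/conv1/Conv2D"          # hypothetical layer to annotate
ANNOTATION = "__fpga_precision=high"         # or __fpga_precision=high-feature

tree = ET.parse(GRAPH_XML)
for layer in tree.getroot().iter("layer"):
    if layer.get("name") == TARGET_LAYER:
        layer.set("name", TARGET_LAYER + ANNOTATION)
tree.write("graph_annotated.xml", encoding="utf-8", xml_declaration=True)
# Do not modify graph.bin; supply it unchanged alongside the annotated XML.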
Selecting Which Graph Layers to Annotate
The FPGA AI Suite IP runs the most computationally intensive graph layers at block floating point precision. Other graph layers operate at fp16 precision. For details, refer to Block Floating Point (BFP).
The following table summarizes the information that you need to select which graph layers to annotate with precision directives. For each graph layer/primitive, the table describes the following information:
- The corresponding layer type in the graph.xml
- The FPGA AI Suite IP block to which the dla_compiler tool maps layers of this type.
- The default precision at which this IP block operates (that is, if __fpga_precision=default).
- Whether this layer type is a legal target for the fpga_precision annotation.
| Layer / Primitive | OpenVINO™ IR (.xml) Layer Type | IP Block Mapping | Default Precision | Supports Mixed Precision? |
|---|---|---|---|---|
| Convolution (2D or 3D) | Convolution | PE array | arch_precision (BFP) | Yes |
| Convolution including bias (2D or 3D) | Convolution + Add | PE array | arch_precision (BFP) | Yes |
| Depthwise | GroupConvolution + Add (if bias is present) | Depthwise auxiliary block (if possible); otherwise, PE array | fp16 (aux block) / arch_precision (BFP) (PE array) | No (if aux block) / Yes (if PE array) |
| Scale-Shift | Multiply (single input) | PE array | arch_precision (BFP) | Yes |
| Deconv / Transpose Convolution | ConvolutionBackpropData | PE array | arch_precision (BFP) | No |
| Elementwise addition of feature + feature tensors | Add | PE array | arch_precision (BFP) | Yes |
| Elementwise multiplication of feature * filter tensors | Multiply (single input) | PE array | arch_precision (BFP) | Yes |
| Elementwise multiplication of feature * feature tensors | Multiply (two inputs) | PE array | arch_precision (BFP) | No |
| Fully connected | MatMul + Add (if bias is present) | PE array | arch_precision (BFP) | Yes |
| ChannelToSpace / DepthToSpace / PixelShuffle | DepthToSpace | PE array | arch_precision (BFP) | No |
| ReLU / pReLU / Leaky ReLU | ReLU, PReLU | Activation auxiliary block | fp16 | N/A |
| Clamp / Round Clamp | Clamp | Activation auxiliary block | fp16 | N/A |
| H-sigmoid | HSigmoid | Activation auxiliary block | fp16 | N/A |
| H-swish | HSwish | Activation auxiliary block | fp16 | N/A |
| Sigmoid | Sigmoid | Activation auxiliary block | fp16 | N/A |
| Swish | Swish | Activation auxiliary block | fp16 | N/A |
| Max Pool | MaxPool | Pooling auxiliary block | fp16 | N/A |
| Max Pool (for 3D inputs, i.e., DHWC) | MaxPool | Pooling auxiliary block + PE array | fp16 & arch_precision (BFP) | No |
| Average Pool | AvgPool or ReduceMean | PE array | arch_precision (BFP) | Yes |
| Softmax | Softmax | Activation auxiliary block | fp16 | N/A |
Most graph layers that map to the FPGA AI Suite PE array are legal targets for the fpga_precision annotation. These layers are marked "Yes" in the Supports Mixed Precision? column of the preceding table.
Layers that map to auxiliary blocks of the FPGA AI Suite IP are computed at fp16 precision, which is already high precision by default. These layers are marked "N/A" in the preceding table.
The dla_compiler tool processes your annotations as follows:
- When a precision annotation is successfully applied, the dla_compiler tool records the selected precision in the layer visualization that the compiler generates as "...after_all_passes.dot" under $coredla_work/visualizations. The annotated layer shows Precision_Mode: with the selected precision (High or High_Feature). A small verification sketch follows this list. To render the visualization, convert the .dot file to an SVG file:
dot -Tsvg ...after_all_passes.dot -o ...after_all_passes.svg
- For an annotation applied to an unsupported layer type (that is, a layer marked "No" in the Supports Mixed Precision? column of the preceding table), dla_compiler issues an error and terminates the compilation.
- If dla_compiler maps a layer to an auxiliary block (that is, a layer marked "N/A" in the preceding table), it ignores any associated precision annotation and prints Precision Mode: Arch, which indicates the default arch precision.
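To confirm which precision each layer actually received, you can scan the generated .dot file for the Precision_Mode strings instead of opening the rendered SVG. The following is a minimal Python sketch, not an FPGA AI Suite tool; it assumes the Precision_Mode text appears verbatim in the node labels of the .dot file.

# check_precision_modes.py: minimal sketch; assumes Precision_Mode appears verbatim
# in the node labels of the "...after_all_passes.dot" file.
import re
import sys

dot_path = sys.argv[1]   # path to the ...after_all_passes.dot file
with open(dot_path) as f:
    text = f.read()

# Count how many layers report each precision mode (for example, High, High_Feature, Arch).
counts = {}
for match in re.finditer(r"Precision_Mode:\s*([A-Za-z_]+)", text):
    mode = match.group(1)
    counts[mode] = counts.get(mode, 0) + 1

for mode, count in sorted(counts.items()):
    print(f"{mode}: {count} layer(s)")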
Achieving Your Numerical Objectives
The mixed precision feature is useful for preserving fine-grained information in the input feature data. Information encoded in the least significant mantissa bits of the feature fp16 input is lost when data is quantized to the default block floating point arch_precision.
Use the __fpga_precision=high or __fpga_precision=high-feature annotations as follows:
- Apply them to the initial layers of a graph. Annotating these layers preserves fine-grained input data that would otherwise be lost when quantizing to default-precision block floating point (see the sketch after this list).
- Apply them to consecutive layers of a graph. It is unclear whether there is significant benefit to running high-BFP precision layers after the first default block floating point precision layer.
- Apply them to the fewest possible graph layers needed to achieve your numerical objectives. Running a layer at high BFP precision is more computationally intensive, which reduces the overall inference throughput.
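The following variation of the earlier annotation sketch applies __fpga_precision=high-feature to only the first few Convolution layers. It is a minimal Python sketch; the layer count, annotation, and file names are illustrative assumptions, and it assumes the layer order in graph.xml follows the graph topology.

# annotate_initial_convs.py: minimal sketch; layer count, annotation, and file names
# are illustrative, and the layer order in graph.xml is assumed to follow the topology.
import xml.etree.ElementTree as ET

GRAPH_XML = "graph.xml"
NUM_LAYERS = 2                                  # annotate only the first two Convolution layers
ANNOTATION = "__fpga_precision=high-feature"

tree = ET.parse(GRAPH_XML)
annotated = 0
for layer in tree.getroot().iter("layer"):
    if annotated >= NUM_LAYERS:
        break
    if layer.get("type") == "Convolution":
        layer.set("name", layer.get("name") + ANNOTATION)
        annotated += 1
tree.write("graph_annotated.xml", encoding="utf-8", xml_declaration=True)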
If your graph contains depthwise convolution layers, leverage the dedicated depthwise auxiliary block:
- Enable the depthwise auxiliary block by modifying the Architecture Description File. The depthwise auxiliary block provides superior throughput and higher numerical precision for depthwise convolutions.
- Ensure that the parameterization of the depthwise auxiliary block (for maximum window size, stride, and dilation) is set large enough to support all of the depthwise convolutions in your graph (the sketch at the end of this section can help you survey these values).
- Use the FPGA AI Suite Area Estimator tool to estimate the FPGA resource utilization required by the depthwise auxiliary block.
For information on using the depthwise auxiliary block, refer to Parameter Group: depthwise.
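To check which depthwise convolution parameters the auxiliary block must cover, you can list the GroupConvolution layers in graph.xml along with their strides and dilations. The following is a minimal Python sketch under the assumption that depthwise convolutions appear as GroupConvolution layers (as in the preceding table) and that strides and dilations are stored as attributes of each layer's data element; the maximum window size must still be read from the weights shape.

# list_depthwise_params.py: minimal sketch; assumes depthwise convolutions appear
# as GroupConvolution layers in graph.xml and that strides/dilations are stored
# as attributes of each layer's <data> element (as in the Convolution example above).
import xml.etree.ElementTree as ET

tree = ET.parse("graph.xml")
for layer in tree.getroot().iter("layer"):
    if layer.get("type") == "GroupConvolution":
        data = layer.find("data")
        strides = data.get("strides") if data is not None else "?"
        dilations = data.get("dilations") if data is not None else "?"
        print(f'{layer.get("name")}: strides={strides}, dilations={dilations}')
# The maximum window (kernel) size comes from the weights tensor shape, which is not
# a <data> attribute; check the weights input dimensions of each GroupConvolution layer.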