FPGA AI Suite Handbook


9.4.3. Using the Mixed Precision Feature

To enable mixed precision support in the FPGA AI Suite IP, manually edit the Architecture Description File to add the line:
enable_mixed_precision:true 

Specifying Graph Layers to Run at High Precision

To enable mixed precision, specify the desired precision for a graph layer by manually annotating it onto the OpenVINO™ Model Converter output, the OpenVINO™ graph (graph.xml).

If you rerun the OpenVINO™ Model Converter during development, the graph.xml file is regenerated and you must re-annotate this new file.

Annotating the OpenVINO™ Graph

Refer to the FPGA AI Suite documentation for instructions on running the OpenVINO™ Model Converter tool on your trained network.

The primary OpenVINO™ Model Converter outputs are the OpenVINO™ graph (graph.xml) that describes your network topology, and the weights and biases for your model (graph.bin). These OpenVINO™ outputs are the inputs of the FPGA AI Suite dla_compiler tool.

To annotate the precision of a layer onto the OpenVINO™ graph, you must modify that layer's name string directly in graph.xml. The precision annotation is appended to the end of the layer name, following a double underscore, as shown in the example below.

<layer id="2" name="model/conv1/Conv2D__fpga_precision=high" type="Convolution" version="opset1">
   <data strides="1, 1" dilations="1, 1" pads_begin="0, 0" pads_end="0, 0" auto_pad="same_upper"/>
   <input>
     <port id="0" precision="FP32">
...

Do not modify the graph.bin file.
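
If you re-annotate frequently (for example, each time the OpenVINO™ Model Converter regenerates graph.xml), you can script the edit instead of making it by hand. The following Python sketch is not an FPGA AI Suite utility: the layer name reuses the example above, the output file name is illustrative, and you should verify that the rewritten file still loads in the dla_compiler tool.

import xml.etree.ElementTree as ET

# Precision annotations to apply, keyed by the base layer name (the name before
# any existing __fpga_precision suffix). The entry below reuses the example
# layer shown above and is illustrative only.
ANNOTATIONS = {
    "model/conv1/Conv2D": "high",
}

tree = ET.parse("graph.xml")   # OpenVINO Model Converter output
for layer in tree.getroot().iter("layer"):
    base_name = layer.get("name", "").split("__fpga_precision=")[0]
    if base_name in ANNOTATIONS:
        layer.set("name", f"{base_name}__fpga_precision={ANNOTATIONS[base_name]}")

# Write an annotated copy of the topology; graph.bin is left untouched.
tree.write("graph_annotated.xml", encoding="utf-8", xml_declaration=True)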

You can specify the following fpga_precision annotation values:
Table 29.  FPGA Precision Annotation Values

  Annotation                     | Description
  __fpga_precision=default       | Run this layer at the default BFP precision. The default BFP precision is indicated by the arch_precision field in the Architecture Description File.
  __fpga_precision=high          | Run this layer with both feature and filter at high BFP precision.
  __fpga_precision=high-feature  | Run this layer with the feature at high BFP precision. The filter is represented at the default arch_precision.
  (no annotation)                | Equivalent to __fpga_precision=default.

Selecting Which Graph Layers to Annotate

The FPGA AI Suite IP runs the most computationally intensive graph layers at block floating point precision. Other graph layers operate at fp16 precision. For details, refer to Block Floating Point (BFP).

The following table summarizes the information that you need to select which graph layers to annotate with precision directives. For each graph layer/primitive, the table describes the following information:

  • The corresponding layer type in the graph.xml
  • The FPGA AI Suite IP block to which the dla_compiler tool maps layers of this type.
  • The default precision at which this IP block operates (that is, if __fpga_precision=default).
  • Whether this layer type is a legal target for the fpga_precision annotation.
Table 30.  Graph Layer Mapping and Precision

  Layer / Primitive | OpenVINO™ IR (.xml) Layer Type | IP Block Mapping | Default Precision | Supports Mixed Precision?
  Convolution (2D or 3D) | Convolution | PE array | arch_precision (BFP) | Yes
  Convolution including bias (2D or 3D) | Convolution + Add | PE array | arch_precision (BFP) | Yes
  Depthwise | GroupConvolution + Add (if bias is present) | Depthwise auxiliary block if possible, otherwise PE array | fp16 (aux block) or arch_precision (BFP) (PE array) | No (aux block), Yes (PE array)
  Scale-Shift | Multiply (single input) | PE array | arch_precision (BFP) | Yes
  Deconvolution / Transpose Convolution | ConvolutionBackpropData | PE array | arch_precision (BFP) | No
  Elementwise addition of feature + feature tensors | Add | PE array | arch_precision (BFP) | Yes
  Elementwise multiplication of feature * filter tensors | Multiply (single input) | PE array | arch_precision (BFP) | Yes
  Elementwise multiplication of feature * feature tensors | Multiply (two inputs) | PE array | arch_precision (BFP) | No
  Fully connected | MatMul + Add (if bias is present) | PE array | arch_precision (BFP) | Yes
  ChannelToSpace, DepthToSpace, PixelShuffle | DepthToSpace | PE array | arch_precision (BFP) | No
  ReLU, pReLU, Leaky ReLU | ReLU, PReLU | Activation auxiliary block | fp16 | N/A
  Clamp, Round Clamp | Clamp | Activation auxiliary block | fp16 | N/A
  H-sigmoid | HSigmoid | Activation auxiliary block | fp16 | N/A
  H-swish | HSwish | Activation auxiliary block | fp16 | N/A
  Sigmoid | Sigmoid | Activation auxiliary block | fp16 | N/A
  Swish | Swish | Activation auxiliary block | fp16 | N/A
  Max Pool | MaxPool | Pooling auxiliary block | fp16 | N/A
  Max Pool (3D inputs, i.e., DHWC) | MaxPool | Pooling auxiliary block + PE array | fp16 and arch_precision (BFP) | No
  Average Pool | AvgPool or ReduceMean | PE array | arch_precision (BFP) | Yes
  Softmax | Softmax | Activation auxiliary block | fp16 | N/A

Most graph layers that map to the FPGA AI Suite PE array are legal targets for the fpga_precision annotation. These are marked Yes in the Supports Mixed Precision? column of the preceding table.

Layers that are mapped to auxiliary blocks of the FPGA AI Suite IP are computed at fp16 precision, which is already high precision by default. Because the annotation does not apply to them, these layers are marked N/A in the preceding table.

The dla_compiler tool processes your annotations as follows:

  • When a precision annotation is successfully applied, dla_compiler adds the precision to the layer visualization that the compiler produces as "...after_all_passes.dot" under $coredla_work/visualizations. The annotated layer shows a Precision_Mode field set to the selected precision (High or High_Feature).
You can convert this report to SVG format, which can be displayed in a web browser, by running the following command:
dot -Tsvg ...after_all_passes.dot -o ...after_all_passes.svg
  • If an annotation is applied to an unsupported layer type (that is, a layer type marked No in Graph Layer Mapping and Precision), dla_compiler issues an error and terminates the compilation (a sketch for catching this before compilation follows the list).
  • If dla_compiler maps a layer to an auxiliary block (that is, a layer type marked N/A in Graph Layer Mapping and Precision), it ignores any associated precision annotation and prints Precision Mode: Arch, which indicates the default arch precision.
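
Because dla_compiler terminates when an annotation targets an unsupported layer type, it can save a compile cycle to scan the annotated graph first. The following Python sketch assumes the allow-list is the set of IR layer types marked Yes in Graph Layer Mapping and Precision, and it uses an illustrative file name; it is a coarse check (for example, it cannot distinguish a single-input Multiply from a two-input Multiply), so treat its warnings as hints rather than verdicts.

import xml.etree.ElementTree as ET

# IR layer types marked "Yes" in Graph Layer Mapping and Precision. This is a
# coarse allow-list: some of these types support mixed precision only
# conditionally (for example, GroupConvolution only when it maps to the PE array).
SUPPORTED_TYPES = {
    "Convolution", "GroupConvolution", "MatMul", "Add",
    "Multiply", "AvgPool", "ReduceMean",
}

tree = ET.parse("graph_annotated.xml")   # illustrative file name
for layer in tree.getroot().iter("layer"):
    name = layer.get("name", "")
    if "__fpga_precision=" in name and layer.get("type") not in SUPPORTED_TYPES:
        print(f"warning: layer '{name}' has type '{layer.get('type')}', "
              "which might not accept a precision annotation")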

Achieving Your Numerical Objectives

The mixed precision feature is useful for preserving fine-grained information in the input feature data. Information encoded in the least significant mantissa bits of the feature fp16 input is lost when data is quantized to the default block floating point arch_precision.
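
To see why those mantissa bits are vulnerable, consider a simplified model of block floating point in which every value in a block shares the exponent of the block's largest value. The Python sketch below is purely illustrative: the 5-bit mantissa is an assumption for the example and does not correspond to a particular arch_precision setting.

import math

def bfp_quantize(block, mantissa_bits=5):
    # All values in the block share the exponent of the largest magnitude;
    # each value keeps only mantissa_bits of resolution relative to that exponent.
    max_mag = max(abs(v) for v in block)
    if max_mag == 0.0:
        return list(block)
    shared_exp = math.floor(math.log2(max_mag))
    lsb = 2.0 ** (shared_exp - (mantissa_bits - 1))   # value of one mantissa step
    return [round(v / lsb) * lsb for v in block]

features = [1.0, 0.5, 0.03125, 0.001]   # one block mixing large and small values
print(bfp_quantize(features))           # prints [1.0, 0.5, 0.0, 0.0]

In this toy block, the two smallest feature values quantize to zero. This is the kind of loss that the high and high-feature annotations are intended to reduce by allocating more mantissa bits.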

Use the __fpga_precision=high or __fpga_precision=high-feature annotations as follows:

  • Apply them to the initial layers of a graph. Applying the annotations at this level preserves fine-grained input data that would otherwise be lost when quantizing to default-precision block floating point. (A sketch for identifying candidate layers follows this list.)
  • Apply them to consecutive layers of a graph. It is unclear whether there is significant benefit to running layers at high BFP precision after an earlier layer has already run at the default block floating point precision.
  • Apply them to the fewest possible graph layers needed to achieve your numerical objectives. Running a layer at high BFP precision is more computationally intensive, which reduces the overall inference throughput.
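
To identify candidates for the first recommendation above, you can list the Convolution layers that appear earliest in the IR. The Python sketch below walks graph.xml in document order, which is usually, but not guaranteed to be, close to execution order; the cutoff of three layers is arbitrary.

import xml.etree.ElementTree as ET

tree = ET.parse("graph.xml")
conv_names = [
    layer.get("name")
    for layer in tree.getroot().iter("layer")
    if layer.get("type") == "Convolution"
]

# Print the first few Convolution layer names as candidates for a
# __fpga_precision=high or __fpga_precision=high-feature annotation.
for name in conv_names[:3]:
    print(name)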

If your graph contains depthwise convolution layers, leverage the dedicated depthwise auxiliary block:

  • Enable the depthwise auxiliary block by modifying the Architecture Description File. The depthwise auxiliary block provides superior throughput and higher numerical precision for depthwise convolutions.
  • Ensure that the parameterization of the depthwise auxiliary block (maximum window size, stride, and dilation) is set large enough to support all of the depthwise convolutions in your graph (a sketch for listing the relevant attributes from graph.xml follows below).
  • Use the FPGA AI Suite Area Estimator tool to estimate the FPGA resource utilization required by the depthwise auxiliary block.

For information on using the depthwise auxiliary block, refer to Parameter Group: depthwise.
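
To help confirm that the depthwise auxiliary block parameterization covers your graph, the Python sketch below lists the stride and dilation attributes of every GroupConvolution layer in graph.xml. It is illustrative only: the window (kernel) size is not a <data> attribute in the IR and must still be checked against the shape of the weight tensor.

import xml.etree.ElementTree as ET

tree = ET.parse("graph.xml")
for layer in tree.getroot().iter("layer"):
    if layer.get("type") == "GroupConvolution":
        data = layer.find("data")
        attrs = data.attrib if data is not None else {}
        print(layer.get("name"),
              "strides =", attrs.get("strides"),
              "dilations =", attrs.get("dilations"))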