9.4.2. Improving Layer Accuracy by using Mixed Precision
For ML tasks that are sensitive to precision, using a lower precision saves area but may degrade inference accuracy. The mixed precision feature allows designated layers in the ML graph to run at a higher precision to achieve better accuracy.
The following diagram illustrates the conversion from floating point (fp16) to high-precision block floating point (in this example, 2 x INT9-BFP).
Because the high-precision block floating point format uses a mantissa wider than the fp16 mantissa, the block alignment step typically causes little to no loss of precision.
The only situation in which the high-precision blocked values lose mantissa precision relative to the fp16 inputs occurs when values of very different magnitude (i.e. having very different exponents) are blocked together. In this situation, a large bit shift is required to block align the mantissas, which can cause some low-precision bits of smaller values in the block to be lost. The 7th blocked value in the diagram illustrates this case.
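The Python sketch below illustrates this blocking behavior under a simplified model: each block receives a shared exponent taken from its largest-magnitude value, and each fp16 input is rounded to a signed 17-bit mantissa aligned to that exponent. The helper names, block contents, and rounding details are illustrative assumptions, not the IP's exact implementation.

```python
import numpy as np

def to_high_precision_bfp(block, mantissa_bits=17):
    """Quantize a block of fp16 values to a shared exponent plus signed
    integer mantissas of `mantissa_bits` bits (sign bit included).
    Hypothetical helper for illustration only."""
    block = np.asarray(block, dtype=np.float16).astype(np.float64)
    # Shared exponent comes from the largest-magnitude value in the block.
    max_exp = int(np.floor(np.log2(np.max(np.abs(block)) + np.finfo(np.float64).tiny)))
    # One sign bit and one integer bit leave (mantissa_bits - 2) fraction bits.
    frac_bits = mantissa_bits - 2
    scale = 2.0 ** (frac_bits - max_exp)
    mantissas = np.round(block * scale).astype(np.int64)
    shared_exponent = max_exp - frac_bits
    return mantissas, shared_exponent

def from_bfp(mantissas, shared_exponent):
    """Reconstruct real values from blocked mantissas and the shared exponent."""
    return mantissas.astype(np.float64) * 2.0 ** shared_exponent

# Values with similar exponents are represented exactly ...
block = np.array([1.5, -0.75, 1.25, 0.875], dtype=np.float16)
m, e = to_high_precision_bfp(block)
print(from_bfp(m, e))   # [ 1.5   -0.75   1.25   0.875]

# ... but a value far smaller than the block maximum loses low mantissa
# bits, because it is shifted far to the right during block alignment.
block = np.array([1024.0, 0.0009765625], dtype=np.float16)
m, e = to_high_precision_bfp(block)
print(from_bfp(m, e))   # [1024.    0.]  -- the small value is rounded away
```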
Using the high precision block floating point numerical decomposition, a PE array parameterized to handle INT9-BFP can perform convolutions at INT17-BFP precision on select layers. The table below summarizes what "high precision BFP" entails for different FPGA AI Suite IP arch_precision parameter values.
| arch_precision | Block floating point | High precision BFP |
|---|---|---|
| FP11 | INT7-BFP | INT13-BFP (not supported) |
| FP12AGX | INT8-BFP | INT15-BFP |
| FP13AGX | INT9-BFP | INT17-BFP |
| FP16 | INT12-BFP | INT23-BFP (not supported) |
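As a concrete illustration of the decomposition for the FP13AGX row in the table above, the sketch below splits a signed INT17-BFP mantissa into two terms that each fit a signed 9-bit datapath. The split_int17() helper and the particular hi/lo bit split are assumptions made for illustration; the IP's internal encoding is not specified here.

```python
def split_int17(m):
    """Decompose a signed 17-bit mantissa into (hi, lo) such that
    m == hi * 256 + lo, with each term representable as a signed
    9-bit value. Hypothetical helper for illustration only."""
    assert -(1 << 16) <= m < (1 << 16), "value must fit in signed INT17"
    hi = m >> 8      # arithmetic shift: upper bits, range [-256, 255]
    lo = m & 0xFF    # lower 8 bits, range [0, 255], fits in signed INT9
    return hi, lo

# Round-trip check over a few representative INT17 values.
for m in (65535, -65536, -1, 12345, -40000):
    hi, lo = split_int17(m)
    assert hi * 256 + lo == m
    print(f"{m:7d} = {hi:5d} * 256 + {lo:3d}")
```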
With high precision BFP decomposition, both features and filters are represented as the sum of two terms, so each feature-filter product of sums expands into four terms. As a result, a high-precision convolution layer has 4x the computational cost of a default precision convolution layer.
The computational cost can be reduced by using high precision BFP for only the features, leaving the filters at default precision. Such a layer has 2x the computational cost of a default precision layer.
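The sketch below works through this cost arithmetic with arbitrary example mantissa values, using the same hi/lo split as the previous sketch (again a hypothetical encoding). Decomposing both operands yields four 9-bit multiplies per product, while decomposing only the feature yields two.

```python
# Arbitrary INT17-BFP mantissa values for the feature and the filter.
f, w = 51234, -24321

# Same hi/lo split as the previous sketch (hypothetical encoding).
f_hi, f_lo = f >> 8, f & 0xFF
w_hi, w_lo = w >> 8, w & 0xFF

# Both features and filters at high precision: the product of sums
# expands into four 9-bit x 9-bit multiplies, hence the 4x cost.
full = ((f_hi * w_hi) << 16) + ((f_hi * w_lo) << 8) \
     + ((f_lo * w_hi) << 8) + (f_lo * w_lo)
assert full == f * w

# High-precision features with a default-precision (INT9-BFP) filter:
# only two 9-bit x 9-bit multiplies are needed, hence the 2x cost.
w9 = -123            # filter mantissa already fits the 9-bit datapath
feat_only = ((f_hi * w9) << 8) + (f_lo * w9)
assert feat_only == f * w9
```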