9.4.2. Improving Layer Accuracy by using Mixed Precision
For ML tasks that are sensitive to precision, using a lower precision saves area but may reduce inference accuracy. The mixed precision feature allows designated layers in the ML graph to run at a higher precision to achieve better accuracy.
The following diagram illustrates the conversion from floating point (fp16) to high-precision block floating point (in this example, 2 x INT9-BFP).
Since the block floating point alignment step uses a mantissa width larger than the fp16 mantissa, there is little to no loss of precision.
The only situation in which the high-precision blocked values lose mantissa precision relative to the fp16 inputs occurs when values of very different magnitude (i.e. having very different exponents) are blocked together. In this situation, a large bit shift is required to block align the mantissas, which can cause some low-precision bits of smaller values in the block to be lost. The 7th blocked value in the diagram illustrates this case.
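The alignment behavior described above can be sketched in a few lines of Python. This is an illustrative model only (the function name, block size, and rounding mode are assumptions, not the IP's actual implementation): each block shares the exponent of its largest-magnitude value, and every mantissa is rounded to a fixed-width integer at that scale, so a value whose own exponent is far below the shared exponent loses its low-order bits.

```python
import math

def block_quantize(values, mantissa_bits=9):
    """Illustrative block floating point (BFP) quantizer: one shared
    exponent per block, signed fixed-width integer mantissas."""
    # Shared exponent: exponent of the largest-magnitude value in the block.
    shared_exp = max(math.frexp(v)[1] for v in values if v != 0.0)
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    # Rounding each value to an integer multiple of `scale` discards the
    # low-order bits of values much smaller than the block maximum.
    mantissas = [round(v / scale) for v in values]
    return shared_exp, [m * scale for m in mantissas]

# Values of similar magnitude survive blocking exactly...
_, close = block_quantize([1.0, 0.75, -0.5, 0.875])
# ...but a tiny value blocked with a large one is shifted out entirely.
_, mixed = block_quantize([1.0, 0.0009])
```

Running this, `close` reproduces its inputs exactly, while the second element of `mixed` rounds to zero, mirroring the case of the 7th blocked value in the diagram.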
Using the high precision block floating point numerical decomposition, a PE array parameterized to handle INT9-BFP can perform convolutions at INT17-BFP precision on select layers. The table below summarizes what "high precision BFP" entails for different FPGA AI Suite IP arch_precision parameter values.
| arch_precision | Block floating point | High precision BFP |
|---|---|---|
| FP11 | INT7-BFP | INT13-BFP (not supported) |
| FP12AGX | INT8-BFP | INT15-BFP |
| FP13AGX | INT9-BFP | INT17-BFP |
| FP16 | INT12-BFP | INT23-BFP (not supported) |
A high-precision convolution layer has 4x the computational cost of a default precision convolution layer. With high precision BFP decomposition, both features and filters are represented as the sum of two terms. The resulting feature-filter product of sums has four terms.
The computational cost can be reduced by using high precision BFP for only the features, leaving the filters at default precision. Such a layer has 2x the computational cost of a default precision layer.
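The 4x and 2x cost figures follow directly from the algebra of the decomposition, sketched below. The split width and function names are hypothetical, chosen only to illustrate how a PE array built for narrow multiplies accumulates a wide product from several narrow partial products.

```python
HALF_BITS = 8  # native multiply width of the PE array (illustrative)

def split(x):
    """Decompose a non-negative wide mantissa as x = hi * 2**HALF_BITS + lo."""
    return x >> HALF_BITS, x & ((1 << HALF_BITS) - 1)

def mul_both_split(feature, flt):
    """Both operands split: the product of two sums expands into four
    narrow partial products, hence 4x the cost of one narrow multiply."""
    fh, fl = split(feature)
    wh, wl = split(flt)
    return ((fh * wh) << (2 * HALF_BITS)) \
         + ((fh * wl + fl * wh) << HALF_BITS) \
         + fl * wl

def mul_feature_split(feature, flt):
    """Only the feature split, filter left at default precision: two
    partial products, hence 2x the cost."""
    fh, fl = split(feature)
    return ((fh * flt) << HALF_BITS) + fl * flt
```

Both functions reproduce the exact wide product; the difference is how many narrow multiplies the hardware must perform per output.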