Summary
How quantization operates in the OpenVINO™ toolkit.
Description
- Quantized an ONNX model that was in FP32 precision.
- Ran the compress_model_weights function after Post-Training Quantization to reduce the size of the .bin file.
- Compiled the model and noticed that some operations in the output model remain in FP32 instead of INT8.
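The steps above can be sketched with the Post-Training Optimization Tool (POT) Python API. This is a minimal sketch, not the reporter's actual script: the model paths, the data loader, and the helper names (make_algo_config, quantize_and_compress) are assumptions for illustration. POT is imported inside the function so the configuration helper can be read and tested without OpenVINO installed.

```python
def make_algo_config(preset="performance", stat_subset_size=300):
    # Helper (for illustration only) building the DefaultQuantization
    # algorithm configuration used by POT.
    return [{
        "name": "DefaultQuantization",
        "params": {
            "target_device": "CPU",
            "preset": preset,
            "stat_subset_size": stat_subset_size,
        },
    }]

def quantize_and_compress(xml_path, bin_path, data_loader, out_dir):
    # Hypothetical end-to-end flow: quantize the FP32 IR, then call
    # compress_model_weights to shrink the .bin file, as described above.
    from openvino.tools.pot import (  # imported lazily; requires OpenVINO
        IEEngine, load_model, save_model,
        compress_model_weights, create_pipeline,
    )
    model = load_model({"model_name": "model",
                        "model": xml_path, "weights": bin_path})
    engine = IEEngine(config={"device": "CPU"}, data_loader=data_loader)
    pipeline = create_pipeline(make_algo_config(), engine)
    quantized = pipeline.run(model)
    # Stores quantized weights in INT8, reducing the .bin size on disk.
    compress_model_weights(quantized)
    return save_model(quantized, save_path=out_dir, model_name="model_int8")
```

The data_loader argument is a POT DataLoader over a calibration dataset; it is elided here because it depends on the model's inputs.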
Resolution
During quantization, only the operations that are relevant from a performance perspective are quantized to INT8. The remaining operations stay in FP32 in the output model.
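One way to see this mixed-precision behavior is to inspect the execution graph of a compiled model, where each node records the precision it actually ran in. This is a sketch under assumptions: it presumes a CompiledModel obtained via the OpenVINO Python API, and the helper names below are ours, not from the original report.

```python
from collections import Counter

def summarize_precisions(precisions):
    # Count how many executed operations ran in each runtime precision,
    # e.g. {"f32": 12, "i8": 40}.
    return dict(Counter(precisions))

def runtime_precisions(compiled_model):
    # Requires OpenVINO, so it is only called on a real CompiledModel.
    # get_runtime_model() exposes the execution graph; each node's rt_info
    # is assumed to carry a "runtimePrecision" entry with the precision
    # the operation actually executed in.
    return [str(node.get_rt_info()["runtimePrecision"])
            for node in compiled_model.get_runtime_model().get_ops()]
```

On a quantized model, the summary will typically show a mix of i8 and f32 entries: the f32 entries are the operations that were left unquantized because quantizing them would not improve performance.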
Additional information
Refer to the OpenVINO™ Low Precision Transformations documentation.