
Why Did the Quantized Model Format Remain FP32 Instead of INT8?

Content Type: Product Information & Documentation   |   Article ID: 000095064   |   Last Reviewed: 06/13/2023

Description

  • Quantized an ONNX model that was in FP32 precision format.
  • Ran the compress_model_weights function to reduce the size of the .bin file after performing Post-Training Quantization.
  • Compiled the model and noticed that the output of the model is in FP32 instead of INT8 (a minimal sketch of this workflow follows this list).
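
The workflow above can be reproduced roughly as follows with the OpenVINO Post-training Optimization Tool (POT) API (OpenVINO 2022.x). This is a sketch only: the file paths, model name, input shape, and the random calibration loader are hypothetical placeholders and should be replaced with real values and a real preprocessed calibration dataset.

    # Minimal POT sketch: quantize an FP32 IR, then compress the weights.
    import numpy as np
    from openvino.tools.pot import (
        DataLoader, IEEngine, load_model, save_model,
        compress_model_weights, create_pipeline,
    )

    class RandomCalibrationLoader(DataLoader):
        # Placeholder calibration data; replace with a real, preprocessed dataset.
        def __init__(self, shape=(1, 3, 224, 224), count=300):
            super().__init__(config=None)
            self.shape, self.count = shape, count

        def __len__(self):
            return self.count

        def __getitem__(self, index):
            # (data, annotation) pair; DefaultQuantization ignores the annotation.
            return np.random.rand(*self.shape).astype(np.float32), None

    # FP32 IR (.xml/.bin) converted from the original ONNX model (hypothetical paths)
    model_config = {"model_name": "model", "model": "model.xml", "weights": "model.bin"}
    engine_config = {"device": "CPU"}
    algorithms = [{
        "name": "DefaultQuantization",
        "params": {"target_device": "CPU", "preset": "performance", "stat_subset_size": 300},
    }]

    model = load_model(model_config=model_config)
    engine = IEEngine(config=engine_config, data_loader=RandomCalibrationLoader())
    pipeline = create_pipeline(algorithms, engine)
    quantized_model = pipeline.run(model=model)

    # Shrinks the .bin file by storing quantizable weights in INT8
    compress_model_weights(quantized_model)
    save_model(model=quantized_model, save_path="optimized", model_name="model_int8")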

Resolution

During quantization, only the operations that are relevant from a performance perspective are quantized to INT8. The remaining operations are left in FP32 in the output model.
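
One way to see this in practice, assuming the hypothetical file name from the sketch above, is to inspect the quantized IR with the OpenVINO Runtime Python API: the input and output tensors keep their original FP32 element type, while the quantized operations appear as FakeQuantize nodes inside the graph.

    # Sketch: inspecting the quantized model (hypothetical path) with the
    # OpenVINO Runtime Python API to see where the INT8 part actually lives.
    from openvino.runtime import Core

    core = Core()
    model = core.read_model("optimized/model_int8.xml")

    # Model inputs and outputs keep their original FP32 element type.
    for port in list(model.inputs) + list(model.outputs):
        print(port.get_any_name(), port.get_element_type())

    # Operations worth quantizing for performance carry FakeQuantize nodes;
    # everything else stays in FP32.
    fq_count = sum(1 for op in model.get_ops() if op.get_type_name() == "FakeQuantize")
    print(f"FakeQuantize operations in the graph: {fq_count}")

    # The compiled model also reports FP32 at its boundaries.
    compiled = core.compile_model(model, "CPU")
    print(compiled.output(0).get_element_type())

At runtime the device plugin executes the FakeQuantize-annotated regions with INT8 kernels, but the tensors exposed to the application remain FP32, so seeing FP32 precision at the model's inputs and outputs after quantization is expected behavior.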
