The Model Optimization documentation mentions quantization‑aware training (QAT): it states that QAT allows a user to obtain an accurate optimized model that can be converted to OpenVINO™ Intermediate Representation (IR), but it provides no further details. Refer to the following:
Quantization‑Aware Training (QAT) is supported through the Neural Network Compression Framework (NNCF) for the following OpenVINO™‑compatible training frameworks:
- PyTorch*
- TensorFlow* 2.x
NNCF is a framework that provides post‑training and training‑time model compression methods (including QAT) and is used to optimize models for OpenVINO™ inference.
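As an illustration, a minimal NNCF configuration that enables 8‑bit quantization during training might look like the sketch below. The exact schema depends on the NNCF version; the field names follow the legacy NNCF JSON configuration format, and the input shape shown is a placeholder:

```json
{
  "input_info": { "sample_size": [1, 3, 224, 224] },
  "compression": {
    "algorithm": "quantization"
  }
}
```

In the legacy workflow, a configuration like this is loaded and passed to NNCF when wrapping the original model, after which fine‑tuning proceeds with the usual training loop while NNCF simulates quantization.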
After QAT fine‑tuning is complete, the optimized model can be exported (commonly to ONNX*) and then converted to OpenVINO™ IR with Model Optimizer for deployment.
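The conversion step can be performed with the classic Model Optimizer command‑line tool. A sketch of the invocation is shown below; the file and directory names are placeholders:

```
mo --input_model model_qat.onnx --output_dir ir_model
```

This produces the `.xml` and `.bin` files that make up the OpenVINO™ IR.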
Note: The transition to INT8 precision and the corresponding footprint benefits occur after converting the model to OpenVINO™ IR.
Refer to the following articles:
Enhanced low‑precision pipeline to accelerate inference with OpenVINO toolkit