The Model Optimization documentation mentions quantization‑aware training (QAT): it states that QAT allows a user to obtain an accurate optimized model that can be converted to OpenVINO™ Intermediate Representation (IR), but it provides no further details. Refer to the following:
Quantization‑Aware Training (QAT) is supported through the Neural Network Compression Framework (NNCF) for the following OpenVINO™‑compatible training frameworks:
- PyTorch*
- TensorFlow* 2.x
NNCF is a framework that provides post‑training and training‑time model compression methods (including QAT) and is used to optimize models for OpenVINO™ inference.
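As an illustration, a minimal NNCF configuration that enables 8‑bit quantization during training might look like the sketch below. The exact schema depends on the NNCF version; the field names follow the legacy NNCF JSON configuration format, and the input shape shown is a placeholder:

```json
{
  "input_info": { "sample_size": [1, 3, 224, 224] },
  "compression": {
    "algorithm": "quantization"
  }
}
```

In the legacy workflow, a configuration like this is loaded and passed to NNCF when wrapping the original model, after which fine‑tuning proceeds with the usual training loop while NNCF simulates quantization.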
After QAT fine‑tuning is complete, the optimized model can be exported (commonly to ONNX*) and then converted to OpenVINO™ IR with Model Optimizer for deployment.
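The conversion step can be performed with the classic Model Optimizer command‑line tool. A sketch of the invocation is shown below; the file and directory names are placeholders:

```
mo --input_model model_qat.onnx --output_dir ir_model
```

This produces the `.xml` and `.bin` files that make up the OpenVINO™ IR.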
Note: The transition to INT8 precision and the corresponding footprint benefits occur after converting the model to OpenVINO™ IR.
Refer to the following articles:
Enhanced low‑precision pipeline to accelerate inference with OpenVINO toolkit