Why Choose the FP16 Model in Weight Compression Using Optimum Intel / Neural Network Compression Framework (NNCF)?
Content Type: Troubleshooting | Article ID: 000098174 | Last Reviewed: 03/21/2024
Unable to determine why an FP16 model is recommended as the starting point for weight compression with Optimum Intel / Neural Network Compression Framework (NNCF).
FP16 (half precision) halves the model size relative to FP32 and typically yields near-identical inference results while using roughly half the GPU memory.
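The size claim above can be illustrated with a minimal NumPy sketch (shown here instead of an Optimum Intel / NNCF call, so the numbers can be checked directly): casting a weight tensor from FP32 to FP16 keeps the same shape and values (up to rounding) while storing each element in 2 bytes instead of 4.

```python
import numpy as np

# Simulated FP32 weight tensor, e.g. one linear layer of a model
w_fp32 = np.random.rand(1024, 1024).astype(np.float32)

# Cast to FP16: same shape, half the bytes per element
w_fp16 = w_fp32.astype(np.float16)

print(f"FP32 size: {w_fp32.nbytes} bytes")  # 4 bytes per element
print(f"FP16 size: {w_fp16.nbytes} bytes")  # 2 bytes per element
print(f"Ratio: {w_fp32.nbytes / w_fp16.nbytes}")
```

The same halving applies model-wide, which is why starting weight compression from an FP16 checkpoint already saves half the memory before any further quantization (e.g. to INT8 or INT4) is applied.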