Intel® Neural Compressor compresses models to reduce their size and speed up deep learning inference for deployment on CPUs or GPUs. This open source Python* library automates popular model compression techniques, such as quantization, pruning, and knowledge distillation, across multiple deep learning frameworks.
Using this library, you can:
- Converge quickly on quantized models through automatic accuracy-driven tuning strategies.
- Prune the least important parameters of large models.
- Distill knowledge from a larger model to improve the accuracy of a smaller model for deployment.
- Get started with model compression through one-click analysis and code insertion.
Intel Neural Compressor is part of the end-to-end suite of Intel® AI and machine learning development tools and resources.
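
To make the idea of accuracy-driven tuning from the list above more concrete, here is a minimal conceptual sketch in plain Python. It does not use the Intel Neural Compressor API: the functions `quantize`, `accuracy`, and `tune`, the toy linear classifier, the candidate bit widths, and the 1% relative accuracy tolerance are all assumptions made for illustration. The loop tries the most aggressive setting first and keeps the first one whose accuracy stays within the tolerance of the full-precision baseline.

```python
# Conceptual sketch only: plain Python, NOT the Intel Neural Compressor API.
# All names and values below are hypothetical and chosen for illustration.

def quantize(weights, bits):
    """Symmetric uniform quantization: snap weights to a signed integer grid, then map back to floats."""
    levels = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]

def accuracy(weights, samples):
    """Fraction of samples that a toy linear classifier labels correctly."""
    correct = 0
    for features, label in samples:
        score = sum(w * x for w, x in zip(weights, features))
        correct += int((score > 0) == label)
    return correct / len(samples)

def tune(weights, samples, candidate_bits=(2, 4, 8), max_relative_loss=0.01):
    """Return the first (most aggressive) bit width that keeps accuracy within the tolerance."""
    baseline = accuracy(weights, samples)
    for bits in candidate_bits:
        candidate = quantize(weights, bits)
        acc = accuracy(candidate, samples)
        if baseline - acc <= max_relative_loss * baseline:
            return bits, candidate, acc
    # No candidate met the accuracy goal: fall back to the original weights.
    return None, list(weights), baseline

# Toy full-precision model and a tiny labeled evaluation set (made-up data).
weights = [0.9, -0.05, 0.4]
samples = [
    ((1, 0, 0), True),
    ((-1, 0, 3), True),
    ((0, 1, -1), False),
    ((-1, 2, 1), False),
]

bits, quantized_weights, acc = tune(weights, samples)
print(f"selected bit width: {bits}, accuracy: {acc:.2f}")
```

On this toy data, the 2-bit candidate misclassifies one sample and is rejected, so the loop moves on to the next candidate. A real tuning strategy searches a much larger space (per-operator precision, calibration methods, and so on), but the accept-or-retry structure is the same basic idea.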