Speed Up AI Inference without Sacrificing Accuracy
Deploy More Efficient Deep Learning Models
Intel® Neural Compressor performs model compression to reduce model size and increase the speed of deep learning inference for deployment on CPUs or GPUs. This open source Python* library automates popular model compression techniques, such as quantization, pruning, and knowledge distillation, across multiple deep learning frameworks.
Using this library, you can:
Converge quickly on quantized models through automatic, accuracy-driven tuning strategies.
Prune model weights by specifying predefined sparsity goals that drive pruning algorithms.
Distill knowledge from a larger network (“teacher”) to train a smaller network (“student”) to mimic its performance with minimal precision loss.
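The accuracy-driven tuning mentioned above can be pictured as a simple search loop: try candidate compression configurations and accept the first one whose accuracy stays within a tolerance of the FP32 baseline. This is an illustrative sketch, not Intel Neural Compressor's actual API; the function and config names are hypothetical.

```python
# Illustrative sketch of accuracy-driven tuning (not the library's API):
# evaluate candidate configurations in order and return the first one
# whose accuracy drop from the FP32 baseline stays within max_drop.

def accuracy_driven_tuning(evaluate, candidate_configs, baseline_acc, max_drop=0.01):
    """Return (config, accuracy) for the first acceptable candidate, or (None, None)."""
    for config in candidate_configs:
        acc = evaluate(config)
        if baseline_acc - acc <= max_drop:
            return config, acc
    return None, None  # no candidate met the accuracy criterion


# Hypothetical candidates, from most to least aggressive:
configs = ["int8_all", "int8_except_last", "bf16_all"]
simulated_accs = {"int8_all": 0.90, "int8_except_last": 0.945, "bf16_all": 0.949}
best, acc = accuracy_driven_tuning(lambda c: simulated_accs[c], configs, baseline_acc=0.95)
# best == "int8_except_last": its 0.005 accuracy drop is within the 0.01 tolerance
```

In the real library, the candidate space, accuracy criterion, and tuning strategy are specified declaratively rather than coded by hand.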
Intel Neural Compressor is available in the Intel® AI Analytics Toolkit (AI Kit), which provides accelerated machine learning and data analytics pipelines with optimized deep learning frameworks and high-performing Python libraries.
Get what you need to build and optimize your oneAPI projects for free. With an Intel® Developer Cloud account, you get 120 days of access to the latest Intel® hardware—CPUs, GPUs, FPGAs—and Intel® oneAPI tools and frameworks. No software downloads. No configuration steps. No installations.
Quantize data and computation to INT8 or BF16, or to a mixture of FP32, BF16, and INT8, to reduce model size and speed up inference while minimizing precision loss. Quantize during training, post-training, or dynamically based on the runtime data range.
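The core idea behind INT8 quantization is mapping an FP32 tensor onto 256 integer levels with a scale and zero point, then recovering an approximation at compute time. A minimal numpy sketch of asymmetric (affine) quantization, assuming a simple min/max calibration rather than the library's tuned calibration algorithms:

```python
import numpy as np

def quantize_int8(x):
    """Asymmetric affine quantization of an FP32 tensor to int8
    using simple min/max calibration."""
    scale = (x.max() - x.min()) / 255.0  # one step covers 1/255 of the range
    if scale == 0:
        scale = 1.0  # constant tensor: avoid division by zero
    zero_point = np.round(-x.min() / scale) - 128  # map x.min() onto -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an FP32 approximation of the original tensor."""
    return (q.astype(np.float32) - zero_point) * scale
```

The reconstruction error is bounded by roughly half a quantization step, which is why well-calibrated INT8 models lose so little accuracy relative to the 4x reduction in weight storage.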
Prune parameters that have minimal effect on accuracy to reduce the size of a network. Discard weights in structured or unstructured sparsity patterns, or remove filters or layers according to specified rules.
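Unstructured magnitude pruning, one of the simplest pruning algorithms referenced above, zeroes the weights with the smallest absolute values until a target sparsity is reached. A minimal sketch, not the library's implementation:

```python
import numpy as np

def magnitude_prune(weights, target_sparsity):
    """Unstructured magnitude pruning: zero out the fraction of entries
    with the smallest absolute values (ties at the threshold may prune
    slightly more than requested)."""
    k = int(target_sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([0.1, -0.5, 0.05, 2.0])
pruned = magnitude_prune(w, target_sparsity=0.5)
# → [0.0, -0.5, 0.0, 2.0]: the two smallest-magnitude weights are zeroed
```

Structured variants instead remove whole blocks, filters, or channels so the resulting sparsity pattern maps onto hardware-friendly dense operations.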
Distill knowledge from a teacher network to a student network to improve the accuracy of the compressed model.
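Knowledge distillation typically trains the student on a weighted sum of two terms: a KL-divergence loss against the teacher's temperature-softened output distribution, and an ordinary cross-entropy loss against the hard labels. A numpy sketch of that standard loss (the temperature T and weight alpha are illustrative hyperparameters, not the library's defaults):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled, numerically stable softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha-weighted sum of the soft-target KL term (scaled by T^2, as is
    conventional) and the hard-label cross-entropy term."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)), axis=-1)
    soft_loss = (T * T) * kl.mean()
    log_p = np.log(softmax(student_logits) + 1e-12)
    hard_loss = -log_p[np.arange(len(labels)), labels].mean()
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

The soft targets carry information about how the teacher ranks the wrong classes, which is what lets a small student recover accuracy it could not reach from the hard labels alone.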
Automatically optimize models using recipes that combine model compression techniques to meet performance objectives while satisfying expected accuracy criteria.
APIs for the TensorFlow*, PyTorch*, Apache MXNet*, and Open Neural Network Exchange (ONNX) Runtime Frameworks
Get started quickly with built-in DataLoaders for popular industry dataset objects or register your own dataset.
Preprocess input data using built-in methods such as resize, crop, normalize, transpose, flip, pad, and more.
Configure model objectives and evaluation metrics without writing framework-specific code.
Analyze the graph and tensors after each tuning run with TensorBoard*.
A 3D Digital Face Reconstruction Solution Enabled by 3rd Generation Intel® Xeon® Scalable Processors
By quantizing the Position Map Regression Network from FP32 inference down to INT8, Tencent Games* improved inference efficiency and provided a practical solution for 3D digital face reconstruction.
Deploying a trained model for inference often requires modification, optimization, and simplification based on where it is being deployed. This overview of Intel’s end-to-end solution includes a downloadable neural style transfer demonstration.