Get started with Intel® Neural Compressor using the following commands.
This library applies model compression techniques, such as quantization, pruning, and distillation, to increase inference speed and reduce model size across multiple deep learning frameworks.
For more information, see Intel Neural Compressor.
Basic Installation
    pip install neural-compressor

Full Version (with GUI) Installation
    pip install neural-compressor-full

Basic Installation Using Anaconda*
    conda install -c https://software.repos.intel.com/python/conda/ neural-compressor

Import Intel Neural Compressor (in the Code)
    import neural_compressor as inc
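To confirm the installation succeeded, you can print the package version, as in this minimal check (run it in the environment you installed into):

    import neural_compressor
    print(neural_compressor.__version__)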
Quantization (in the Code)
    from neural_compressor.experimental import Quantization, common

    quantizer = Quantization()
    quantizer.model = <model>
    # Generate a synthetic calibration dataset and wrap it in a dataloader
    dataset = quantizer.dataset('dummy', shape=(1, 224, 224, 3))
    quantizer.calib_dataloader = common.DataLoader(dataset)
    q_model = quantizer.fit()   # returns the quantized model
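As a concrete illustration, the following self-contained sketch substitutes a toy tf.keras model for <model>. The architecture is an assumption for demonstration only, and the sketch presumes your Neural Compressor version accepts a Keras model object directly:

    import tensorflow as tf
    from neural_compressor.experimental import Quantization, common

    # Toy FP32 model standing in for <model> (illustrative only)
    fp32_model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(224, 224, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])

    quantizer = Quantization()       # default post-training quantization settings
    quantizer.model = fp32_model
    dataset = quantizer.dataset('dummy', shape=(1, 224, 224, 3))  # synthetic calibration data
    quantizer.calib_dataloader = common.DataLoader(dataset)
    q_model = quantizer.fit()        # returns the quantized model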
Benchmark (in the Code)
    from neural_compressor.experimental import Benchmark, common

    evaluator = Benchmark('/path/to/user/benchmark/yaml')
    # dataset and batch_size are supplied by the user
    evaluator.dataloader = common.DataLoader(dataset, batch_size=batch_size)
    # Optionally, register a postprocess and a metric
    evaluator.postprocess = common.Postprocess(postprocess_cls)
    evaluator.metric = common.Metric(metric_cls)
    results = evaluator()
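The metric_cls passed to common.Metric is a user-defined class. Below is a hypothetical top-1 accuracy metric as a sketch; it assumes the custom-metric interface of update/reset/result methods described in the Neural Compressor documentation:

    class AccuracyMetric:
        """Hypothetical metric: running top-1 accuracy."""
        def __init__(self):
            self.correct = 0
            self.total = 0

        def update(self, preds, labels):
            # Accumulate per-batch statistics (illustrative computation)
            self.correct += int((preds.argmax(axis=-1) == labels).sum())
            self.total += len(labels)

        def reset(self):
            self.correct = 0
            self.total = 0

        def result(self):
            return self.correct / self.total   # higher-is-better scalar

    evaluator.metric = common.Metric(AccuracyMetric)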
Prune (in the Code)
    from neural_compressor.experimental import Pruning

    prune = Pruning('/path/to/user/pruning/yaml')
    prune.model = <model>
    model = prune.fit()
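The placeholder <model> can be any supported FP32 model. The sketch below uses a torchvision ResNet-18 purely as an illustrative assumption, and it presumes that the yaml's training section fully specifies the fine-tuning loop and sparsity schedule:

    import torchvision.models as models
    from neural_compressor.experimental import Pruning

    prune = Pruning('/path/to/user/pruning/yaml')    # yaml defines sparsity targets and schedule
    prune.model = models.resnet18(pretrained=True)   # illustrative FP32 model to prune
    pruned_model = prune.fit()                       # returns the pruned model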
Distillation (in the Code)
    from neural_compressor.experimental import Distillation

    distiller = Distillation('/path/to/user/yaml')
    distiller.student_model = student_model
    distiller.teacher_model = teacher_model
    model = distiller.fit()
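A typical pairing, shown here as an assumed example, distills a larger teacher into a smaller student. The torchvision models are illustrative choices, and the yaml is presumed to define the training loop and distillation loss:

    import torchvision.models as models
    from neural_compressor.experimental import Distillation

    teacher_model = models.resnet50(pretrained=True)   # larger, already-trained teacher
    student_model = models.resnet18()                  # smaller student to be trained
    distiller = Distillation('/path/to/user/yaml')
    distiller.student_model = student_model
    distiller.teacher_model = teacher_model
    model = distiller.fit()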
Mixed Precision with a bf16 Conversion (in the Code)
    from neural_compressor.experimental import MixedPrecision

    converter = MixedPrecision()
    converter.precisions = 'bf16'
    converter.model = model   # model is an FP32 model
    # Optional: eval_func accepts only the model as input and returns a
    # higher-is-better scalar as the accuracy
    converter.eval_func = <user_defined_function>
    converted_model = converter()
    converted_model.save(output_path)
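The optional eval_func might look like the following sketch; val_loader and the accuracy computation are hypothetical placeholders to replace with your own validation logic:

    def eval_func(model):
        # val_loader is an assumed validation dataloader yielding (inputs, labels)
        correct, total = 0, 0
        for inputs, labels in val_loader:
            preds = model(inputs).argmax(axis=-1)
            correct += int((preds == labels).sum())
            total += len(labels)
        return correct / total   # higher-is-better scalar

    converter.eval_func = eval_func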
Model Conversion (in the Code)
    from neural_compressor.experimental import ModelConversion, common

    conversion = ModelConversion()
    conversion.source = 'QAT'            # convert from a quantization-aware training model
    conversion.destination = 'default'
    conversion.model = '/path/to/trained/saved_model'
    q_model = conversion()
    q_model.save('/path/to/quantized/saved_model')
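As the paths in the example suggest, the conversion appears to operate on TensorFlow SavedModel directories; if so, the result can be loaded back with standard TensorFlow APIs, as in this minimal sketch:

    import tensorflow as tf
    loaded = tf.saved_model.load('/path/to/quantized/saved_model')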
For more information and support, see:
Intel Neural Compressor Issues on GitHub*
Intel® AI Analytics Toolkit Forum
Sign up and try Intel Neural Compressor for free using Intel® Developer Cloud for oneAPI.