Intel® Neural Compressor Cheat Sheet


Get started with Intel® Neural Compressor using the following commands.

This library applies model compression techniques, such as quantization, pruning, knowledge distillation, and mixed precision, to increase inference speed and reduce model size across multiple deep learning frameworks.

For more information, see Intel Neural Compressor.

Basic Installation: pip install neural-compressor
Full Installation (with GUI): pip install neural-compressor-full
Basic Installation Using Anaconda*: conda install -c intel neural-compressor
Import Intel Neural Compressor (in the Code): import neural_compressor as inc
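Verify the Installation (optional check; assumes the package exposes the standard __version__ attribute): python -c "import neural_compressor; print(neural_compressor.__version__)"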
Quantization (in the Code)

from neural_compressor.experimental import Quantization, common

quantizer = Quantization()
quantizer.model = <model>
dataset = quantizer.dataset('dummy', shape=(1, 224, 224, 3))  # random calibration data
quantizer.calib_dataloader = common.DataLoader(dataset)
q_model = quantizer.fit()  # fit() returns the quantized model
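
The quantized model returned by fit() can be persisted with q_model.save('/path/to/quantized_model'). The 'dummy' dataset above feeds random tensors into calibration; to calibrate on real data, common.DataLoader accepts any object implementing __len__ and __getitem__. A minimal sketch (MyDataset and its contents are illustrative, not part of the library):

import numpy as np

class MyDataset:
    # illustrative calibration dataset yielding (input, label) pairs
    def __init__(self):
        self.samples = np.random.rand(10, 224, 224, 3).astype('float32')

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index], 0  # label is ignored during calibration

quantizer.calib_dataloader = common.DataLoader(MyDataset(), batch_size=1)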

Benchmark (in the Code)

from neural_compressor.experimental import Benchmark, common

evaluator = Benchmark('/path/to/user/yaml')
evaluator.model = <model>
evaluator.dataloader = common.DataLoader(dataset, batch_size=batch_size)
# optionally register a postprocess and a metric
evaluator.postprocess = common.Postprocess(postprocess_cls)
evaluator.metric = common.Metric(metric_cls)
results = evaluator()
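
common.Metric wraps a user-defined class; the class is expected to provide update(preds, labels), reset(), and result() methods, with result() returning a higher-is-better scalar. A minimal sketch of an accuracy-style metric (MyMetric is illustrative, not part of the library):

import numpy as np

class MyMetric:
    # illustrative top-1 accuracy metric
    def __init__(self):
        self.correct, self.total = 0, 0

    def update(self, preds, labels):
        # accumulate statistics over one batch
        self.correct += int(np.sum(np.argmax(preds, axis=-1) == labels))
        self.total += int(np.size(labels))

    def reset(self):
        self.correct, self.total = 0, 0

    def result(self):
        # higher-is-better scalar reported by the benchmark
        return self.correct / max(self.total, 1)

evaluator.metric = common.Metric(MyMetric)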

Prune (in the Code)

from neural_compressor.experimental import Pruning

prune = Pruning('/path/to/user/pruning/yaml')
prune.model = <model>
model = prune.fit()
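
The YAML file carries the pruning schedule (for example, the target sparsity and the epochs over which it is reached). A sketch of persisting the result, assuming the returned wrapper exposes the same save() method as the quantized model above:

model.save('/path/to/pruned_model')  # path is illustrative
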
Distillation (in the Code)

from neural_compressor.experimental import Distillation

distiller = Distillation('/path/to/user/yaml')
distiller.student_model = student_model
distiller.teacher_model = teacher_model
model = distiller.fit()
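
The teacher is typically a larger, already-trained model and the student a smaller model to be trained against it. For illustration only, assuming PyTorch and torchvision are installed (neither is required by this cheat sheet):

import torchvision.models as models

teacher_model = models.resnet50(pretrained=True)   # larger, pretrained teacher
student_model = models.resnet18(pretrained=False)  # smaller student to train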

Mixed Precision with a bf16 Conversion (in the Code)

from neural_compressor.experimental import MixedPrecision

converter = MixedPrecision()
converter.precisions = 'bf16'
converter.model = model  # model is an FP32 model
# optional: eval_func accepts the model as its only argument and
# returns a higher-is-better scalar (for example, accuracy)
converter.eval_func = <user_defined_function>
converted_model = converter()
converted_model.save(output_path)
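
A sketch of the optional eval_func; everything here is illustrative (validation_data is a user-supplied iterable of (inputs, labels) batches, and count_matches is a hypothetical helper):

def eval_func(model):
    # must return a single higher-is-better scalar, e.g. top-1 accuracy
    correct, total = 0, 0
    for inputs, labels in validation_data:             # user-supplied batches
        predictions = model(inputs)                    # framework-specific call
        correct += count_matches(predictions, labels)  # hypothetical helper
        total += len(labels)
    return correct / max(total, 1)

converter.eval_func = eval_func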

Model Conversion (in the Code)

from neural_compressor.experimental import ModelConversion, common

conversion = ModelConversion()
conversion.source = 'QAT'           # convert from a quantization-aware-trained model
conversion.destination = 'default'  # to a standard quantized model
conversion.model = '/path/to/trained/saved_model'
q_model = conversion()
q_model.save('/path/to/quantized/saved_model')
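
Since the 'QAT' source refers to a TensorFlow quantization-aware-trained SavedModel, the converted artifact can be sanity-checked with standard TensorFlow APIs (a sketch; assumes TensorFlow is installed):

import tensorflow as tf

loaded = tf.saved_model.load('/path/to/quantized/saved_model')
print(loaded.signatures)  # inspect the available serving signatures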


For more information and support, see:

Intel Neural Compressor Issues on GitHub*

Intel® AI Analytics Toolkit Forum

Sign up and try Intel Neural Compressor for free using Intel® Developer Cloud for oneAPI.