Intel® Neural Compressor Cheat Sheet


Get started with Intel® Neural Compressor using the following commands.

This library applies model compression techniques that increase inference speed and reduce model size across multiple deep learning frameworks.

For more information, see Intel Neural Compressor.

Basic Installation: pip install neural-compressor
Full Version (with GUI) Installation: pip install neural-compressor-full
Basic Installation Using Anaconda*: conda install -c https://software.repos.intel.com/python/conda/ neural-compressor
Import Intel Neural Compressor (in the Code): import neural_compressor as inc
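
To confirm the installation, the package exposes a version string; a quick check after the import above:

print(inc.__version__)  # prints the installed Intel Neural Compressor version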
Quantization (in the Code)

from neural_compressor.experimental import Quantization, common

quantizer = Quantization()

quantizer.model = <model>

dataset = quantizer.dataset('dummy', shape=(1, 224, 224, 3))

quantizer.calib_dataloader = common.DataLoader(dataset)

q_model = quantizer.fit()
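
The quantized model returned by fit() can then be written to disk; a minimal sketch, with a placeholder output path:

q_model.save('/path/to/quantized_model')  # placeholder path; writes the quantized model artifacts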

Benchmark (in the Code)

from neural_compressor.experimental import Benchmark, common

evaluator = Benchmark('/path/to/user/yaml')

evaluator.model = <model>

evaluator.dataloader = common.DataLoader(dataset, batch_size=batch_size)

# Optional: the user can also register a postprocess and a metric

evaluator.postprocess = common.Postprocess(postprocess_cls)

evaluator.metric = common.Metric(metric_cls)

results = evaluator() 
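
The structure of the returned results differs between releases; a hedged sketch, assuming it is a mapping from benchmark mode (for example, 'accuracy' or 'performance') to its measurements:

# Hedged sketch: inspect per-mode benchmark output; the exact result
# structure depends on the installed Intel Neural Compressor version.
for mode, result in results.items():
    print(mode, result)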

Pruning (in the Code)

from neural_compressor.experimental import Pruning

prune = Pruning('/path/to/user/pruning/yaml')

prune.model = <model>

model = prune.fit()
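
As with quantization, the pruned model returned by fit() can be saved; a hedged sketch, assuming the returned wrapper exposes the same save() method (placeholder path):

model.save('/path/to/pruned_model')  # assumption: the wrapper provides save(), as the quantized model does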

Distillation (in the Code)

from neural_compressor.experimental import Distillation

distiller = Distillation('/path/to/user/yaml')

distiller.student_model = student_model

distiller.teacher_model = teacher_model

model = distiller.fit()

Mixed Precision with a bf16 Conversion (in the Code)

from neural_compressor.experimental import MixedPrecision

converter = MixedPrecision()

converter.precisions = 'bf16'

converter.model = model # model is an fp32 model

converter.eval_func = <user_defined_function> # optional; this function accepts only the model as input and returns a higher-is-better accuracy scalar

converted_model = converter()

converted_model.save(output_path)

Model Conversion (in the Code)

from neural_compressor.experimental import ModelConversion, common


conversion = ModelConversion()

conversion.source = 'QAT'

conversion.destination = 'default'

conversion.model = '/path/to/trained/saved_model'

q_model = conversion()

q_model.save('/path/to/quantized/saved_model')


For more information and support, see:

Intel Neural Compressor Issues on GitHub*

Intel® AI Analytics Toolkit Forum

Sign up and try Intel Neural Compressor for free using Intel® Developer Cloud for oneAPI.