
Why Does Applying Different Weights to a Model Affect the Inference Performance?

Content Type: Troubleshooting   |   Article ID: 000088030   |   Last Reviewed: 03/09/2026

Description

Different inference throughput is observed when running the same model architecture with different weight files. Although the model structure is identical, inference performance varies significantly depending on the weight precision and representation used.

Resolution

Model weight precision (FP32, FP16, or INT8) affects inference performance.

FP32, known as single-precision floating point, stores each weight at full precision and preserves the full distribution of the weight values.

FP16 and INT8, by contrast, are compressed formats that represent each weight in fewer bits. The trade-off for this compression is model accuracy; the resulting loss is known as quantization error.
The more bits allocated to represent the data, the wider the range of values that can be expressed and, potentially, the better the model's accuracy. However, larger data types require more memory for storage, higher memory bandwidth to move the data around, and more compute resources and time.
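The precision/accuracy trade-off above can be illustrated with a minimal sketch of symmetric per-tensor INT8 quantization and FP16 rounding, using only the Python standard library. The weight values and the scale formula here are illustrative assumptions, not taken from any particular model or from OpenVINO's own quantization pipeline.

```python
import struct

def to_fp16(x: float) -> float:
    # Round a Python float through IEEE-754 half precision (struct code "e").
    # The result differs from x whenever x is not exactly representable in 16 bits.
    return struct.unpack("e", struct.pack("e", x))[0]

def quantize_int8(weights, scale):
    # Symmetric INT8 quantization: round(w / scale), clamped to [-127, 127].
    return [max(-127, min(127, round(w / scale))) for w in weights]

def dequantize(q, scale):
    # Map the integer codes back to real values; the gap to the original
    # weight is the quantization error.
    return [v * scale for v in q]

# Illustrative weight tensor (placeholder values).
weights = [0.1, -0.25, 0.7321, -0.9987]
scale = max(abs(w) for w in weights) / 127  # per-tensor scale
q = quantize_int8(weights, scale)
deq = dequantize(q, scale)
errors = [abs(w - d) for w, d in zip(weights, deq)]
# Each INT8 weight occupies 1 byte vs 4 bytes for FP32: a 4x size reduction,
# at the cost of a per-weight error bounded by scale / 2.
```

Rounding to the nearest representable value keeps the per-weight error within half a quantization step, which is why allocating more bits (a smaller step) improves accuracy while increasing storage and bandwidth cost.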

The Intel® Distribution of OpenVINO™ toolkit Benchmark Results show clear performance differences between the different weight formats (precisions).
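To observe these differences on your own hardware, OpenVINO's benchmark_app can run the same network exported at each precision. The model file names below are placeholders; `-m` selects the model and `-d` the target device.

```shell
# Compare throughput of the same model saved at different precisions
# (file names are placeholders for your own IR files).
benchmark_app -m model_fp32.xml -d CPU
benchmark_app -m model_fp16.xml -d CPU
benchmark_app -m model_int8.xml -d CPU
```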

Related Products

This article applies to 1 product.