Article ID: 000088030 Content Type: Troubleshooting Last Reviewed: 05/15/2023

Why Does Applying Different Weights to a Model Affect the Inference Performance?


Trade-offs of using different data and weight formats

  1. Generated two IR files (identical .xml file but different .bin files).
  2. Similar models with different weights run at different frame rates (27 fps and 6 fps).
  3. Do more diverse weights affect inference performance on Myriad X?

Model weights and precision (FP32, FP16, INT8) affect the inference performance.


The FP32 format, known as single-precision floating point, stores the full range and precision of the weights.

Meanwhile, FP16 and INT8 are compressed weight formats: the weights are squeezed into fewer bits to reduce their size. The trade-off for this compression is model accuracy; the loss is known as quantization error.
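Quantization error can be seen directly with a small NumPy sketch. The weights below are hypothetical random values (not from any real model), round-tripped through FP16 and through a simple symmetric per-tensor INT8 scheme:

```python
import numpy as np

# Hypothetical FP32 "weights" drawn at random, for illustration only.
rng = np.random.default_rng(0)
w_fp32 = rng.normal(0.0, 1.0, size=1000).astype(np.float32)

# Round-trip through FP16: the compression loses precision.
w_fp16 = w_fp32.astype(np.float16).astype(np.float32)
fp16_err = np.abs(w_fp32 - w_fp16).max()
print(f"max FP16 quantization error: {fp16_err:.2e}")

# A crude symmetric INT8 quantization (single per-tensor scale).
scale = np.abs(w_fp32).max() / 127.0
w_int8 = np.clip(np.round(w_fp32 / scale), -128, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale
int8_err = np.abs(w_fp32 - w_dequant).max()
print(f"max INT8 quantization error: {int8_err:.2e}")
```

With fewer bits, INT8 shows a noticeably larger worst-case error than FP16 on the same values, which is the accuracy trade-off described above.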


The more bits allocated to represent the data, the wider the range of values that can be represented and, potentially, the better the model's accuracy. However, larger data types require more memory for storage, higher memory bandwidth to move the data around, and more compute resources and time.
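The storage cost is straightforward to quantify. This sketch assumes a hypothetical model with 10 million weights and shows how the weight file size scales with precision:

```python
import numpy as np

n_weights = 10_000_000  # hypothetical weight count, for illustration only

# Each precision stores the same number of weights in fewer bytes.
for dtype in (np.float32, np.float16, np.int8):
    dt = np.dtype(dtype)
    size_mb = n_weights * dt.itemsize / 1e6
    print(f"{dt.name:8s} {dt.itemsize} bytes/weight -> {size_mb:.0f} MB")
# float32  4 bytes/weight -> 40 MB
# float16  2 bytes/weight -> 20 MB
# int8     1 bytes/weight -> 10 MB
```

Halving the bits halves the storage and the memory bandwidth needed to stream the weights, which is why lower precisions typically run faster on bandwidth-limited devices.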


The Intel® Distribution of OpenVINO™ toolkit Benchmark Results show clear performance differences between the different weight formats (precisions).
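To measure the difference on your own model, the toolkit's benchmark_app can be run against each IR. The commands below are a sketch: the model file names are placeholders, and MYRIAD is the device name used for Myriad X VPUs.

```shell
# Benchmark one IR on the Myriad X VPU for 30 seconds.
benchmark_app -m model_a.xml -d MYRIAD -t 30

# Benchmark the IR with the other weights for comparison.
benchmark_app -m model_b.xml -d MYRIAD -t 30
```

Comparing the reported throughput (FPS) between runs isolates the effect of the weights, since the .xml topology is identical.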
