Article ID: 000095716 Content Type: Maintenance & Performance Last Reviewed: 08/01/2023

No Performance Gain on FP16 Format When Compared to FP32 Format of a Model While Using OpenVINO™ Toolkit

Environment

  • OpenVINO™ 2023.0
  • Ubuntu 20.04 LTS
  • Intel® Core™ i7-9850H CPU
  • Intel® UHD Graphics 630

Summary

No performance improvement is seen with the FP16 format of a model compared to the FP32 format when using the OpenVINO™ toolkit.

Description

The FP16 format of a model is expected to run inference faster than the same model in FP32 format. However, when benchmark_app is run with the application's default settings for both formats, the FP16 model shows no performance improvement (higher FPS) over the FP32 model.

  • $ omz_downloader --name bert-large-uncased-whole-word-masking-squad-0001
  • $ benchmark_app -m FP32/bert-large-uncased-whole-word-masking-squad-0001.xml -api async -t 5 -hint throughput -d {CPU, GPU}
  • $ benchmark_app -m FP16/bert-large-uncased-whole-word-masking-squad-0001.xml -api async -t 5 -hint throughput -d {CPU, GPU}

Resolution

To execute the FP32 model in f32 precision when using benchmark_app, add -infer_precision f32 for the chosen device.

For example:
$ benchmark_app -m intel/bert-large-uncased-whole-word-masking-squad-0001/FP32/bert-large-uncased-whole-word-masking-squad-0001.xml -d GPU -t 5 -api async -hint throughput -infer_precision f32
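
To compare both formats under the same settings, the two benchmark_app invocations can be generated from a single helper. The following is a minimal Python sketch, not part of the toolkit: the helper name build_benchmark_cmd and the EXECUTE switch are hypothetical, and the FP16 path is assumed to mirror the FP32 path shown above. Only flags that appear in this article are used.

```python
import shlex
import shutil
import subprocess

def build_benchmark_cmd(model_xml, device, infer_precision=None, seconds=5):
    """Build a benchmark_app argument list with the flags used in this article."""
    cmd = ["benchmark_app", "-m", model_xml, "-d", device,
           "-t", str(seconds), "-api", "async", "-hint", "throughput"]
    if infer_precision is not None:
        # Request a specific inference precision for the chosen device.
        cmd += ["-infer_precision", infer_precision]
    return cmd

# Assumed directory layout; adjust to where omz_downloader placed the IR files.
BASE = "intel/bert-large-uncased-whole-word-masking-squad-0001"
NAME = "bert-large-uncased-whole-word-masking-squad-0001.xml"

runs = [
    # FP32 model, explicitly executed in f32 precision.
    build_benchmark_cmd(f"{BASE}/FP32/{NAME}", "GPU", infer_precision="f32"),
    # FP16 model with the application's default precision.
    build_benchmark_cmd(f"{BASE}/FP16/{NAME}", "GPU"),
]

EXECUTE = False  # hypothetical switch: set to True to actually run the benchmarks

for cmd in runs:
    print(shlex.join(cmd))
    if EXECUTE and shutil.which("benchmark_app"):
        subprocess.run(cmd, check=False)
```

With EXECUTE left at False, the script only prints the two command lines so they can be reviewed or pasted into a shell.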

Additional information

For the GPU plugin, the floating-point precision of a GPU primitive is selected based on the operation precision in the OpenVINO IR, except for the compressed f16 OpenVINO IR form, which is executed in f16 precision.
For the CPU plugin, the default floating-point precision of a CPU primitive is f32. To support the f16 OpenVINO™ IR, the plugin internally converts all f16 values to f32, and all calculations are performed in the native f32 precision. On platforms that natively support bfloat16 calculations (those with the AVX512_BF16 or AMX extension), the bf16 type is automatically used instead of f32 to achieve better performance (see the Execution Mode Hint).
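
The f16-to-f32 conversion described for the CPU plugin can be illustrated with a small standard-library sketch. This is not OpenVINO code. It only assumes IEEE 754 half and single precision, as implemented by Python's struct module, and the helper names and sample values are hypothetical. It shows that a value stored in f16 takes half the bytes of f32, and that widening it to f32 reproduces the stored value exactly, so the arithmetic itself still runs at f32 width.

```python
import struct

def round_to_f16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def round_to_f32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Hypothetical sample weights, all within the f16 range.
weights = [0.1, -1.5, 3.14159, 1000.0]

# What a compressed f16 IR would store.
stored_f16 = [round_to_f16(w) for w in weights]

# What an f32 compute path sees after widening the stored values.
widened_f32 = [round_to_f32(w) for w in stored_f16]

f16_bytes = struct.calcsize("<e")  # storage size of one f16 value
f32_bytes = struct.calcsize("<f")  # storage size of one f32 value

print("bytes per value:", f16_bytes, "(f16) vs", f32_bytes, "(f32)")
print("widening is lossless:", widened_f32 == stored_f16)
```

The sketch says nothing about speed. It only shows that, on a device that computes in f32, an f16 model changes how the weights are stored rather than the precision in which they are computed.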

For additional information on data types for the CPU and GPU plugins, refer to the CPU and GPU device pages of the OpenVINO™ documentation.
