
Why Are Both Latency and Throughput Higher When Inferencing a Model with OpenVINO™ Benchmark on GPU Compared to CPU?

Content Type: Product Information & Documentation   |   Article ID: 000093152   |   Last Reviewed: 02/09/2023

Description

  • Inferred the same model with the OpenVINO™ Benchmark App on CPU and GPU:
    benchmark_app.exe -m model.xml -d CPU
    benchmark_app.exe -m model.xml -d GPU
  • The resulting latency and throughput on GPU are both higher than on CPU.
  • Unable to determine why both latency and throughput on GPU are higher than on CPU, since lower latency is expected to yield higher throughput.

Resolution

Latency measures the inference time required to process a single input when inferencing synchronously.
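
If it helps to see what "synchronous latency" means in code, the following is a minimal sketch using the OpenVINO™ Python API (not taken from this article); the model path and the input shape [1, 3, 224, 224] are assumptions and should be replaced with your own model's values.

import time
import numpy as np
from openvino.runtime import Core

core = Core()
compiled_model = core.compile_model("model.xml", "CPU")
request = compiled_model.create_infer_request()

# Assumed input shape; substitute the actual shape of your model's input.
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

start = time.perf_counter()
request.infer({0: dummy_input})  # blocks until this single input is processed
latency_ms = (time.perf_counter() - start) * 1000
print(f"Synchronous latency for one input: {latency_ms:.2f} ms")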

When the OpenVINO™ Benchmark App is run with default parameters, it infers in asynchronous mode. The reported latency therefore measures the total inference time required to process all of the in-flight inference requests, not the time for a single input.
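
For comparison, the sketch below keeps several inference requests in flight at once with AsyncInferQueue, similar in spirit to the Benchmark App's default asynchronous mode; the request count of 4, the model path, and the input shape are assumptions for illustration only.

import time
import numpy as np
from openvino.runtime import Core, AsyncInferQueue

core = Core()
compiled_model = core.compile_model("model.xml", "GPU")

num_requests = 4  # assumed value; the Benchmark App picks a device-specific default
queue = AsyncInferQueue(compiled_model, num_requests)

# Assumed input shape; substitute your model's actual input shape.
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

start = time.perf_counter()
for _ in range(num_requests):
    queue.start_async({0: dummy_input})  # returns immediately; requests run concurrently
queue.wait_all()  # wait until every in-flight request has completed
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Time to finish {num_requests} concurrent requests: {elapsed_ms:.2f} ms")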

In addition, when running the Benchmark App on CPU with default parameters, 4 inference requests are created, whereas 16 inference requests are created when running on GPU with default parameters. Hence, the resulting latency of inferencing on GPU is higher than on CPU.
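
To check how many inference requests each device would use by default on your own system, you can query the compiled model's OPTIMAL_NUMBER_OF_INFER_REQUESTS property, as in this sketch; the 4 and 16 values above are the defaults observed for this article's setup and may differ on your hardware and model.

from openvino.runtime import Core

core = Core()
for device in ("CPU", "GPU"):
    compiled_model = core.compile_model("model.xml", device)
    nireq = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
    print(f"{device}: default number of inference requests = {nireq}")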

Specify the same number of inference requests when running Benchmark App on CPU and GPU for a fair comparison:
benchmark_app.exe -m model.xml -d CPU -nireq 4
benchmark_app.exe -m model.xml -d GPU -nireq 4
