Latency measures the inference time required to process a single input when inference is performed synchronously.
When the OpenVINO™ Benchmark App runs with default parameters, it performs inference in asynchronous mode, so the resulting latency reflects the total inference time required to process the configured number of inference requests.
In addition, with default parameters the Benchmark App creates 4 inference requests on CPU but 16 inference requests on GPU. Hence, the resulting latency of inferencing on GPU is higher than on CPU.
For a fair comparison, specify the same number of inference requests when running the Benchmark App on CPU and GPU:
benchmark_app.exe -m model.xml -d CPU -nireq 4
benchmark_app.exe -m model.xml -d GPU -nireq 4
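Alternatively, to measure the single-input (synchronous) latency described above directly, the Benchmark App can be switched to synchronous mode, assuming your benchmark_app build supports the -api option:

benchmark_app.exe -m model.xml -d CPU -api sync
benchmark_app.exe -m model.xml -d GPU -api sync

In synchronous mode only one inference request is processed at a time, so the reported latency corresponds to a single input on each device.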