Slower Inferencing Performance on Intel® Neural Compute Stick 2 (Intel® NCS2) Compared to CPU

Content Type: Maintenance & Performance | Article ID: 000089522 | Last Reviewed: 11/20/2023


Description

An Intel® Core™ i7 processor was used to run benchmark_app.py with -m model.xml and randomly generated input.
Inference on Intel® NCS2 was slower than on the CPU:
For NCS2:
[ INFO ] First inference took 33.88 ms
[Step 11/11] Dumping statistics report
Count: 2596 iterations
Duration: 60141.63 ms
Latency: 92.60 ms
Throughput: 5525.09 FPS

For CPU:
[ INFO ] First inference took 17.07 ms
[Step 11/11] Dumping statistics report
Count: 148124 iterations
Duration: 60001.79 ms
Latency: 1.61 ms
Throughput: 315988.43 FPS
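
As a rough cross-check of the two reports above, the following Python sketch compares the figures. It assumes that throughput is reported as frames per second, that is, batch size times iterations divided by the duration in seconds. Under that assumption both runs are consistent with a batch size of about 128, which the logs above do not state. The dictionary keys and helper names are illustrative only.

```python
# Figures copied from the two benchmark reports above.
ncs2 = {"count": 2596, "duration_ms": 60141.63, "latency_ms": 92.60, "fps": 5525.09}
cpu = {"count": 148124, "duration_ms": 60001.79, "latency_ms": 1.61, "fps": 315988.43}


def implied_batch_size(report):
    """Batch size implied by a report, ASSUMING fps = batch * count / seconds."""
    seconds = report["duration_ms"] / 1000.0
    return report["fps"] * seconds / report["count"]


def ratio(a, b, key):
    """How many times larger a[key] is than b[key]."""
    return a[key] / b[key]


if __name__ == "__main__":
    print(f"Implied batch size, NCS2: {implied_batch_size(ncs2):.2f}")
    print(f"Implied batch size, CPU:  {implied_batch_size(cpu):.2f}")
    print(f"CPU throughput / NCS2 throughput: {ratio(cpu, ncs2, 'fps'):.1f}x")
    print(f"NCS2 latency / CPU latency:       {ratio(ncs2, cpu, 'latency_ms'):.1f}x")
```

Both ratios come out in the same range, so the gap between the devices is visible in latency as well as in throughput.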

Resolution

The CPU is expected to outperform Intel® NCS2 because the CPU has more computing power.

Intel® NCS2 is an accelerator device that helps in certain situations, especially when the system needs additional computing power.

Additionally, the CPU typically uses the FP32 model format, while Intel® NCS2 requires the FP16 model format. An FP16 model can carry a quantization (rounding) error because it is compressed from a full-precision model to make it smaller. This can affect both accuracy and performance.
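
The rounding error that FP16 storage introduces can be illustrated with a minimal Python sketch. It uses only the standard library's struct module (format character "e" is IEEE 754 half precision) and is not how OpenVINO™ converts models. The sample weights are hypothetical, and because Python floats are double precision, the error is measured against the double value rather than against FP32.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest FP16 value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fp16_error(value):
    """Absolute rounding error introduced by storing `value` in FP16."""
    return abs(to_fp16(value) - value)


if __name__ == "__main__":
    # Hypothetical weight values chosen for illustration only.
    for weight in (0.1, 1.0, 3.14159, 1000.1):
        print(f"{weight!r:>10} -> FP16 {to_fp16(weight)!r}, error {fp16_error(weight):.6g}")
```

Values such as 1.0 survive unchanged, while most others are rounded, and the absolute error grows with the magnitude of the value.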

Performance refers to how fast the model runs in deployment and is measured by two key metrics: latency and throughput.
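
To make the two metrics concrete, here is a small timing sketch in Python. It is not the benchmark_app implementation: fake_infer is a hypothetical stand-in for a real inference call, and using the median as the latency figure is an assumption about how the per-call times might be summarized.

```python
import statistics
import time


def measure(infer, iterations=100, batch_size=1):
    """Call infer() repeatedly; return (median latency in ms, throughput in FPS)."""
    latencies_ms = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    total_s = time.perf_counter() - start
    # Throughput counts frames, so each call contributes batch_size frames.
    return statistics.median(latencies_ms), batch_size * iterations / total_s


def fake_infer():
    """Hypothetical stand-in for a real inference call."""
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    latency_ms, fps = measure(fake_infer, iterations=50)
    print(f"Latency: {latency_ms:.2f} ms, Throughput: {fps:.2f} FPS")
```

Latency describes how long one request takes, while throughput describes how many frames are processed per second across the whole run.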

In OpenVINO™, there are two approaches to enhance performance:

During development: use the Post-training Optimization Tool (POT), Neural Network Compression Framework (NNCF), and Model Optimizer.

During deployment: tuning inference parameters and optimizing model execution.

It is possible to combine both approaches.
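
As one possible illustration of deployment-time tuning, the sketch below selects a performance hint when compiling a model. It assumes the OpenVINO™ 2022.x Python API (openvino.runtime) is installed. The helper names, the model path, and the device name in the usage note are assumptions for illustration, and whether a given device plugin honors the hint should be checked against that plugin's documentation.

```python
def hint_config(goal):
    """Map a tuning goal to a performance-hint configuration dictionary."""
    hints = {"latency": "LATENCY", "throughput": "THROUGHPUT"}
    if goal not in hints:
        raise ValueError(f"goal must be one of {sorted(hints)}")
    return {"PERFORMANCE_HINT": hints[goal]}


def compile_with_hint(model_path, device, goal):
    """Read an IR model and compile it for `device` with the chosen hint."""
    # Imported lazily so hint_config() can be used without OpenVINO installed.
    from openvino.runtime import Core

    core = Core()
    model = core.read_model(model_path)
    return core.compile_model(model, device, hint_config(goal))
```

A call such as compile_with_hint("model.xml", "CPU", "throughput") would then return a compiled model tuned for throughput rather than for single-request latency.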

Additional information

OpenVINO™ Performance Benchmark Results

Choose FP16, FP32 or int8 for Deep Learning Models

Performance Optimization Guide
