Article ID: 000089522
Content Type: Maintenance & Performance
Last Reviewed: 11/20/2023

Slower Inferencing Performance on Intel® Neural Compute Stick 2 (Intel® NCS2) Compared to CPU


A CPU has more compute power than the Intel® NCS2, so it is expected to run faster when inferencing the same model.

  • Ran the benchmark on a system with an Intel® Core™ i7 processor using -m model.xml and randomly generated input
  • Inference on the NCS2 was slower than on the CPU:

    For NCS2:
    [ INFO ] First inference took 33.88 ms
    [Step 11/11] Dumping statistics report
    Count: 2596 iterations
    Duration: 60141.63 ms
    Latency: 92.60 ms
    Throughput: 5525.09 FPS

    For CPU:
    [ INFO ] First inference took 17.07 ms
    [Step 11/11] Dumping statistics report
    Count: 148124 iterations
    Duration: 60001.79 ms
    Latency: 1.61 ms
    Throughput: 315988.43 FPS
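The throughput figures in the two reports can be cross-checked against the iteration counts and durations. The sketch below is a minimal illustration, not part of the benchmark tool. It assumes each iteration processes 128 frames; that batch size is not stated in the reports and is inferred only because it makes the computed values agree with the reported FPS. The function names are hypothetical.

```python
# Figures copied from the two statistics reports above.
reports = {
    "NCS2": {"iterations": 2596, "duration_ms": 60141.63, "reported_fps": 5525.09},
    "CPU": {"iterations": 148124, "duration_ms": 60001.79, "reported_fps": 315988.43},
}

# ASSUMPTION: 128 frames per iteration. This value is inferred from the
# numbers themselves; the reports do not state the batch size.
ASSUMED_BATCH = 128


def iterations_per_second(iterations, duration_ms):
    """Completed iterations divided by the run time in seconds."""
    return iterations / (duration_ms / 1000.0)


def frames_per_second(iterations, duration_ms, batch=ASSUMED_BATCH):
    """Throughput in frames per second for a given frames-per-iteration."""
    return batch * iterations_per_second(iterations, duration_ms)


for device, r in reports.items():
    fps = frames_per_second(r["iterations"], r["duration_ms"])
    print(f"{device}: computed {fps:.2f} FPS, reported {r['reported_fps']:.2f} FPS")

# Ratio of the reported throughputs (CPU relative to NCS2).
speedup = reports["CPU"]["reported_fps"] / reports["NCS2"]["reported_fps"]
print(f"CPU throughput is about {speedup:.0f}x the NCS2 throughput")
```

Under that assumption the computed values match the reported ones, and the CPU throughput comes out at roughly 57 times that of the NCS2, in line with the ratio of the two reported latencies (92.60 ms versus 1.61 ms).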


The CPU is expected to perform better than the Intel® NCS2 because the CPU has more computing power.

The Intel® NCS2 is an accelerator device that helps in certain situations, especially when computing power is needed in addition to what the CPU provides.

Additionally, the CPU is typically run with an FP32 model, while the Intel® NCS2 requires an FP16 model. An FP16 model can carry precision (rounding) error because it is compressed from the full-precision model to make it smaller. This can affect both accuracy and performance.
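The precision loss from FP32 to FP16 can be seen on individual values. The sketch below is only an illustration of half-precision rounding using Python's standard `struct` module; it does not reproduce how OpenVINO™ converts a model, and the sample weights are hypothetical values chosen for the example.

```python
import struct


def to_fp32(value: float) -> float:
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def to_fp16(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical weight values, for illustration only.
sample_weights = [0.1, 0.333333, 1.0005, 123.456, 0.00001234]

for w in sample_weights:
    fp32 = to_fp32(w)
    fp16 = to_fp16(fp32)  # compress the FP32 value down to FP16
    rel_err = abs(fp16 - fp32) / abs(fp32)
    print(f"fp32={fp32!r}  fp16={fp16!r}  relative error={rel_err:.2e}")
```

Values that FP16 can represent exactly, such as 1.0, come through unchanged, while most others pick up a small relative error. Very small magnitudes lose more precision because FP16 has a much narrower exponent range than FP32.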

Performance here means how fast the model runs in deployment, measured by two key metrics: latency and throughput.

In OpenVINO™, there are two approaches to enhance performance:

  • During development: use the Post-training Optimization Tool (POT), the Neural Network Compression Framework (NNCF), and Model Optimizer.
  • During deployment: tune inference parameters and optimize model execution.

It is possible to combine both approaches.
