Developer Guide

  • 2022.3
  • 10/25/2022
OpenVINO™ Benchmarking Tool

This tutorial shows you how to run the benchmark application on an 11th Generation Intel® Core™ processor with an integrated GPU. The application runs in asynchronous mode to estimate the throughput and latency of deep learning inference.

Start Docker* Container

  1. Check if your installation has the eiforamr-full-flavour-sdk Docker* image:
    docker images | grep eiforamr-full-flavour-sdk
    If the image is installed, the output contains: eiforamr-full-flavour-sdk
    If the image is not installed, continuing with these steps triggers a build that takes longer than an hour (sometimes much longer, depending on system resources and internet connection).
  2. If the image is not installed, Intel recommends installing the Robot Complete Kit with the Get Started Guide for Robots.
  3. Go to the AMR_containers folder:
    cd <edge_insights_for_amr_path>/Edge_Insights_for_Autonomous_Mobile_Robots_<version>/AMR_containers
  4. Start the Docker* container as root, as shown in the example after these steps:
    ./run_interactive_docker.sh eiforamr-full-flavour-sdk:<TAG> root
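    For example, assuming the image is tagged 2022.3 (check the TAG column in the docker images output for the tag on your system), the command is:
    ./run_interactive_docker.sh eiforamr-full-flavour-sdk:2022.3 root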

Set Environment Variables

The environment variables must be set before you can compile and run OpenVINO™ applications.
  1. Run one of the following scripts:
    source /opt/intel/openvino/bin/setupvars.sh
    or:
    source <OPENVINO_INSTALL_DIR>/bin/setupvars.sh
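    To confirm that the script took effect, check one of the variables it exports. INTEL_OPENVINO_DIR is the usual one for this OpenVINO™ generation, but verify against your installation:
    echo $INTEL_OPENVINO_DIR
    For the default install path, the expected output is /opt/intel/openvino.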

Build Benchmark Application

  1. Change to the samples directory, and build the benchmark application with the cmake-based build_samples.sh script:
    cd /opt/intel/openvino/inference_engine/samples/cpp
    ./build_samples.sh
  2. Once the build succeeds, go to the directory that contains the benchmark application:
    cd /root/inference_engine_cpp_samples_build/intel64/Release
    or:
    cd <INSTALL_DIR>/inference_engine_cpp_samples_build/intel64/Release
    The benchmark_app binary is inside the Release folder.
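    As a quick sanity check that the build produced the binary (adjust the path if you used a different <INSTALL_DIR>):
    ls /root/inference_engine_cpp_samples_build/intel64/Release | grep benchmark_app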

Input File

Select an image file or a sample video file to provide as input to the benchmark application. The application is run from the following directory:
cd /root/inference_engine_cpp_samples_build/intel64/Release
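If you are working inside the Docker* image from this tutorial, a sample video ships with the container; it is the input used in the run example later in this tutorial:
ls /home/eiforamr/data_samples/media_samples/
The listing includes plates_720.mp4, the sample video used below.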

Application Syntax and Options

The benchmark application syntax is as follows:
./benchmark_app [OPTION]
In this tutorial, we recommend you select the following options:
./benchmark_app -m <model> -i <input> -d <device> -nireq <num_reqs> -nthreads <num_threads> -b <batch>
where:
<model>        The complete path to the model .xml file
<input>        The path to the folder containing the image or sample video file
<device>       The device type, for example CPU or GPU
<num_reqs>     Number of parallel inference requests
<num_threads>  Number of threads to use for inference on the CPU (throughput mode)
<batch>        Batch size
For complete details on the available options, run the following command:
./benchmark_app -h
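For illustration only, a CPU run that exercises all of these options might look like the following (the model and input paths are placeholders; substitute your own):
./benchmark_app -m <model>.xml -i <input_dir> -d CPU -nireq 4 -nthreads 4 -b 1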

Run the Application

The benchmark application is executed as shown below. This tutorial uses the following settings:
  • The benchmark application is executed on the frozen_inference_graph model (ssd_mobilenet_v2_coco).
  • The number of parallel inference requests is set to 8.
  • The number of CPU threads to use for inference is set to 8.
  • The device type is GPU.
./benchmark_app -d GPU -i ~/<dir>/input/ -m /home/eiforamr/workspace/object_detection/src/object_detection/models/ssd_mobilenet_v2_coco/frozen_inference_graph.xml -nireq 8 -nthreads 8
./benchmark_app -d GPU -i /home/eiforamr/data_samples/media_samples/plates_720.mp4 -m /home/eiforamr/workspace/object_detection/src/object_detection/models/ssd_mobilenet_v2_coco/frozen_inference_graph.xml -nireq 8 -nthreads 8
Expected output:
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ]     /home/eiforamr/data_samples/media_samples/plates_720.mp4
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version ............ 2.1
         Build .................. 2021.2.0-1877-176bdf51370-releases/2021/2
         Description ....... API
[ INFO ] Device info:
         GPU
         clDNNPlugin version ......... 2.1
         Build ........... 2021.2.0-1877-176bdf51370-releases/2021/2
[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for GPU device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 89.49 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 44714.68 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'image_tensor' precision U8, dimensions (NCHW): 1 3 300 300
[ WARNING ] No supported image inputs found! Please check your file extensions: bmp, dib, jpeg, jpg, jpe, jp2, png, pbm, pgm, ppm, sr, ras, tiff, tif
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 4 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 5 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 6 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 7 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests using 2 streams for GPU, limits: 60000 ms duration)
[ INFO ] First inference took 10.01 ms
[Step 11/11] Dumping statistics report
Count:      9456 iterations
Duration:   60066.11 ms
Latency:    51.33 ms
Throughput: 157.43 FPS
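With batch size 1, the reported throughput is simply the iteration count divided by the duration. A quick arithmetic check against the statistics above:
python3 -c "print(round(9456 / (60066.11 / 1000), 2))"
This prints 157.43, matching the Throughput line in the report.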

Benchmark Report

Sample execution results using an 11th Gen Intel® Core™ i7-1185GRE processor @ 2.80 GHz:

Metric                       Value
Read network time (ms)       89
Load network time (ms)       44714.68
First inference time (ms)    10.01
Total execution time (ms)    60066.11
Total number of iterations   9456
Latency (ms)                 51.33
Throughput (FPS)             157.43
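To capture these statistics in a file rather than copying them from the console, benchmark_app builds of this generation also accept report options; confirm that your build supports them with ./benchmark_app -h before relying on them:
./benchmark_app -d GPU -i /home/eiforamr/data_samples/media_samples/plates_720.mp4 -m /home/eiforamr/workspace/object_detection/src/object_detection/models/ssd_mobilenet_v2_coco/frozen_inference_graph.xml -nireq 8 -nthreads 8 -report_type no_counters -report_folder /root/benchmark_reports
This writes a CSV report into the given folder.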
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. No product or component can be absolutely secure. Performance varies by use, configuration and other factors. Learn more at Intel® Performance Index.

Troubleshooting

For general robot issues, go to: Troubleshooting for Robot Tutorials.
