Developer Guide

  • 2022.3
  • 10/25/2022
  • Public Content

VTune™ Profiler in a Docker* Container

Run the profiling application in a Docker* container with the VTune™ profiler.

Run the Sample Application

  1. Check if your installation has the eiforamr-full-flavour-sdk Docker* image.
    docker images | grep eiforamr-full-flavour-sdk
    # If the image is installed, the output includes: eiforamr-full-flavour-sdk
    If the image is not installed, continuing with these steps triggers a build
    that takes more than an hour, and sometimes much longer, depending on
    system resources and internet connection.
  2. If the image is not installed, Intel recommends installing the Robot Complete Kit by following the Get Started Guide for Robots.
  3. Go to the AMR_containers folder:
    cd <edge_insights_for_amr_path>/Edge_Insights_for_Autonomous_Mobile_Robots_<version>/AMR_containers
  4. Prepare the environment setup:
    source ./01_docker_sdk_env/docker_compose/common/docker_compose.source
    export CONTAINER_BASE_PATH=`pwd`
    export ROS_DOMAIN_ID=19
  5. Run the VTune™ profiler:
    CHOOSE_USER=root docker-compose -f 01_docker_sdk_env/docker_compose/05_tutorials/vtune.tutorial.yml up oneapi
    Expected output:
    vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
    vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /tmp/matrix_multiply_vtune/r001gh -command stop.
    Address of buf1 = 0x7f4578e4b010
    Offset of buf1 = 0x7f4578e4b180
    Address of buf2 = 0x7f457864a010
    Offset of buf2 = 0x7f457864a1c0
    Address of buf3 = 0x7f45746e2010
    Offset of buf3 = 0x7f45746e2100
    Address of buf4 = 0x7f4573ee1010
    Offset of buf4 = 0x7f4573ee1140
    Using multiply kernel: multiply1
    Running on Intel(R) Iris(R) Xe Graphics [0x9a49]
    Elapsed Time: 0.91916s
    vtune: Collection stopped.
    vtune: Using result path `/tmp/matrix_multiply_vtune/r001gh'
    vtune: Executing actions 19 % Resolving information for `libpi_opencl.so'
    vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/libze_intel_gpu.so.1'.
    vtune: Executing actions 20 % Resolving information for `libc-dynamic.so'
    vtune: Warning: Cannot locate debugging information for file `/lib/modules/5.10.65/kernel/fs/overlayfs/overlay.ko'.
    vtune: Executing actions 20 % Resolving information for `libm-2.31.so'
    vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libm-2.31.so'.
    vtune: Executing actions 20 % Resolving information for `libc-2.31.so'
    vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libc-2.31.so'.
    vtune: Executing actions 20 % Resolving information for `ld-2.31.so'
    vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/ld-2.31.so'.
    vtune: Warning: Cannot locate file `vmlinux'.
    vtune: Executing actions 20 % Resolving information for `libpin3dwarf.so'
    vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/libigc.so.1.0.8517'.
    vtune: Executing actions 20 % Resolving information for `libxed.so'
    vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis is not possible. Function-level analysis is limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.
    vtune: Executing actions 21 % Resolving information for `libgcc_s.so.1'
    vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libgcc_s.so.1'.
    vtune: Executing actions 21 % Resolving information for `libstdc++.so.6.0.28'
    vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28'.
    vtune: Executing actions 21 % Resolving information for `libtpsstool.so'
    vtune: Warning: Cannot locate debugging information for file `/opt/intel/oneapi/vtune/2022.0.0/lib64/libtpsstool.so'.
    vtune: Executing actions 21 % Resolving information for `i915.ko'
    vtune: Warning: Cannot locate debugging information for file `/opt/intel/oneapi/vtune/2022.0.0/lib64/runtime/libittnotify_collector.so'.
    vtune: Warning: Cannot locate debugging information for file `/opt/intel/oneapi/vtune/2022.0.0/lib64/runtime/libittnotify_collector.so'.
    vtune: Executing actions 22 % Resolving information for `libOpenCL.so.1'
    vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/libze_intel_gpu.so.1.2.20939'.
    vtune: Executing actions 22 % Resolving information for `libigdrcl.so'
    vtune: Warning: Cannot locate debugging information for file `/lib/modules/5.10.65/kernel/drivers/gpu/drm/i915/i915.ko'.
    vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/intel-opencl/libigdrcl.so'.
    vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/intel-opencl/libigdrcl.so'.
    vtune: Executing actions 75 % Generating a report
    Elapsed Time: 1.163s
        GPU Time: 0.041s
    EU Array Stalled/Idle: 55.0% of Elapsed time with GPU busy
     | The percentage of time when the EUs were stalled or idle is high, which has a
     | negative impact on compute-bound applications.
     |
        GPU L3 Bandwidth Bound: 82.0% of peak value
         | L3 bandwidth was high when EUs were stalled or idle. Consider improving
         | cache reuse.
         |
            Hottest GPU Computing Tasks Bound by GPU L3 Bandwidth
            Computing Task  Total Time
            --------------  ----------
            Matrix1<float>      0.035s
    Occupancy: 91.1% of peak value
        Hottest GPU Computing Tasks with Low Occupancy
        Computing Task  Total Time  SIMD Width  Peak Occupancy(%)  Occupancy(%)  SIMD Utilization(%)
        --------------  ----------  ----------  -----------------  ------------  -------------------
    Sampler Busy: 0.0% of peak value
        Hottest GPU Computing Tasks with High Sampler Usage
        Computing Task  Total Time
        --------------  ----------
    Collection and Platform Info
        Application Command Line: ./matrix.dpcpp
        Operating System: 5.10.65 DISTRIB_ID=Ubuntu DISTRIB_RELEASE=20.04 DISTRIB_CODENAME=focal DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
        Computer Name: glaic3aeon2
        Result Size: 28.3 MB
        Collection start time: 15:39:14 04/01/2022 UTC
        Collection stop time: 15:39:15 04/01/2022 UTC
        Collector Type: Event-based sampling driver,Driverless Perf system-wide sampling,User-mode sampling and tracing
        CPU
            Name: Intel(R) microarchitecture code named Tigerlake
            Frequency: 2.803 GHz
            Logical CPU Count: 8
        GPU
            Name: TigerLake GT2 [Iris Xe Graphics]
            Vendor: Intel Corporation
            EU Count: 96
            Max EU Thread Count: 7
            Max Core Frequency: 1.350 GHz
        GPU OpenCL Info
            Version
            Max Compute Units: 96
            Max Work Group Size: 512
            Local Memory: 65.5 KB
            SVM Capabilities
    If you want to skip descriptions of detected performance issues in the report, enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>. Alternatively, you may view the report in the csv format: vtune -report <report_name> -format=csv.
    vtune: Executing actions 100 % done
  6. For a list of the steps that were executed, see 01_docker_sdk_env/docker_compose/05_tutorials/vtune.tutorial.yml.
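
Steps 1 through 5 can be collected into one script that refuses to start the tutorial when the image is missing, which avoids triggering the long build by accident. The following is a minimal sketch, not part of the kit: the function names, the AMR_DIR variable (set it to your AMR_containers folder), and the RUN_TUTORIAL switch are hypothetical names introduced for illustration. The commands themselves are the ones listed in the steps.

```shell
#!/usr/bin/env bash
# Sketch only: wraps steps 1-5 of this guide in one script.
# AMR_DIR and RUN_TUTORIAL are hypothetical variables, not defined by the kit.

IMAGE_NAME="eiforamr-full-flavour-sdk"
COMPOSE_FILE="01_docker_sdk_env/docker_compose/05_tutorials/vtune.tutorial.yml"

# Succeeds if the image name appears in the text read from stdin
# (normally the output of `docker images`).
image_listed() {
    grep -q "$IMAGE_NAME"
}

run_vtune_tutorial() {
    # Step 1: stop early instead of starting the long image build.
    if ! docker images 2>/dev/null | image_listed; then
        echo "Image $IMAGE_NAME not found; install the Robot Complete Kit first." >&2
        return 1
    fi
    if [ -z "${AMR_DIR:-}" ]; then
        echo "Set AMR_DIR to your AMR_containers folder." >&2
        return 1
    fi
    # Step 3: go to the AMR_containers folder.
    cd "$AMR_DIR" || return 1
    # Step 4: prepare the environment.
    source ./01_docker_sdk_env/docker_compose/common/docker_compose.source
    export CONTAINER_BASE_PATH="$(pwd)"
    export ROS_DOMAIN_ID=19
    # Step 5: run the profiler tutorial.
    CHOOSE_USER=root docker-compose -f "$COMPOSE_FILE" up oneapi
}

# Run only when explicitly requested, so sourcing this file has no side effects.
if [ "${RUN_TUTORIAL:-0}" = "1" ]; then
    run_vtune_tutorial
fi
```

To use the sketch, save it under a name of your choice and start it with both variables set, for example RUN_TUTORIAL=1 AMR_DIR=<path to AMR_containers> bash <script name>.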

Troubleshooting

For general robot issues, go to: Troubleshooting for Robot Tutorials.
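
When comparing your own run with the expected output, it can help to save the console output to a file and reduce it to the warning count and the headline metrics. The following is a rough sketch under that assumption: the two function names and the log file name are hypothetical, and the patterns are taken from the expected output shown in step 5.

```shell
#!/usr/bin/env bash
# Sketch only: summarizes a saved copy of the tutorial output.
# The function names and the log file are hypothetical, chosen for illustration.

# Print the number of lines in the log that contain a VTune warning.
count_vtune_warnings() {
    grep -c 'vtune: Warning:' "$1"
}

# Print the headline metric lines of the summary report.
# "Elapsed Time" can match twice: once from the sample application
# and once from the VTune summary.
show_headline_metrics() {
    grep -E '(Elapsed Time|GPU Time|Occupancy|Sampler Busy): ' "$1"
}
```

One way to produce such a log is to append 2>&1 | tee vtune_run.log to the docker-compose command in step 5, then call count_vtune_warnings vtune_run.log and show_headline_metrics vtune_run.log after sourcing the sketch.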

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.