Intel® VTune™ Profiler Performance Analysis Cookbook

ID 766316
Date 3/22/2024
Public

Profiling Docker* Containers

Intel® VTune™ Profiler allows you to profile applications running in Docker* containers, including profiling multiple containers simultaneously. This recipe guides you through the configuration of a Docker container and describes ways to use VTune Profiler to analyze one or multiple concurrently running containers. This recipe also utilizes the Java* Code Analysis capabilities of VTune Profiler.

Ingredients

This section lists the hardware and software tools used for the performance analysis scenario.

  • Application: MatrixMultiplication. This Java application is used as a demo and is not available for download.

  • Tools: Intel VTune Profiler 2021.2.0 - Hotspots analysis with Hardware Event-Based Sampling.

    NOTE:
    • Starting with the 2020 release, Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler.

    • Most recipes in the Intel® VTune™ Profiler Performance Analysis Cookbook are flexible. You can apply them to different versions of Intel® VTune™ Profiler. In some cases, minor adjustments may be required.

    • Get the latest version of Intel® VTune™ Profiler:

  • Linux container runtime: docker.io.

  • Operating system: Ubuntu* 20.04 based on Linux* kernel version 5.4 or newer.

  • CPU: Intel® microarchitecture code named Skylake or newer

Install and Configure a Docker* Container

Prerequisites:

  1. Pull the Docker image that you want to use.
    host> docker pull ubuntu:latest
  2. Run the container. Keep it running using the -t and -d options.
    host> docker run -td ubuntu:latest
  3. To analyze Docker containers with VTune Profiler using User-Mode Sampling or Memory Consumption analyses, make sure to enable ptrace support.
    host> docker run --cap-add CAP_SYS_PTRACE --name=test_container -td ubuntu:latest
  4. If you want to analyze Docker containers with VTune Profiler using Hardware Event-Based Sampling analysis, enable the CAP_SYS_ADMIN capability.
    host> docker run --cap-add CAP_SYS_ADMIN --name=test_container_0 -td ubuntu:latest
    You can also launch the container in the privileged mode.
    host> docker run --privileged --name=test_container_0 -td ubuntu:latest
  5. Copy your Java application and the Java Virtual Machine (JVM) archive to the running Docker container.
    host> docker cp openjdk-16_linux-x64_bin.tar.gz test_container_0:/var/local
    host> docker cp MatrixMultiplication.java test_container_0:/var/local
    
  6. Use the container name to open an interactive bash shell in the running container.
    host> docker exec -it test_container_0 /bin/bash
  7. Extract the JDK archive in the container.
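
Step 7 and the later `java -cp . MatrixMultiplication` command imply two actions inside the container: unpacking the JDK archive and compiling the source file. A minimal sketch follows, assuming that the archive unpacks into a top-level jdk-16 directory (consistent with the /var/local/jdk-16 path used later in this recipe) and that MatrixMultiplication.java declares no package. The function names extract_jdk and compile_target are hypothetical helpers, not part of VTune Profiler or Docker.

```shell
#!/bin/sh
# Sketch only: helper names and the jdk-16 layout are assumptions.

# extract_jdk ARCHIVE DEST_DIR
# Unpack a gzip-compressed tar archive into DEST_DIR (created if missing).
extract_jdk() {
    archive=$1
    dest=$2
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# compile_target JDK_HOME SOURCE_FILE
# Compile the Java source and place the class files next to it, so that
# "java -cp ." finds the class when started from that directory.
compile_target() {
    jdk_home=$1
    source_file=$2
    "$jdk_home/bin/javac" -d "$(dirname "$source_file")" "$source_file"
}

# Example, run inside the container:
#   extract_jdk /var/local/openjdk-16_linux-x64_bin.tar.gz /var/local
#   compile_target /var/local/jdk-16 /var/local/MatrixMultiplication.java
```

Keeping the two steps as separate functions makes it possible to skip the compile step when a prebuilt class or JAR file is copied into the container instead of the source.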

Run Hotspots Analysis with Hardware Event-Based Sampling for Target in Container

In this procedure, we run VTune Profiler on the host machine to profile a target in a docker container.

  1. Run the Java application in the container.

    container> cd /var/local
    container> /var/local/jdk-16/bin/java -cp . MatrixMultiplication 2000 2000 2000 2000
    
  2. On the host, start a system-wide collection in the Profile System mode:
    host> cd /home/user/intel/oneapi/vtune/latest
    host> source vtune-vars.sh
    host> vtune -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -knob stack-size=4096 --duration 60
    
    NOTE:
    You can also profile your application when it is running in a Docker container, using the Attach to Process target type.
    host> vtune -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -knob stack-size=4096 -target-process java
    However, you cannot profile applications running in the container that are instrumented with ITT/JIT API.

Analyze Data Collected for Target in Container

  1. When the data collection is complete, start the VTune Profiler GUI.
    host> vtune-gui
  2. Create a project for the collected results, say docker_java.
  3. Open the collected results.
  4. Review the results in the Summary tab of the Hotspots analysis.
    We infer from the Top Hotspots section that the multiply function of the target application consumed the most CPU time.
  5. Click on the multiply function in the list to switch to the Bottom-up tab. View the stack flow for this hotspot function.
  6. Double click on the hottest function in the table to identify the hotspot source code line for that function. You can then analyze the metric data collected for this line.
  7. To view performance data for individual docker containers, select the Container Name/Process/Function/Thread/Call Stack grouping from the pull down menu. Identify containers by the docker prefix.
  8. To view the performance data for system binaries that are running inside a container, select the Module/Function/Call Stack grouping. Locate the entries with the docker prefix in the Module Path column.
    In this grouping mode, you can also view performance data for host system binaries and containerized system binaries simultaneously.
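
The Summary and Top Hotspots data reviewed above in the GUI can also be printed in a terminal. The sketch below assumes the `-report summary` and `-report hotspots` actions of the vtune command line and a result directory named r000hs. The directory name, the run helper, and the DRY_RUN variable are illustrative assumptions, not part of the original recipe.

```shell
#!/bin/sh
# Sketch only: r000hs, run, and DRY_RUN are illustrative assumptions.

# run CMD [ARGS...]
# Execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# report_result RESULT_DIR
# Print the summary report, then the hotspots report, for one result.
report_result() {
    result_dir=$1
    run vtune -report summary -result-dir "$result_dir" &&
        run vtune -report hotspots -result-dir "$result_dir"
}

# Example, on the host after sourcing vtune-vars.sh:
#   report_result r000hs
```

Setting DRY_RUN=1 prints the two commands without running them, which is a quick way to check the result path before VTune Profiler opens the data.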

Run Hardware Event-Based Hotspots Analysis With VTune Profiler and Target Running in Same Container

  1. Pull the docker image of oneapi-basekit.
    host> docker pull intel/oneapi-basekit
  2. Run the docker container with CAP_SYS_ADMIN capability to enable profiling from the container.
    host> docker run -dt --name=my_oneapi_container --cap-add CAP_SYS_ADMIN intel/oneapi-basekit
  3. Once the collection is complete, do one of the following:
    • Copy and view the collected data outside the container. Exit this procedure.
    • Use VTune Profiler Server opened in the same container. Go to step 4.
  4. Use VTune Profiler Server to view collected data.
    1. Publish a container port to the host by using the --publish option when you create the container.
      host> docker run -dt --name=my_oneapi_container --cap-add CAP_SYS_ADMIN --publish 7788:7788 intel/oneapi-basekit
      where

      --publish 7788:7788 maps TCP port 7788 in the container to port 7788 on the host.

    2. Start VTune Profiler Server inside the container.
      my_oneapi_container> vtune-backend --allow-remote-ui --web-port=7788 --enable-server-profiling &
      where

      --allow-remote-ui allows remote UI clients.

      --web-port=7788 is the HTTP/HTTPS port for web UI and data APIs.

      --enable-server-profiling allows users to select the hosting server as the profiling target.

      & runs the command in the background.

      The vtune-backend command returns a URL that you can open outside the container. For example,

      Serving GUI at https://b06036cef42c:7788?one-time-token=4db58f1ad7225e4dccca60573e4c1fd2
      Serving GUI at https://172.17.0.8:7788?one-time-token=4db58f1ad7225e4dccca60573e4c1fd2
      

    3. On the host machine, open the URL reported by vtune-backend in a browser.
    4. If the published host port differs from the port used by vtune-backend in the container, replace the port in the URL with the host port that you specified when creating the container.
      NOTE:
      The IP address in this output is the IP address of the container. You can access this address only from the host where the container is running. To access VTune Profiler Server from outside the host, use the IP address or hostname of the external host.
    5. Create a project, say vtune_in_docker.
    6. Copy your Java application into the container, or mount a host folder that contains the application.
      host> docker cp openjdk-16_linux-x64_bin.tar.gz my_oneapi_container:/var/local/
      host> docker cp MatrixMultiplication.java my_oneapi_container:/var/local/
      
    7. You can run User-mode and Hardware Event-based Hotspots analysis in both Launch and Attach to Process modes within the container. For example, start the Hardware Event-based Hotspots analysis in Launch mode.
    8. Once the analysis finishes, view results in the Summary tab.
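
The URL printed by vtune-backend contains the container hostname or IP address and the container port. When the page is opened from another machine, or when a different host port was published, both parts of the URL may need to be replaced while the one-time token is kept. A minimal sketch of that rewrite follows; the function name rewrite_backend_url and the example hostname are hypothetical.

```shell
#!/bin/sh
# Sketch only: rewrite_backend_url and myhost.example.com are hypothetical.

# rewrite_backend_url URL HOST PORT
# Replace the host and port of a vtune-backend URL, keeping the scheme and
# everything after the port (for example, the one-time token).
rewrite_backend_url() {
    printf '%s\n' "$1" |
        sed -E "s#^(https?://)[^:/?]+(:[0-9]+)?#\1$2:$3#"
}

# Example:
#   rewrite_backend_url \
#     'https://172.17.0.8:7788?one-time-token=<token>' myhost.example.com 7788
```

The token is left untouched because it is issued by the server for that session; only the address part of the URL depends on where the browser runs.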

Run Profile System Analysis for Host Target From Container

  1. Start a Docker container with the --pid=host and --cap-add CAP_SYS_ADMIN options to enable data collection. Mount the host folders that contain the binaries and symbols of the application. These files are required for function-level and source-level analysis of the collected data.
    host> docker run -dt --name=my_oneapi_container -v /host_path:/container_path --pid=host --cap-add CAP_SYS_ADMIN --publish 7788:7788 intel/oneapi-basekit
    where:

    -v /host_path:/container_path mounts the host path "/host_path" at the container path "/container_path".

    --pid=host makes the container use the PID namespace of the host.

    --publish 7788:7788 maps the TCP port 7788 in the container to port 7788 on the host.

  2. To analyze the collected data, do one of the following:
    • Copy and view the collected data outside the container and on a different system. Exit this procedure.
    • Use VTune Profiler Server opened in the same container. Go to step 3.
  3. To view results in VTune Profiler Server, start the server inside the container.
    my_oneapi_container> vtune-backend --allow-remote-ui --web-port=7788 --enable-server-profiling &
    where

    --allow-remote-ui allows remote UI clients

    --web-port=7788 sets the HTTP/HTTPS port for web UI and data APIs

    --enable-server-profiling allows users to select the hosting server as the profiling target

    & runs the command in the background

    The vtune-backend command returns a URL that you can open outside the container. For example,
    Serving GUI at https://b06036cef42c:7788?one-time-token=4db58f1ad7225e4dccca60573e4c1fd2
    Serving GUI at https://172.17.0.8:7788?one-time-token=4db58f1ad7225e4dccca60573e4c1fd2
    
  4. On the host machine, use a browser to open the URL reported by vtune-backend. In the URL, replace the container port used by vtune-backend with the host port that you specified when creating the container.
    NOTE:
    The IP address in this output is the IP address of the container. You can access it only from the host where the container is running. To access VTune Profiler Server from outside the host, use the IP address or hostname of the external host.
  5. Create a project, say vtune_in_docker.
  6. In the container, run Hardware Event-based Hotspots in the Profile System mode.
  7. Specify the locations of source and binaries to enable source-level and function-level analysis.
  8. Start a command-line collection from the container, specifying the binary and source search directories with the -search-dir and -source-search-dir options. To access the results from the GUI, point -result-dir to the current location of the VTune project.
    my_oneapi_container> vtune -collect hotspots -knob sampling-mode=hw -knob stack-size=4096 --duration 30  -result-dir=/root/intel/vtune/projects/vtune_in_docker/r@@@{at} -search-dir /container_path -source-search-dir /container_path
    NOTE:
    Use the Attach to Process mode to profile your application running in a docker container.
    host> docker exec my_oneapi_container vtune -collect hotspots -knob sampling-mode=hw -result-dir=/root/intel/vtune/projects/vtune_in_docker/r@@@{at} -search-dir /container_path -source-search-dir /container_path -target-process matrix
  9. Once the analysis completes, see results in the Summary tab.
    The Top Hotspots section of the Summary view shows that the multiply function of the target application consumed the most CPU time.
  10. Click the multiply function in the list and switch to the Bottom-up tab.
  11. Double click on the test_if function to examine the source level analysis for this function.
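
The procedure above depends on the container having been started with the CAP_SYS_ADMIN capability. One way to confirm this from a shell inside the container is to decode the effective capability mask that Linux reports in /proc. The sketch below assumes the standard Linux capability numbers (CAP_SYS_PTRACE is bit 19, CAP_SYS_ADMIN is bit 21); the function names are hypothetical helpers.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; bit numbers follow the
# Linux capability list (CAP_SYS_PTRACE = 19, CAP_SYS_ADMIN = 21).

# current_capeff
# Print the effective capability mask (hex) of the current shell process.
current_capeff() {
    awk '/^CapEff:/ { print $2 }' "/proc/$$/status"
}

# has_capability CAPEFF_HEX BIT
# Succeed if bit number BIT is set in the hexadecimal mask CAPEFF_HEX.
has_capability() {
    mask=$1
    bit=$2
    [ $(( (0x$mask >> $bit) & 1 )) -eq 1 ]
}

# Example, run inside the container:
#   if has_capability "$(current_capeff)" 21; then
#       echo "CAP_SYS_ADMIN is effective"
#   else
#       echo "CAP_SYS_ADMIN is missing; recreate the container with --cap-add"
#   fi
```

The same check with bit 19 covers the CAP_SYS_PTRACE capability that the prerequisites list for User-Mode Sampling and Memory Consumption analyses.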
NOTE:
You can have source-level analyses for native applications that run simultaneously in multiple containers, if all of these containers have the same mounted host folder with the binaries.
Profiling Considerations:
  • You can only profile C/C++ applications.
  • You cannot profile applications instrumented with ITT/JIT API.
NOTE:
Discuss this recipe in the Analyzers developer forum.