
Intel® VTune™ Profiler User Guide

ID 766319
Date 11/07/2023
Public



Switch Viewpoints

Use a viewpoint, a pre-set configuration of Intel® VTune™ Profiler's data views, to focus on specific performance problems.

NOTE:

By default, VTune Profiler shows either no viewpoints or a managed selection of viewpoints that are likely to be helpful for the specific analysis type. To display all applicable viewpoints, enable the Show all applicable viewpoints option in the Options pane.

When you select a viewpoint, you select the set of performance metrics that Intel® VTune™ Profiler shows in the windows of the result tab. To select the required viewpoint, click the down arrow next to the viewpoint name. The viewpoint controls in the result tab include:

1. Name of the analysis type you ran.
2. Name of the current viewpoint. Click the down arrow next to the viewpoint name to open a drop-down menu with a choice of applicable viewpoints.
3. Context-sensitive help icon for the current viewpoint.
4. Viewpoint drop-down menu that displays a list of viewpoints available for the current analysis type.

The following table describes the viewpoints that VTune Profiler provides. The viewpoints you can select depend on the analysis type you ran.

Viewpoint

Description

Hotspots by CPU Utilization

Helps identify hotspots, that is, code regions in the application that consume a lot of CPU time. CPU time is broken down into CPU utilization states: idle, poor, fair, and good.
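The idea of grouping time by utilization state can be illustrated with a simplified model. The following Python sketch assumes that each fixed-length time slice is labeled with the number of logical cores that were busy during it, and it uses hypothetical thresholds (`POOR_THRESHOLD`, `GOOD_THRESHOLD`). VTune Profiler applies its own criteria, so this is only an illustration of the concept, not the tool's actual algorithm.

```python
from collections import Counter

# Hypothetical thresholds, expressed as a fraction of available logical cores.
# VTune Profiler uses its own criteria; these values are illustrative only.
POOR_THRESHOLD = 0.5
GOOD_THRESHOLD = 0.85


def utilization_state(busy_cores: int, logical_cores: int) -> str:
    """Classify one time slice by the fraction of logical cores that were busy."""
    if busy_cores == 0:
        return "idle"
    ratio = busy_cores / logical_cores
    if ratio < POOR_THRESHOLD:
        return "poor"
    if ratio < GOOD_THRESHOLD:
        return "fair"
    return "good"


def time_by_state(slices, logical_cores, slice_ms=1.0):
    """Sum elapsed time (ms) per utilization state.

    `slices` is a sequence of busy-core counts, one per fixed-length time slice.
    """
    totals = Counter()
    for busy in slices:
        totals[utilization_state(busy, logical_cores)] += slice_ms
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical busy-core counts for ten 1 ms slices on an 8-core machine.
    samples = [0, 2, 2, 3, 5, 6, 7, 8, 8, 1]
    print(time_by_state(samples, logical_cores=8))
    # {'idle': 1.0, 'poor': 4.0, 'fair': 2.0, 'good': 3.0}
```

A large share of time in the poor or idle states suggests that the hot code regions are not keeping the available cores busy.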

Threading Efficiency

Shows how your multithreaded application utilizes the available CPU cores and helps identify possible causes of inefficient utilization. Use this view to find threads that wait too long on synchronization objects (locks) or to identify scheduling overhead.

Microarchitecture Exploration

Helps identify where the application is not making the best use of available hardware resources. This viewpoint displays metrics derived from hardware events. The Summary window reports overall metrics for the entire execution along with explanations of the metrics. From the Bottom-up and Top-down Tree windows you can locate the hardware issues in your application. Cells are highlighted when potential opportunities to improve performance are detected. Hover over the highlighted metrics in the grid to see explanations of the issues.

Hardware Events

Displays statistics of monitored hardware events: the estimated event count, the number of samples collected, or both. Use this view to identify code regions (modules, functions, code lines, and so on) with the highest activity for an event of interest.
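In event-based sampling, the hardware counter typically records a sample once every N occurrences of the event (the sample-after value), so the estimated event count is roughly the sample count multiplied by that value. The Python sketch below applies this relationship to rank code regions by activity. The function names, sample counts, and sample-after value are hypothetical and are not taken from a real VTune Profiler result.

```python
from dataclasses import dataclass


@dataclass
class FunctionSamples:
    name: str     # hypothetical function name
    samples: int  # samples attributed to this function for one event


def estimated_counts(rows, sample_after_value):
    """Estimate event counts as samples * sample-after value."""
    return {r.name: r.samples * sample_after_value for r in rows}


def hottest(rows, sample_after_value):
    """Return the function with the highest estimated event count."""
    counts = estimated_counts(rows, sample_after_value)
    return max(counts, key=counts.get)


if __name__ == "__main__":
    rows = [
        FunctionSamples("parse_input", 120),
        FunctionSamples("compute_kernel", 4300),
        FunctionSamples("write_output", 75),
    ]
    sav = 2_000_003  # hypothetical sample-after value
    for name, count in estimated_counts(rows, sav).items():
        print(f"{name}: ~{count:,} events")
    print("highest activity:", hottest(rows, sav))
```

Because every row is scaled by the same sample-after value for a given event, ranking by samples and ranking by estimated count give the same order; the estimated count mainly helps when comparing magnitudes.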

Memory Usage

Helps you understand how effectively your application uses memory resources and identify potential memory access issues, such as excessive access to remote memory on NUMA platforms or hitting the DRAM or interconnect bandwidth limit. It provides performance metrics for both the application code and memory objects (arrays).

HPC Performance Characterization

Helps you understand how effectively your application uses CPU, memory, and floating-point operation resources. Use this view to identify scalability issues for the Intel OpenMP and MPI runtimes and to find next steps for increasing memory and FPU efficiency.

Input and Output

Shows input/output data together with CPU and bus utilization statistics correlated with the execution of your target. Use this view to identify I/O requests with long latency, explore call stacks for I/O functions, analyze slow I/O requests on the timeline, and identify imbalance between I/O and compute operations.

GPU Compute/Media Hotspots

Helps identify GPU tasks with high GPU utilization and estimate their effectiveness. It is particularly useful for analyzing SYCL computing tasks, OpenCL™ kernels, and Intel Media SDK tasks. Use this view to identify the most time-consuming GPU computing tasks, analyze GPU task execution over time, explore GPU hardware metrics per GPU architecture block, and so on.

FPGA Hotspots

Helps identify the FPGA and CPU tasks with high utilization. Use this view to assess FPGA time spent executing kernels, overall time for memory transfers between the CPU and FPGA, and how well a workload is balanced between the CPU and FPGA.

GPU Rendering

Provides platform-wide CPU/GPU utilization and efficiency statistics collected with the GPU Rendering analysis (preview), including dedicated support for the Xen virtualization platform.

Platform Power Analysis

Helps identify where the application generates idle and wake-up behavior that can lead to inefficient use of energy. Where possible, it provides data from both the OS and hardware perspectives, such as the detailed C-state residency report that compares the time the OS requested in deep sleep states with the actual residency reported by the hardware.
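The comparison between OS-requested and hardware-reported residency can be sketched as follows. This Python example assumes that per-C-state residency times (in milliseconds) are already available from both perspectives; the state names, the numbers, and the 10% `tolerance` are hypothetical, and VTune Profiler does not necessarily flag states this way.

```python
def demoted_states(os_requested_ms, hw_actual_ms, tolerance=0.10):
    """Return C-states whose hardware-reported residency falls short of the
    OS-requested residency by more than `tolerance` (fraction of the request)."""
    flagged = {}
    for state, requested in os_requested_ms.items():
        actual = hw_actual_ms.get(state, 0.0)
        # Skip states the OS never requested to avoid division by zero.
        if requested > 0 and (requested - actual) / requested > tolerance:
            flagged[state] = (requested, actual)
    return flagged


if __name__ == "__main__":
    # Hypothetical residency data for one core over a measurement interval.
    os_requested = {"C1": 120.0, "C6": 800.0}
    hw_actual = {"C1": 310.0, "C6": 590.0}
    print(demoted_states(os_requested, hw_actual))  # {'C6': (800.0, 590.0)}
```

In this hypothetical data, the hardware spent less time in C6 and more time in C1 than the OS requested, which is the kind of mismatch a side-by-side OS and hardware view makes visible.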