Intel® VTune™ Profiler

User Guide

ID 766319
Date 3/22/2024

Configure GPU Analysis from Command Line

Use the -knob option to configure Intel® VTune™ Profiler to profile applications that use a Graphics Processing Unit (GPU) for rendering, video processing, and computations. GPU analysis monitors overall GPU activity (graphics, media, and compute), collects Intel® HD Graphics and Intel® Iris® Graphics hardware metrics, and then shows this data correlated with CPU processes and threads.

The following knobs are supported for GPU analysis:

Knob Name

Supported Analysis Types

Description

enable-gpu-usage=true | false

runss, runsa

Analyze frame rate and usage of Processor Graphics engines.

gpu-counters-mode=none | overview | global-local-accesses | compute-extended | full-compute | render-basic

gpu-hotspots, graphics-rendering, gpu-offload, runss, runsa

Analyze performance data from Processor Graphics based on the GPU Metrics Reference.

  • overview - track general GPU memory accesses such as Memory Read/Write Bandwidth, GPU L3 Misses, Sampler Busy, Sampler Is Bottleneck, and GPU Memory Texture Read Bandwidth. These metrics can be useful for both graphics and compute-intensive applications.

  • global-local-accesses - include metrics that distinguish accesses to different types of data on a GPU: Untyped Memory Read/Write Bandwidth, Typed Memory Read/Write Transactions, SLM Read/Write Bandwidth, Render/GPGPU Command Streamer Loaded, and GPU EU Array Usage. These metrics are useful for compute-intensive workloads on the GPU.

  • compute-extended - analyze GPU activity on the Intel processor code name Broadwell. This metrics set is disabled for other systems.

  • full-compute - collect both overview and compute-basic metrics with the allow-multiple-runs option enabled to analyze all types of EU array stalled/idle issues in the same view.

  • render-basic (preview) - collect Pixel Shader, Vertex Shader, and Output Merger metrics.

This option is available only for supported platforms with the Intel Graphics Driver installed.

gpu-sampling-interval=<value in us>

gpu-hotspots, runss, runsa

Set the interval between GPU samples, from 10 to 1000 microseconds. The default is 1000 us. An interval shorter than 100 us is not recommended.

enable-gpu-runtimes=true | false

gpu-hotspots, runss, runsa

Capture the execution time of OpenCL™ kernels and Intel Media SDK programs on a GPU, identify performance-critical GPU computing tasks, and analyze the performance per GPU hardware metrics.


OpenCL kernels analysis is currently supported for Windows and Linux target systems with Intel HD Graphics and Intel Iris Graphics. Intel® Media SDK Program Analysis Configuration is supported for Linux targets only and should be started with root privileges.
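The value constraints in the table above can be checked before a collection starts. The following is a minimal sketch, not part of VTune Profiler: the helper names valid_gpu_counters_mode, valid_gpu_sampling_interval, and gpu_knob_args are hypothetical, and the accepted values are taken only from the table above.

```shell
# Sketch: validate GPU knob values before passing them to vtune.
# All function names here are hypothetical helpers, not VTune commands.

# Succeeds if $1 is one of the gpu-counters-mode values listed in the table.
valid_gpu_counters_mode() {
  case "$1" in
    none|overview|global-local-accesses|compute-extended|full-compute|render-basic)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Succeeds if $1 is a whole number of microseconds in the 10..1000 range.
valid_gpu_sampling_interval() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 10 ] && [ "$1" -le 1000 ]
}

# Prints the -knob arguments for a validated mode ($1) and interval ($2).
gpu_knob_args() {
  valid_gpu_counters_mode "$1" || {
    echo "unsupported gpu-counters-mode: $1" >&2
    return 1
  }
  valid_gpu_sampling_interval "$2" || {
    echo "gpu-sampling-interval must be 10..1000 us: $2" >&2
    return 1
  }
  if [ "$2" -lt 100 ]; then
    echo "warning: intervals shorter than 100 us are not recommended" >&2
  fi
  echo "-knob gpu-counters-mode=$1 -knob gpu-sampling-interval=$2"
}
```

For example, gpu_knob_args overview 500 prints -knob gpu-counters-mode=overview -knob gpu-sampling-interval=500, which can be appended to a vtune -collect gpu-hotspots command line. The check covers only the value ranges listed above; it does not detect whether the platform and driver actually support the selected counter set.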


Example 1: Running Analysis for an Intel Media SDK Application

This example starts vtune as root and launches the GPU Compute/Media Hotspots analysis for an Intel Media SDK application running on Linux:

vtune -collect gpu-hotspots -knob enable-gpu-runtimes=true -r quadrant_r001 -- BitonicSort

To analyze a remote Linux target from a Windows system, the same example looks as follows:

vtune -target-system=ssh:user1@ -collect gpu-hotspots -knob enable-gpu-runtimes=true -r quadrant_r001 -- BitonicSort.exe

Example 2: Running Analysis with OpenCL Kernels Tracing

Run the GPU Compute/Media Hotspots analysis or a custom analysis with the enable-gpu-usage knob to analyze usage of the processor graphics engines. Set the gpu-counters-mode knob to overview to collect the Overview counter set, which is available only on a supported platform with an Intel Graphics Driver installed. To trace the execution of OpenCL kernels, set the enable-gpu-runtimes knob.

For example, to run the GPU Compute/Media Hotspots analysis, collect GPU hardware metrics, and trace OpenCL kernels in the BitonicSort application (-g is an option of the application itself, not of vtune), enter:

vtune -collect gpu-hotspots -knob gpu-counters-mode=overview -knob enable-gpu-runtimes=true -- BitonicSort -g
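When the same collection is run repeatedly with different counter sets, the command above can be wrapped in a small shell function. This is only a sketch under stated assumptions: run_gpu_hotspots and the VTUNE_DRY_RUN variable are hypothetical names introduced here for illustration, and the function uses no vtune options beyond those shown in the example above.

```shell
# Sketch: reusable wrapper around the Example 2 command line.
# run_gpu_hotspots and VTUNE_DRY_RUN are hypothetical, not part of VTune.
run_gpu_hotspots() {
  _mode="$1"   # first argument: a gpu-counters-mode value
  shift        # remaining arguments: the application and its own options

  # Rebuild the positional parameters as the full vtune command line.
  set -- vtune -collect gpu-hotspots \
         -knob "gpu-counters-mode=$_mode" \
         -knob enable-gpu-runtimes=true \
         -- "$@"

  if [ "${VTUNE_DRY_RUN:-0}" = "1" ]; then
    echo "$*"  # print the command instead of running it
  else
    "$@"       # run vtune with the assembled arguments
  fi
}
```

With VTUNE_DRY_RUN=1, the call run_gpu_hotspots overview BitonicSort -g prints the same command line as the example above, which makes it easy to check the arguments before starting a real collection.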

GPU Analysis on Android* System

You can enable GPU analysis for algorithm analysis types on Android systems with Intel HD Graphics and Intel Iris Graphics by using the following knobs:

  • enable-gpu-usage to analyze frame rate and usage of Intel HD Graphics and Intel Iris Graphics engines based on ftrace events

  • gpu-counters-mode to analyze performance data from Intel HD Graphics and Intel Iris Graphics based on the preset counter sets

  • gpu-sampling-interval to specify the data collection interval between GPU samples

This example runs the GPU Compute/Media Hotspots analysis and monitors GPU usage.

host>./vtune -collect gpu-hotspots -target-system=android -r quadrant_r001 -target-process -knob enable-gpu-usage=true -knob gpu-counters-mode=overview
