Intel® VTune™ Profiler

User Guide

ID 766319
Date 3/22/2024
Public



Pane: Timeline

Use the Timeline pane to visualize metrics over time at either the thread level or platform level and identify patterns, anomalies, and trends in the data.

You can hover over, zoom in on, and filter the data at interesting points in time to get more detail. The Timeline pane is typically located at the bottom of the window, but in views focused on the distribution of metrics over time it may occupy the upper or central part of the window. The data presented in the Timeline pane varies depending on the analysis type and viewpoint.

The Timeline pane typically provides the following data:

Toolbar. Navigation controls to zoom the view in or out on areas of interest. For more details on the Timeline controls, see the Managing Timeline View topic.

Platform metrics. Depending on the analysis type, the Timeline pane may present several areas with platform-specific metrics such as GPU engine usage, computing queue for OpenCL™ applications, bandwidth data, power consumption, and so on. The most detailed analysis of the platform metrics is available with the Timeline pane in the Platform window.

Application metrics per grouping level. Depending on the viewpoint, the data may be represented by threads, modules, processes, cores, packages, and other units monitored by the data collector during the analysis run. For most viewpoints, the Thread grouping is the default. For some viewpoints, you can change the grouping level using the drop-down menu in the Legend area.

Note that the CPU Time metric value provided in the Thread area applies to a particular thread, where 100% is the maximum possible utilization for that thread. For example, a CPU Time value of 94.2% for a selection means that the thread was active for 94.2% of the selected time and was waiting for the remaining 5.8%.
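The per-thread percentage can be reproduced with simple arithmetic. The following is a minimal sketch, not VTune Profiler code: the function name `thread_cpu_time_percent` and the input values are hypothetical, chosen only to match the 94.2% example.

```python
def thread_cpu_time_percent(cpu_time_s: float, elapsed_s: float) -> float:
    """Share of the selected interval that one thread spent running, in percent.

    Hypothetical helper: 100% means the thread was running for the
    whole interval, which is the maximum for a single thread.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return 100.0 * cpu_time_s / elapsed_s


# Assumed inputs: the thread ran for 0.942 s of a 1.0 s selection.
active = thread_cpu_time_percent(cpu_time_s=0.942, elapsed_s=1.0)
waiting = 100.0 - active
print(f"active {active:.1f}%, waiting {waiting:.1f}%")
```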

Selected metrics. Data on the most representative metrics may be presented as separate rows demonstrating an overall application performance over time (for example, CPU Usage or GPU HW metrics) or system-wide execution (for example, GPU Usage). See Reference for Performance Metrics for detailed metrics description.

Note that the CPU Utilization metric in the Timeline pane is calculated as the sum of CPU time across all threads, where 100% is the maximum possible utilization per CPU. For example, a value of 191% on a system with 4 logical CPUs means that the application utilized the equivalent of 1.91 logical CPUs (each fully used CPU counts as 100%). If 0.23 of a CPU was spent by the application threads on overhead or spinning at that moment, the application utilized only 1.68 CPUs effectively.
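The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the profiler's implementation: the function `cpu_utilization` and the per-thread values are hypothetical, picked so that they sum to the 191% and 23% figures in the example.

```python
def cpu_utilization(thread_cpu_percent, thread_overhead_percent):
    """Sum per-thread percentages and express them as a number of CPUs.

    thread_cpu_percent      -- CPU Time of each thread (100 = one full CPU)
    thread_overhead_percent -- the part of each thread's CPU Time spent on
                               overhead or spinning
    Returns (total_cpus, overhead_cpus, effective_cpus).
    """
    total = sum(thread_cpu_percent) / 100.0
    overhead = sum(thread_overhead_percent) / 100.0
    return total, overhead, total - overhead


# Hypothetical per-thread values that add up to 191% and 23%.
total, overhead, effective = cpu_utilization(
    thread_cpu_percent=[94.2, 60.0, 36.8],
    thread_overhead_percent=[10.0, 8.0, 5.0],
)
logical_cpus = 4
print(f"{total:.2f} of {logical_cpus} logical CPUs used, "
      f"{overhead:.2f} on overhead/spin, {effective:.2f} effective")
```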

Legend. Types of data presented on the timeline. Filter in or out any type of data presented in the timeline by selecting or deselecting the corresponding check boxes. The list of performance metrics presented in the view depends on the selected analysis type and viewpoint.

VTune Profiler also uses special indicators to classify the presented data on the timeline:

  • Markers. Color markers indicate an area on the timeline when a particular task, frame, event, and so on was executed. Hover over a marker to see the execution details for the selected element. The following markers are available:

    • Frame markers show frame duration. Available for applications using frames.

    • User task markers provide information on a task executed at this particular moment of time. Available for applications using Task API.

    • CPU sample markers indicate the exact points where profiling samples were taken during hardware event-based stack sampling collection. Use the marker density to estimate the data resolution. For example, VTune Profiler interpolates the sampling data, and the accuracy of that interpolation depends on the number of samples. In this case, the CPU sample markers show more accurate information and reveal sporadic CPU utilization for the thread.

      Sample markers also help you understand exactly how filtering and Spin/Overhead time calculation work. VTune Profiler filters or classifies each sample as a whole, so when you filter by time it is important to know whether the sample point falls within the selected time interval. No data interpolation is done for sampling data when filtering or classifying sample metrics.

    • VSync markers for vertical synchronization. If your application uses vertical synchronization, you can select the VSync timeline option, estimate the correlation between VSync events and application frames, identify frames missing VSync events and explore possible reasons.

    • Sampling point markers indicate the points at which a data sample was read during energy analysis. Hover over a marker to see the value(s) read at that time.

    • Wake-up object markers for energy analysis that show processor wake-ups on the timeline. Hover over a yellow marker to see the time when the selected wake-up happened and the name of the wake-up object.

    • Slow task markers show the duration of tasks (I/O Wait, Ftrace*, Atrace*, and so on) that are categorized as slow according to the thresholds set up in the Summary window.

    • I/O APIs markers

  • Context switches. The time that threads spend on context switches. Hover over a context switch area to see details on its duration, reason, and affected CPU. If you choose the Context Switch Time option in the Call Stack pane and select a context switch in the Timeline pane, the Call Stack pane shows the call sequence at which the preceding thread execution quantum was interrupted.

  • Transitions. The execution flow between threads where one thread signals to another thread waiting to receive that signal. For example, one thread attempts to acquire a lock held by another thread, which then releases it. The release acts like a signal to the waiting thread. Hover over a transition for more details. Double-click a transition to open the source code.

  • Memory transfers. OpenCL routines responsible for transferring data from the host system to a GPU are marked with cross-diagonal hatching on a computing queue.

  • Synchronizations. OpenCL routines responsible for synchronization are marked with vertical hatching on a computing queue.

  • Scaling indicators. For GPU metrics and bandwidth graphs, VTune Profiler provides the maximum Y-axis values used to scale the graphs. The color of each value corresponds to the color of the relevant metric in the legend. For example, for the GPU L3 Cache Misses and Memory Access metrics, the maximum Y value for the selected scale may be 20.153 GB/sec for both GPU Memory Read Bandwidth and GPU Memory Write Bandwidth, and 521849224.729 Misses/sec for GPU L3 Misses.
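The whole-sample filtering described under CPU sample markers can be illustrated with a short sketch. It is not VTune Profiler code: the `Sample` type, the `filter_samples` function, the inclusive interval boundaries, and the sample values are all assumptions made for illustration. The point it shows is that a sample is either counted in full or dropped, with no interpolation across the edges of the selected interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp_s: float  # moment the sample was taken
    weight_s: float     # CPU time attributed to this sample


def filter_samples(samples, start_s, end_s):
    """Keep only samples whose timestamp lies in [start_s, end_s].

    Each sample is kept or dropped as a whole; its weight is never
    split or interpolated at the interval boundaries.
    """
    return [s for s in samples if start_s <= s.timestamp_s <= end_s]


# Hypothetical samples taken every 10 ms, each worth 10 ms of CPU time.
samples = [Sample(0.010, 0.010), Sample(0.020, 0.010),
           Sample(0.030, 0.010), Sample(0.040, 0.010)]

# The selection 15 ms .. 35 ms contains two sample points (20 ms, 30 ms).
selected = filter_samples(samples, start_s=0.015, end_s=0.035)
cpu_time = sum(s.weight_s for s in selected)
print(len(selected), "samples,", f"{cpu_time:.3f} s of CPU time")
```

In this sketch the selection is 20 ms wide and yields 20 ms of CPU time only because two whole sample points happen to fall inside it. Moving either boundary past a sample point changes the result by a full sample.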

Tooltips. Hover over a chart element to get statistics on this metric/program unit for the selected moment of time.

For the GPU analysis of applications using OpenCL software technology, the Timeline pane in the Graphics window provides the following tabs:

  • Platform tab that focuses on a per-thread and per-process distribution of the CPU and GPU hardware metrics collected during the analysis run.

  • Architecture Diagram tab that is provided for OpenCL application analysis collected with the Analyze Processor Graphics hardware events option on systems with Intel® HD Graphics and Intel® Iris® Graphics. This tab helps you better understand the distribution of the GPU hardware metrics per architecture block for the period when the selected OpenCL kernel was running.

NOTE:

Collecting energy analysis data with Intel® SoC Watch is available for target Android*, Windows*, or Linux* devices. Import and viewing of the Intel SoC Watch results is supported with any version of the VTune Profiler.