Hardware Event-based Sampling Collection
During hardware event-based sampling (EBS), also known as Performance Monitoring Counter (PMC) analysis in the sampling mode, Intel® VTune™ Profiler profiles your application using the counter overflow feature of the Performance Monitoring Unit (PMU).
The data collector interrupts a process and captures the instruction pointer (IP) of the interrupted process at the time of the interrupt. The statistically collected IPs of active processes enable the viewer to show the statistically important code regions that affect software performance.
Statistical sampling does not provide 100% accurate data. When VTune Profiler collects an event, it attributes not only that event but the entire sampling interval prior to it (often 10,000 to 2,000,000 events) to the current code context. For a large number of samples, this sampling error does not have a serious impact on the accuracy of performance analysis, and the final statistical picture is still valid. But if code runs only for a very short time, very few samples exist for it. This may yield seemingly impossible results, such as two million instructions retiring in 0 cycles for a rarely seen driver. In this case, you may either ignore hotspots that show an insignificant number of samples or switch to a coarser granularity (for example, function).
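To make the attribution rule concrete, the following Python sketch estimates event counts from sample counts. The event names, sample-after values, function names, and sample data are illustrative assumptions, not VTune Profiler output. The sketch only shows that each sample stands for a whole sampling interval of events, so a code region with a single sample can end up with millions of instructions and no cycles.

```python
from collections import Counter

# Hypothetical sample-after values (events per sample); real values
# depend on the analysis configuration.
SAMPLE_AFTER = {
    "INST_RETIRED.ANY": 2_000_000,
    "CPU_CLK_UNHALTED.THREAD": 2_000_000,
}

def estimate_event_counts(samples):
    """Map (function, event) samples to {function: {event: estimated count}}.

    Every sample is credited with the full sampling interval of its event.
    """
    hits = Counter(samples)
    estimates = {}
    for (function, event), sample_count in hits.items():
        estimates.setdefault(function, {})[event] = sample_count * SAMPLE_AFTER[event]
    return estimates

# Made-up data: a hot kernel with many samples and a driver seen only once.
samples = (
    [("compute_kernel", "INST_RETIRED.ANY")] * 900
    + [("compute_kernel", "CPU_CLK_UNHALTED.THREAD")] * 1000
    + [("rare_driver", "INST_RETIRED.ANY")] * 1
)

estimates = estimate_event_counts(samples)
for function, events in estimates.items():
    instructions = events.get("INST_RETIRED.ANY", 0)
    cycles = events.get("CPU_CLK_UNHALTED.THREAD", 0)
    print(f"{function}: ~{instructions} instructions in ~{cycles} cycles")
```

In this made-up data set, the hot kernel gets a plausible instruction-to-cycle ratio, while the driver with one sample is credited with a full interval of instructions and no cycles at all.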
The average overhead of event-based sampling is about 2% at a 1 ms sampling interval.
The number of hardware events (Performance Monitoring Counters) that can be collected simultaneously is limited by CPU capabilities. Usually, it is no more than four events. To overcome this limitation, VTune Profiler splits the event list into several event groups. Each group consists of events that can be collected simultaneously.
VTune Profiler uses one of the following techniques:
- Runs an application several times collecting one event group during each run.
- Runs an application only once and multiplexes the event groups in a round-robin fashion during the run. This technique may not work on some OS/hardware combinations.
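The grouping and multiplexing idea can be sketched in Python as follows. This is a simplified illustration, not the VTune Profiler implementation. The event names are placeholders and the grouping ignores real constraints, such as events that can only be counted on specific counters. The scaling helper shows one common way to extrapolate a multiplexed count to the whole run, and it is an assumption here.

```python
from itertools import cycle, islice

def split_into_groups(events, max_counters=4):
    """Chunk the event list so that each group fits the available counters."""
    return [events[i:i + max_counters] for i in range(0, len(events), max_counters)]

def multiplex_schedule(groups, time_slices):
    """Round-robin: return the group that is active in each time slice of one run."""
    return list(islice(cycle(groups), time_slices))

def scale_count(raw_count, active_slices, total_slices):
    """Extrapolate a count observed only while its group was active (assumed method)."""
    return raw_count * total_slices / active_slices

# Placeholder event names; six events need two groups on a four-counter CPU.
events = ["EVENT_A", "EVENT_B", "EVENT_C", "EVENT_D", "EVENT_E", "EVENT_F"]
groups = split_into_groups(events)
schedule = multiplex_schedule(groups, time_slices=6)

for slice_index, group in enumerate(schedule):
    print(f"slice {slice_index}: {group}")

# EVENT_E was active in 3 of 6 slices, so its raw count is scaled by 6 / 3.
print(scale_count(300, active_slices=3, total_slices=6))
```

Running the application several times corresponds to giving each group its own full run instead of a share of the time slices, which avoids the extrapolation step at the cost of longer collection.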
During product installation on Linux*, you have an option to install the sampling driver with the per-user filtering enabled. When the filtering is on, the collector gathers data only for the processes spawned by the user who started the collection. When it is off (default), samples from all processes on the system are collected. Consider using the filtering to isolate the collection from other users on a cluster for security reasons. The administrator/root can change the filtering mode by rebuilding/restarting the driver at any time. A regular user cannot change the mode after the product is installed.
By default, the VTune Profiler collector samples your target and does not analyze execution paths. But you can enable the Collect stacks option during analysis configuration to make the collector take exact measurements of any hardware performance events or timestamps, as well as collect a call stack to the point where a thread gets activated and inactivated. On Linux* systems, by default, VTune Profiler uses the driverless Perf collection mode for the hardware event-based stack analysis.
VTune Profiler uses the hardware event-based sampling collector to collect data for the following analysis types:
- Hotspots (hardware event-based sampling mode)
- GPU Compute/Media Hotspots (preview)
- GPU Offload (preview)
- Input and Output (preview on Windows* host)
- CPU/FPGA Interaction (preview)
This is a PREVIEW FEATURE. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases.
You can also create a custom analysis type based on the hardware event-based sampling collection.
Analysis types that use the hardware event-based sampling collector can run only one collection at a time on a system.
Prerequisites:
It is recommended to install the sampling driver for hardware event-based sampling collection types. For Linux* and Android* targets, if the sampling driver is not installed, VTune Profiler can enable the Perf* driverless collection. Be aware of the following configuration settings for Linux target systems:
- To enable system-wide and uncore event collection, use root or sudo to set /proc/sys/kernel/perf_event_paranoid to 0:
  echo 0 > /proc/sys/kernel/perf_event_paranoid
- To enable collection with the Microarchitecture Exploration analysis type, increase the default limit of opened file descriptors. Use root or sudo to increase the default value in /etc/security/limits.conf to 100 * <number_of_logical_CPU_cores>:
  <user> hard nofile <100 * number_of_logical_CPU_cores>
  <user> soft nofile <100 * number_of_logical_CPU_cores>
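As a quick check of the two settings above, a small Python script can read the current perf_event_paranoid level and print the limits.conf entries for the local machine. This is a minimal sketch, not a VTune Profiler tool: the helper names are hypothetical, the script only reads and prints values, and any actual change still has to be made with root or sudo.

```python
import os

PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"

def read_perf_event_paranoid(path=PARANOID_PATH):
    """Return the current paranoid level, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def limits_conf_lines(user, logical_cpus):
    """Build limits.conf entries for 100 * <number of logical CPU cores>."""
    nofile = 100 * logical_cpus
    return [f"{user} hard nofile {nofile}", f"{user} soft nofile {nofile}"]

level = read_perf_event_paranoid()
if level is None:
    print("perf_event_paranoid is not readable on this system")
elif level > 0:
    print(f"perf_event_paranoid is {level}; system-wide and uncore collection needs 0")
else:
    print(f"perf_event_paranoid is {level}")

# "<user>" is a placeholder; substitute the account that runs the collection.
for line in limits_conf_lines("<user>", os.cpu_count() or 1):
    print(line)
```

The printed lines follow the same `<user> hard nofile` and `<user> soft nofile` pattern as the entries listed above, with the limit computed from the number of logical CPUs that Python reports.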