User Guide


Custom Analysis Options

If you create a copy of a predefined analysis type, a new custom configuration inherits all options available for the original analysis and makes them editable.
The following list describes all available custom configuration options (knobs) in alphabetical order:


Analyze I/O waits
check box
Analyze the percentage of time each thread and CPU spends in I/O wait state.
Analyze interrupts
check box
Collect interrupt events that alter the normal execution flow of a program. Such events can be generated by hardware devices or by CPUs. Use this data to identify slow interrupts that affect the performance of your code.
Analyze loops
check box
Extend loop analysis to collect advanced loop information, such as instruction set usage, and display analysis results by loops and functions.
Analyze memory bandwidth
check box
Collect events required to compute memory bandwidth.
Analyze memory consumption
check box (for Linux targets only)
Collect and analyze information about memory objects with the highest memory consumption.
Analyze memory objects
check box (for Linux* targets only)
Enable the instrumentation of memory allocation/de-allocation and map hardware events to memory objects.
Analyze OpenMP regions
check box
Instrument the OpenMP* regions in your application to group performance data by regions/work-sharing constructs and detect inefficiencies such as imbalance, lock contention, or overhead on performing scheduling, reduction, and atomic operations. Using this option may cause higher overhead and increase the result size.
Analyze PCIe bandwidth
check box
Collect the events required to compute PCIe bandwidth. As a result, you will be able to analyze the distribution of the read/write operations on the timeline and identify where your application could be stalled due to approaching the bandwidth limits of the PCIe bus.
In the Device class drop-down menu, you can choose the device class for which to analyze PCIe bandwidth: processing accelerators, mass storage controller, network controller, or all classes of devices (default).
This analysis is possible only on the Intel microarchitecture code name Sandy Bridge EP and later.
Analyze power usage
check box
Track power consumption by processor over time to see whether it can cause CPU throttling.
Analyze Processor Graphics hardware events
drop-down menu
Analyze performance data from Intel HD Graphics and Intel Iris Graphics (further: Intel Graphics) based on the predefined groups of GPU metrics.
Analyze system-wide context switches
check box
Analyze detailed scheduling layout for all threads on the system and identify the nature of context switches for a thread (preemption or synchronization).
Analyze user tasks, events, and counters
check box
Analyze tasks, events, and counters specified in your code via the ITT API. This option causes a higher overhead and increases the result size.
Analyze user histogram
check box
Analyze the histogram specified in your code via the Histogram API. This option increases both overhead and result size.
Analyze user synchronization
check box
Enable User synchronization API profiling to analyze thread synchronization. This option causes higher overhead and increases result size.
Chipset events
Specify a comma-separated list of chipset events (up to 5 events) to monitor with the hardware event-based sampling collector.
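As a rough illustration of the format described above, the following Python sketch splits a comma-separated event list and rejects lists with more than five entries. The helper name `parse_chipset_events` and the placeholder event names are hypothetical and are not part of the tool; only the five-event limit comes from the option description.

```python
MAX_CHIPSET_EVENTS = 5  # limit stated in the option description


def parse_chipset_events(value: str) -> list[str]:
    """Split a comma-separated event list and enforce the 5-event limit.

    Hypothetical helper for illustration only.
    """
    # Drop surrounding whitespace and ignore empty entries such as "A,,B".
    events = [name.strip() for name in value.split(",") if name.strip()]
    if len(events) > MAX_CHIPSET_EVENTS:
        raise ValueError(
            f"{len(events)} events given, at most {MAX_CHIPSET_EVENTS} allowed"
        )
    return events


# Placeholder names, not real chipset events.
print(parse_chipset_events("EVENT_A, EVENT_B, EVENT_C"))
```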
Collect context switches
check box
Analyze detailed scheduling layout for all threads in your application, explore time spent on a context switch, and identify the nature of context switches for a thread (preemption or synchronization).
The types of context switches (preemption or synchronization) cannot be identified if the analysis uses Perf*-based driverless collection.
Collect CPU sampling data
Choose whether to collect information about CPU samples and related call stacks.
Collect highly accurate CPU time
check box (for Windows targets only)
Obtain more accurate CPU time data. This option causes more runtime overhead and increases result size. Administrator privileges are required.
Collect I/O API data
Choose whether to collect information about I/O calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size.
Collect Parallel File System counters
check box
Enable collection of the Parallel File System counters to analyze Lustre* file system performance statistics, including Bandwidth, Package Rate, Average Packet Size, and others.
Collect signalling API data
Choose whether to collect information about synchronization objects and call stacks for signaling calls. This analysis option helps identify synchronization transitions in the timeline and signaling call stacks for associated waits. The collector instruments signaling APIs, which causes higher overhead and increases result size.
Collect stacks
check box
Enable advanced collection of call stacks and thread context switches to analyze performance, parallelism, and power consumption per execution path.
Collect synchronization API data
Choose whether to collect information about synchronization wait calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size.
Collect thread affinity
check box
Analyze thread pinning to sockets, physical cores, and logical cores. Identify incorrect affinity that utilizes logical cores instead of physical cores and contributes to poor physical CPU utilization.
Affinity information is collected at the end of the thread lifetime, so the resulting data may not show the whole issue for dynamic affinity that is changed during the thread lifetime.
CPU Events
  • Specify hardware events to collect using the check boxes in the first column. By default, the table lists all events available for the target platform with events used for the original analysis configuration pre-selected. You may use the
    functionality to find events of interest. To get more details on an event, select it in the table and click the
  • Modify the Sample After value for an event to control the number of events after which event data collection is interrupted. The Sample After value depends on the target duration: based on the duration value, the Sample After value is adjusted with a multiplier.
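Because one sample is taken each time the Sample After number of events has occurred, the total event count can be estimated from the number of collected samples. The Python sketch below shows that arithmetic with made-up numbers. The function names are hypothetical, and the duration-based multiplier mentioned above is not modeled.

```python
def estimate_event_count(samples: int, sample_after: int) -> int:
    """Estimate how many events occurred, given the number of samples taken."""
    return samples * sample_after


def expected_samples(total_events: int, sample_after: int) -> int:
    """Estimate how many samples a given number of events would produce."""
    if sample_after < 1:
        raise ValueError("sample_after must be at least 1")
    return total_events // sample_after


# Illustrative values only: 1,500 samples with a Sample After value of 2,000,000.
print(estimate_event_count(samples=1_500, sample_after=2_000_000))
```

A larger Sample After value yields fewer samples for the same workload, which lowers overhead at the cost of statistical detail.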
CPU sampling interval, ms
Specify an interval between collected CPU samples in milliseconds.
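To get a feel for how the interval affects result size, the following Python sketch gives a rough estimate of the number of samples for one continuously running thread. It is an approximation under that assumption, with a hypothetical function name; actual sample counts depend on how long each thread is running.

```python
def approx_sample_count(duration_s: float, interval_ms: float) -> int:
    """Rough number of samples for one thread that runs the whole time."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return int(duration_s * 1000 / interval_ms)


# Illustrative values: a 10-second run sampled every 10 ms and every 1 ms.
print(approx_sample_count(duration_s=10, interval_ms=10))
print(approx_sample_count(duration_s=10, interval_ms=1))
```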
Disable alternative stacks for signal handlers
check box (available for Linux targets)
Disable using alternative stacks for signal handlers. Consider this option for profiling standard Python 3 code on Linux.
Enable driverless collection
check box
Evaluate max DRAM bandwidth
check box
Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds.
Event mode
drop-down list
Limit event-based sampling collection to USER (user events) or OS (system events) mode. By default, all event types are collected.
GPU Profiling mode
drop-down menu
Select a profiling mode to either characterize GPU performance issues based on GPU hardware metric presets, or enable source analysis to identify basic block latency caused by algorithm inefficiencies or memory latency caused by memory access issues.
Use the Computing task of interest table to specify the kernels of interest and narrow down the GPU analysis to specific kernels, minimizing the collection overhead. If required, modify the instance step for each kernel, which is a sampling interval (in the number of kernels).
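The effect of an instance step can be pictured as profiling every Nth invocation of a kernel. The Python sketch below illustrates this under the assumption that counting starts at the first instance; the exact instances the tool selects may differ, and the function name is hypothetical.

```python
def profiled_instances(total_instances: int, instance_step: int) -> list[int]:
    """Return zero-based indices of kernel instances that would be profiled.

    Assumes the first instance is profiled and then every `instance_step`-th
    instance after it.
    """
    if instance_step < 1:
        raise ValueError("instance_step must be at least 1")
    return list(range(0, total_instances, instance_step))


# Illustrative values: 10 kernel invocations with an instance step of 3.
print(profiled_instances(total_instances=10, instance_step=3))
```

A larger step profiles fewer instances of the kernel, which is how the setting reduces collection overhead.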
GPU sampling interval, ms
Specify an interval between GPU samples.
GPU Utilization
check box (for Linux* targets available with Intel HD Graphics and Intel Iris® Graphics only)
Analyze GPU usage and identify whether your application is GPU or CPU bound.
Limit PMU collection to counting
check box
Enable this option to collect event counts instead of the default detailed context data for each PMU event (such as code or hardware context). Counting mode introduces less overhead but provides less information.
Linux Ftrace events / Android framework events
Use the kernel events library to select Linux Ftrace* and Android* framework events to monitor with the collector. The collected data show up as tasks in the Timeline pane. You can also apply the task grouping level to view performance statistics in the grid.
Managed runtime type to analyze
Choose a type of the managed runtime to analyze. Available options are:
  • for Windows targets: combined Java* and .NET* analysis; combined Java, .NET and Python* analysis; Python only analysis
  • for Linux targets: Java only analysis; combined Java and Python analysis; Python only analysis
Minimal memory object size to track, in bytes
spin box (for Linux targets only)
Specify a minimal size of memory allocations to analyze. This option helps reduce runtime overhead of the instrumentation.
Profile with Hardware Tracing
check box
Enable driver-less hardware tracing collection to explore CPU activities of your code at the microsecond level and triage latency issues.
Stack size, in bytes
Specify the size of a raw stack (in bytes) to process.
size value in GUI corresponds to 0 value in the command line. Possible values are numbers between 0 and 2147483647.
Stack type
drop-down menu
Choose between software stack and hardware LBR-based stack types. Software stacks have no depth limitations and provide more data while hardware stacks introduce less overhead. Typically, software stack type is recommended unless the collection overhead becomes significant. Note that hardware LBR stack type may not be available on all platforms.
Stack unwinding mode
Choose whether collection requires online (during collection) or offline (after collection) stack unwinding. Offline mode reduces analysis overhead and is typically recommended.
Stitch stacks
check box
For applications using Intel® oneAPI Threading Building Blocks (oneTBB) or OpenMP* with Intel runtime libraries, restructure the call flow to attach stacks to a point introducing a parallel workload.
Trace GPU Programming APIs
check box
Capture the execution time of OpenCL™ kernels, SYCL tasks and Intel Media SDK programs on a GPU, identify performance-critical GPU tasks, and analyze the performance per GPU hardware metrics.
Uncore sampling interval, ms
Specify an interval (in milliseconds) between uncore event samples.
Use precise multiplexing
check box
Enable a fine-grained event multiplexing mode that switches event groups on each sample. This mode provides more reliable statistics for applications with a short execution time. You can also consider applying the precise multiplexing algorithm if the MUX Reliability metric value for your results is low.
You may generate the command line for this configuration using the Command Line... button at the bottom.
