Window: Summary - Hotspots by CPU Utilization
Idle utilization. By default, if the CPU Time is insignificant (less than 50% of 1 CPU), such CPU utilization is classified as idle.
Poor utilization. By default, poor utilization is when the number of simultaneously running CPUs is less than or equal to 50% of the target CPU utilization.
Acceptable (OK) utilization. By default, OK utilization is when the number of simultaneously running CPUs is between 51-85% of the target CPU utilization.
Ideal utilization. By default, Ideal utilization is when the number of simultaneously running CPUs is between 86-100% of the target CPU utilization.
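The default thresholds above can be sketched as a small classifier. This is only an illustration of the stated ranges, not how the tool computes them: the function name `classify_utilization` and its parameters are hypothetical, the gaps between the documented ranges (50-51% and 85-86%) are assumed to be continuous, and anything above 85% of the target is assumed to count as ideal.

```python
def classify_utilization(running_cpus: float, target_cpus: int) -> str:
    """Map a count of simultaneously running CPUs to a utilization level.

    Hypothetical helper: thresholds follow the defaults described above.
    """
    if target_cpus <= 0:
        raise ValueError("target_cpus must be positive")

    # Idle: less than 50% of one CPU is in use.
    if running_cpus < 0.5:
        return "idle"

    # Share of the target CPU utilization that is actually in use.
    share = running_cpus / target_cpus
    if share <= 0.50:
        return "poor"
    if share <= 0.85:  # assumption: the 50-51% gap is treated as OK
        return "ok"
    return "ideal"     # assumption: everything above 85% is ideal
```

For example, with a target of 8 logical CPUs, 4 running CPUs fall at exactly 50% of the target and are classified as poor, while 6 running CPUs (75%) are classified as OK.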
Imbalance or Serial Spinning Time
Imbalance or Serial Spinning time is CPU time when working threads are spinning on a synchronization barrier consuming CPU resources. This can be caused by load imbalance, insufficient concurrency for all working threads or waits on a barrier in the case of serialized execution.
Lock Contention Spin Time
Lock Contention time is CPU time when working threads are spinning on a lock consuming CPU resources. A high metric value may signal inefficient parallelization with highly contended synchronization objects. To avoid intensive synchronization, consider using reduction, atomic operations, or thread-local variables where possible.
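The difference between a contended lock and a reduction can be sketched as follows. This is a structural illustration only: the function names are hypothetical, and Python's `threading.Lock` blocks rather than spins, so the example shows the shape of the two patterns and will not reproduce the spin time measured by the profiler.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def sum_with_shared_lock(chunks):
    """Contended pattern: every element update takes the same lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for value in chunk:
            with lock:  # one lock acquisition per element
                total += value

    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        list(pool.map(work, chunks))
    return total


def sum_with_reduction(chunks):
    """Reduction pattern: each worker sums privately, results merge once."""

    def work(chunk):
        local = 0  # private to this call, so no lock is needed
        for value in chunk:
            local += value
        return local

    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(work, chunks))
    return sum(partials)  # single merge step on the calling thread
```

Both functions return the same result for a non-empty list of chunks. The second one touches shared state once per worker instead of once per element, which is the property that reduces contention in a threading runtime.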
Other Spin Time
This metric shows unclassified Spin time spent in a threading runtime library.
Creation Overhead Time
Creation time is CPU time that a runtime library spends on organizing parallel work.
Scheduling Overhead Time
Scheduling time is CPU time that a runtime library spends on work assignment for threads. If the time is significant, consider using coarse-grain work chunking.
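As a rough sketch of what coarser work chunking means, the hypothetical helper below splits an iteration space into chunks. With larger chunks, a runtime has fewer work items to assign, which is the effect the recommendation above aims for. The name `make_chunks` and its parameters are assumptions for illustration and do not correspond to any runtime API.

```python
def make_chunks(n_items: int, chunk_size: int) -> list:
    """Split the index range [0, n_items) into consecutive chunks.

    Hypothetical helper: each returned range is one unit of work
    that a scheduler would have to assign to a thread.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        range(start, min(start + chunk_size, n_items))
        for start in range(0, n_items, chunk_size)
    ]


# Fine-grained: 1000 work items to schedule.
fine = make_chunks(1000, 1)
# Coarse-grained: only 4 work items to schedule.
coarse = make_chunks(1000, 250)
```

The trade-off is that very large chunks can leave some threads without work near the end of a loop, which would show up as imbalance instead of scheduling overhead.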
Reduction Overhead Time
Reduction time is CPU time that a runtime library spends on loop or region reduction operations.
Atomics Overhead Time
Atomics time is CPU time that a runtime library spends on atomic operations.
Other Overhead Time
This metric shows unclassified Overhead time spent in a threading runtime library.
Effective CPU Utilization Histogram
To Do This
Hover over the bar to identify the amount of Elapsed time the application spent using the specified number of logical CPU cores.
Identify the target CPU utilization. This number is equal to the number of logical CPU cores. Consider this number as your optimization goal.
Average Effective CPU Utilization
Identify the average number of CPUs used aggregating the entire run. It is calculated as CPU time / Elapsed time.
CPU utilization at any point in time cannot surpass the available number of logical CPU cores. Even when the system is oversubscribed, and there are more threads running than CPUs, the CPU utilization is the same as the number of CPUs.
Use this number as a baseline for your performance measurements. The closer this number is to the number of logical CPU cores, the better, except for the case when the CPU time goes to spinning.
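The calculation described above (CPU time / Elapsed time) can be written out as a short sketch. The function name and parameters are hypothetical. The cap at the logical CPU count is an added guard that reflects the statement that utilization cannot surpass the number of logical CPU cores; it is not a documented step of the tool.

```python
def average_cpu_utilization(cpu_time_s: float,
                            elapsed_time_s: float,
                            logical_cpus: int) -> float:
    """Average number of CPUs used over the whole run.

    Hypothetical helper: CPU time divided by Elapsed time, capped at
    the number of logical CPU cores.
    """
    if elapsed_time_s <= 0:
        raise ValueError("elapsed_time_s must be positive")
    average = cpu_time_s / elapsed_time_s
    # Assumption: guard against inputs that exceed the physical limit.
    return min(average, float(logical_cpus))


# 24 s of CPU time over a 6 s run on an 8-CPU machine:
# on average, 4 of the 8 logical CPUs were in use.
avg = average_cpu_utilization(24.0, 6.0, 8)
```

Comparing the result with the logical CPU count gives the baseline mentioned above. A value close to the CPU count is only a good sign if the CPU time is not dominated by spinning.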
Utilization Indicator bar
Analyze how the various utilization levels map to the number of simultaneously utilized logical CPU cores.
In the CPU Utilization histogram, VTune treats the Spin and Overhead time as Idle CPU utilization. Different analysis types may recognize Spin and Overhead time differently depending on the availability of call stack information. This may result in a different graphical representation of CPU utilization per analysis type.
Frame Rate Histogram
To Do This
Choose a frame domain to analyze with the frame rate histogram. If only one domain is available, the drop-down menu is grayed out. Then, you can switch to the Bottom-up window grouped by Frame Domain, filter the data by slow frames, and switch to the Function grouping to identify functions in the slow frame domains. Try to optimize your code to keep the frame rate constant (for example, from 30 to 60 frames per second).
Hover over a bar to see the total number of frames in your application executed with a specific frame rate. A high number of slow or fast frames signals a performance bottleneck.
Frame rate bar
Use the sliders to adjust the frame rate threshold (in frames per second) for the currently open result and all subsequent results in the project.
Collection and Platform Info
Application Command Line
Path to the target application.
Operating system used for the collection.
Name of the computer used for the collection.
Size of the result collected by the
Collection start time
Collection stop time
Stop time (in UTC format) of the external collection. Explore the Timeline pane to track the performance statistics provided by the custom collector over time.
Name of the processor used for the collection.
Frequency of the processor used for the collection.
Logical CPU Count
Logical CPU count for the machine used for the collection.
Physical Core Count
Number of physical cores on the system.
Name of the Graphics installed on the system.
Version of the graphics driver installed on the system.
Number of execution units (EUs) in the Render and GPGPU engine. This data is Intel® HD Graphics and Intel® Iris® Graphics (further: Intel Graphics) specific.
Max EU Thread Count
Maximum number of threads per execution unit. This data is Intel Graphics specific.
Max Core Frequency
Maximum frequency of the Graphics processor. This data is Intel Graphics specific.
Graphics Performance Analysis
GPU metrics collection is enabled on the hardware level. This data is Intel Graphics specific.
Some systems disable collection of extended metrics such as L3 misses, memory accesses, sampler busyness, SLM accesses, and others in the BIOS. On some systems you can set a BIOS option to enable this collection. The presence or absence of the option and its name are BIOS vendor specific. Look for the Intel® Graphics Performance Analyzers option (or similar) in your BIOS and set it to