Identify the program units that took the most CPU time. These program units are known as hotspots. The Hotspots viewpoint is available for all analysis results.
Follow these steps to interpret performance data available in the Hotspots viewpoint:
Define a Performance Baseline
Start your analysis in the Summary window. Here you see general information about the execution of your application. Note that the Elapsed time is different from the application CPU time. The Elapsed time is the application time from start to termination. The application CPU time is the sum of the active processor time for all the threads that run the application. It does not include waiting times.
Use the Elapsed time value as a baseline to compare versions before and after optimization. When tuning the application, as you add more threads, the Elapsed time tends to decrease whereas the CPU time may increase.
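The difference between Elapsed time and CPU time can be reproduced outside the profiler. The following is a minimal sketch, not part of VTune Profiler: the helper names `measure`, `wait_only`, and `busy` are hypothetical, and the sketch assumes that Python's `time.perf_counter()` approximates Elapsed time and `time.process_time()` approximates the CPU time summed over all threads of the process.

```python
import time


def measure(work, *args):
    """Return (elapsed_seconds, cpu_seconds) for one call to work(*args)."""
    wall_start = time.perf_counter()   # wall clock: start-to-finish time
    cpu_start = time.process_time()    # CPU time of the whole process
    work(*args)
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def wait_only(seconds):
    # Waiting adds to the elapsed time but consumes almost no CPU time.
    time.sleep(seconds)


def busy(iterations):
    # Pure computation: elapsed time and CPU time grow together.
    total = 0
    for i in range(iterations):
        total += i * i
    return total


if __name__ == "__main__":
    for name, work, arg in (("wait_only", wait_only, 0.2),
                            ("busy", busy, 2_000_000)):
        elapsed, cpu = measure(work, arg)
        print(f"{name}: elapsed={elapsed:.3f}s cpu={cpu:.3f}s")
```

For the waiting workload, the reported CPU time should stay close to zero while the elapsed time reflects the full wait. This is why the Elapsed time, not the CPU time, is the more suitable baseline.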
If you ran the Hotspots analysis in the hardware event-based sampling mode, the analysis metrics in the Summary window include the Microarchitecture Usage metric. Use this metric to estimate the code efficiency on your hardware platform.
If this metric value is flagged as critical, consider running the Microarchitecture Exploration analysis to dive deeper into hardware metrics.
Identify the Hottest Function
Get a list of the most time-consuming functions in the Top Hotspots section of the Summary window. Click on a hotspot function to explore its call flow and other related metrics in the Bottom-up view.
By default, the Bottom-up view presents a sorted display of CPU Time in descending order, starting with the most time-consuming functions. Start optimizing the functions with the largest CPU time.
Expand the CPU Time column to get more details on how effectively the CPU time was used.
Next, focus your tuning efforts on the program units with the largest Poor value. A large Poor value means that your application underutilized the CPU while executing these program units. The overall goal of optimization is to reach the Ideal (green) or OK (orange) CPU utilization state and to reduce the Poor and Over CPU utilization values.
Switch to the Flame Graph window to quickly identify the hottest code paths in your application. Analyze the CPU time spent on each program unit and its related callee functions.
The flame graph plots the stack profile population, sorted alphabetically, on the horizontal axis. The vertical axis shows stack depth, starting from zero at the bottom. The width of each element in the flame graph indicates the share of the total CPU time that was spent in the function and its callees.
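The way element widths follow from stack samples can be illustrated with a short sketch. It is not the VTune Profiler implementation: the function name `flame_graph_frames` and the sample format (a root-to-leaf tuple of function names paired with a CPU time) are assumptions made for illustration.

```python
from collections import defaultdict


def flame_graph_frames(samples):
    """Compute flame graph elements from stack samples.

    samples: iterable of (stack, cpu_time), where stack is a tuple of
    function names ordered from the root caller to the leaf function.
    Returns a list of (depth, path, width), where width is the fraction of
    the total CPU time spent in the function at the end of path and in its
    callees. Elements are ordered alphabetically by path.
    """
    totals = defaultdict(float)
    grand_total = 0.0
    for stack, cpu_time in samples:
        grand_total += cpu_time
        # Every prefix of the stack is an element that covers this sample.
        for depth in range(1, len(stack) + 1):
            totals[stack[:depth]] += cpu_time
    if grand_total == 0:
        return []
    # Depth starts from zero at the bottom (the root caller).
    return [(len(path) - 1, path, time_spent / grand_total)
            for path, time_spent in sorted(totals.items())]


if __name__ == "__main__":
    # Hypothetical samples, CPU time in seconds.
    samples = [
        (("main", "parse"), 2.0),
        (("main", "compute", "kernel"), 6.0),
        (("main", "compute"), 2.0),
    ]
    for depth, path, width in flame_graph_frames(samples):
        print(f"{'  ' * depth}{path[-1]}: {width:.0%}")
```

In this sketch, the width of `compute` covers its own time plus the time of `kernel`, which is why a wide element high in the graph points to a hot code path rather than to a single hot function.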
Identify Algorithm Issues
If you identify issues with the calling sequences in your application, you can improve performance by revising the order in which functions are called. Use these methods:
Top-down Tree pane: Analyze the Total and Self time data for callers and callees of the hotspot function to understand whether this time can be optimized.
Call Stack pane: Identify the highest contributing stack for the program unit(s) selected in the Bottom-up or Top-down Tree panes. Use the navigation buttons to see the different stacks that called the selected program unit(s). The contribution bar shows the contribution of the currently visible stack to the overall time spent by the selected program unit(s). You can also use the drop-down list in the Call Stack pane to view data for different types of stacks.
Stack data is available by default for the user-mode sampling mode. To have this data for the hardware event-based sampling mode, you need to enable the Collect stacks option in the Hotspots analysis configuration.
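The relationship between Self time, Total time, and per-stack contribution can be sketched as follows. This is an illustration under assumptions, not the way VTune Profiler computes its metrics: the function names `self_and_total` and `stack_contributions` are hypothetical, and each sample is assumed to be a root-to-leaf tuple of function names paired with a CPU time.

```python
from collections import defaultdict


def self_and_total(samples):
    """Return {function: (self_time, total_time)}.

    Self time counts samples where the function is the leaf of the stack.
    Total time counts samples where the function appears anywhere in the
    stack, so it includes the time of its callees.
    """
    self_time = defaultdict(float)
    total_time = defaultdict(float)
    for stack, cpu_time in samples:
        self_time[stack[-1]] += cpu_time
        for function in set(stack):  # set() avoids double counting recursion
            total_time[function] += cpu_time
    return {fn: (self_time[fn], total_time[fn]) for fn in total_time}


def stack_contributions(samples, function):
    """Rank the stacks that end in function by their share of its self time."""
    per_stack = defaultdict(float)
    for stack, cpu_time in samples:
        if stack[-1] == function:
            per_stack[stack] += cpu_time
    total = sum(per_stack.values())
    if total == 0:
        return []
    ranked = sorted(per_stack.items(), key=lambda item: item[1], reverse=True)
    return [(stack, time_spent / total) for stack, time_spent in ranked]


if __name__ == "__main__":
    # Hypothetical samples, CPU time in seconds.
    samples = [
        (("main", "load", "memcpy"), 3.0),
        (("main", "compute", "memcpy"), 1.0),
        (("main", "compute"), 4.0),
    ]
    print(self_and_total(samples)["compute"])   # self and total time
    for stack, share in stack_contributions(samples, "memcpy"):
        print(" > ".join(stack), f"{share:.0%}")
```

A function with a small Self time but a large Total time spends most of its time in callees, so the optimization opportunity is likely further down the call tree. The ranked stacks indicate which caller is worth examining first.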
Double-click the hottest function to view its related source code in the Source/Assembly window. Open the code editor directly from Intel® VTune™ Profiler and improve your code (for example, by minimizing the number of calls to the hotspot function).
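One common way to reduce the number of calls to a hotspot function is to move a call whose result does not change out of the loop. The sketch below only illustrates that idea: `expensive_scale`, `normalize_naive`, and `normalize_hoisted` are hypothetical names, and the call counter stands in for what the profiler would report.

```python
calls = {"count": 0}  # counts invocations of the hypothetical hotspot


def expensive_scale(key):
    """Hypothetical hotspot: stands in for a costly, side-effect-free function."""
    calls["count"] += 1
    total = 0
    for i in range(1000):  # simulated cost
        total += i
    return key + 1


def normalize_naive(values, key):
    # The hotspot is called once per element although its result is constant.
    return [value * expensive_scale(key) for value in values]


def normalize_hoisted(values, key):
    # The loop-invariant call is made once and its result is reused.
    scale = expensive_scale(key)
    return [value * scale for value in values]


if __name__ == "__main__":
    data = list(range(100))

    calls["count"] = 0
    naive = normalize_naive(data, 3)
    print("naive calls:", calls["count"])

    calls["count"] = 0
    hoisted = normalize_hoisted(data, 3)
    print("hoisted calls:", calls["count"])

    print("same result:", naive == hoisted)
```

This transformation is safe only when the hoisted function returns the same value for the same arguments and has no side effects that the loop depends on. After a change like this, rerun the analysis and compare the Elapsed time against your baseline.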
If you ran the analysis with the default Show additional performance insights option, the Summary window includes the Insights section, which provides additional metrics for your target, such as the efficiency of hardware usage and vectorization. This information helps you identify potential next steps for your performance analysis and understand where to focus your optimization efforts.