Visible to Intel only — GUID: GUID-FCBDC49D-9688-4543-B836-BB6340E5C091
Summary Report
Like the Summary window in the GUI, the summary report provides overall performance data for your target. Intel® VTune™ Profiler automatically generates the summary report when data collection completes. To disable this report, add the -no-summary option to your command when performing a collect or collect-with action.
Use the following syntax to generate the Summary report from a preexisting result:
vtune -report summary -result-dir <result_path>
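When the report is generated from a script, the syntax above can be wrapped in a small helper. The following is a minimal sketch, assuming vtune is on the PATH and that the report text is written to standard output; the function names summary_command and run_summary are hypothetical and not part of VTune Profiler.

```python
import shlex
import subprocess


def summary_command(result_dir, vtune="vtune"):
    """Build the argument list for the documented summary-report syntax."""
    return [vtune, "-report", "summary", "-result-dir", str(result_dir)]


def run_summary(result_dir):
    """Run the command and return its output (assumes vtune is installed)."""
    completed = subprocess.run(
        summary_command(result_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if vtune exits with an error
    )
    return completed.stdout


if __name__ == "__main__":
    # Only print the command line here, so the sketch runs without vtune.
    print(shlex.join(summary_command("r000hs")))
```

Passing the arguments as a list avoids shell quoting problems when the result path contains spaces.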
The summary report output depends on the collection type:
User-mode Sampling and Tracing Collection Summary Report
For User-Mode Sampling and Tracing Collection results, the summary report includes the following sections:
- Collection and Platform Information
- CPU Information
- Summary per basic analysis metrics
Example 1: User-Mode Sampling Hotspots Summary
This example generates the summary report for the r000hs Hotspots analysis result on Windows*:
vtune -report summary -r r000hs
Elapsed Time: 1.857s
CPU Time: 10.069s
Effective Time: 10.069s
Idle: 0.000s
Poor: 1.294s
Ok: 6.381s
Ideal: 2.395s
Over: 0s
Spin Time: 0s
Overhead Time: 0s
Total Thread Count: 9
Paused Time: 0s
Top Hotspots
Function   Module      CPU Time
---------  ----------  --------
multiply1  matrix.exe  10.069s
Collection and Platform Info
Application Command Line: C:\temp\samples\en\C++\matrix_vtune\matrix\vc14\Win32\Release\matrix.exe
Operating System: Microsoft Windows 10
Computer Name: my-computer
Result Size: 5 MB
Collection start time: 09:41:57 06/09/2018 UTC
Collection stop time: 09:41:58 06/09/2018 UTC
Collector Type: Event-based counting driver,User-mode sampling and tracing
CPU
Name: Intel(R) Processor code named Skylake
Frequency: 4.008 GHz
Logical CPU Count: 8
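The numbers in Example 1 can be cross-checked with simple arithmetic. The sketch below copies values from the output above; it assumes that the Idle, Poor, Ok, Ideal, and Over rows break down Effective Time, and it treats CPU Time divided by Elapsed Time as a rough estimate of how many logical CPUs were busy on average. That estimate is an illustration, not a metric reported by VTune Profiler.

```python
# Values copied from the Example 1 output.
elapsed = 1.857        # Elapsed Time, seconds
cpu_time = 10.069      # CPU Time, seconds (summed over all threads)
logical_cpus = 8       # Logical CPU Count

# Assumption: these rows break down Effective Time by utilization level.
effective_buckets = {
    "Idle": 0.000,
    "Poor": 1.294,
    "Ok": 6.381,
    "Ideal": 2.395,
    "Over": 0.0,
}

bucket_total = sum(effective_buckets.values())

# Rough estimate of the average number of busy logical CPUs.
avg_busy_cpus = cpu_time / elapsed
busy_share = avg_busy_cpus / logical_cpus

print(f"bucket total:   {bucket_total:.3f}s")
print(f"avg busy CPUs:  {avg_busy_cpus:.2f} of {logical_cpus}")
print(f"busy share:     {busy_share:.1%}")
```

The bucket total differs from the reported Effective Time by about a millisecond, which is consistent with rounding in the printed values.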
Example 2: Threading Summary
This example generates a summary report for the Threading analysis result r003tr. The summary portion of the report shows that the multithreaded target spent approximately 64 seconds waiting (Wait Time: 64.468), with an average concurrency of only 1.073:
vtune -report summary -r r003tr
Summary
-------
Average Concurrency: 1.073
Elapsed Time: 13.911
CPU Time: 11.031
Wait Time: 64.468
Average CPU Usage: 0.768
To identify the cause of the wait, view the result in the GUI performance pane, or generate a performance report.
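To work with these metrics in a script, the "Name: value" lines can be parsed into a dictionary. This is a minimal sketch that assumes the plain-text layout shown in Example 2; parse_summary is a hypothetical helper, and lines whose value is not a plain number are skipped.

```python
def parse_summary(text):
    """Collect 'Name: value' lines whose value is a number (optional 's' suffix)."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # headings and separator lines have no colon
        try:
            metrics[key.strip()] = float(value.strip().rstrip("s"))
        except ValueError:
            continue  # non-numeric values such as paths or names
    return metrics


SAMPLE = """Summary
-------
Average Concurrency: 1.073
Elapsed Time: 13.911
CPU Time: 11.031
Wait Time: 64.468
Average CPU Usage: 0.768
"""

metrics = parse_summary(SAMPLE)
wait_per_elapsed = metrics["Wait Time"] / metrics["Elapsed Time"]
print(f"Wait Time is {wait_per_elapsed:.2f}x Elapsed Time")
```

A Wait Time several times larger than the Elapsed Time is possible because wait time is accumulated across threads.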
Hardware Event-based Sampling Collection Summary Report
For Hardware Event-based Sampling Collection results, the summary report includes the following information (if available):
- Collection and Platform information
- Microarchitecture Exploration metrics
- CPU information
- GPU information
- Summary per basic analysis metrics
- Event summary
- Uncore Event summary
For some analysis types, the command-line summary report provides an issue description for metrics that exceed the predefined threshold. To omit issue descriptions from the summary report, do one of the following:
- Use the -report-knob show-issues=false option when generating the report, for example: vtune -report summary -r r001hpc -report-knob show-issues=false
- Use the -format=csv option to view the report in CSV format, for example: vtune -report summary -r r001hpc -format=csv
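If a text report was already saved with issue descriptions, they can also be filtered out afterwards. The sketch below is a post-processing alternative, not a VTune Profiler feature; it assumes that every issue description line begins with a "|" character, as in the Memory Access output in Example 5, and strip_issue_lines is a hypothetical helper name.

```python
def strip_issue_lines(report_text):
    """Drop lines that start with '|' (assumed to be issue descriptions)."""
    kept = [
        line
        for line in report_text.splitlines()
        if not line.lstrip().startswith("|")
    ]
    return "\n".join(kept)


sample = "\n".join(
    [
        "Memory Bound: 21.9% of Pipeline Slots",
        " | The metric value is high. This may indicate that a significant fraction",
        " |",
        "L1 Bound: 8.0% of Clockticks",
    ]
)

print(strip_issue_lines(sample))
```

Regenerating the report with show-issues=false remains the more reliable route, because it does not depend on the layout of the text output.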
Example 3: Hardware Event-Based Sampling Hotspots Summary
This example generates the summary report for the r001hs Hotspots analysis (hardware event-based sampling mode) result on Windows* OS:
vtune -report summary -r r001hs
Elapsed Time: 3.986s
CPU Time: 1.391s
CPI Rate: 0.860
Wait Time: 65.023s
Inactive Time: 14.819s
Total Thread Count: 25
Paused Time: 0s
Hardware Events
Hardware Event Type                  Hardware Event Count  Hardware Event Sample Count  Events Per Sample
-----------------------------------  --------------------  ---------------------------  -----------------
CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE   24,832,593            8                            1000030
CPU_CLK_UNHALTED.REF_TSC             3,471,208,416         120                          24000000
CPU_CLK_UNHALTED.REF_XCLK            43,877,874            14                           1000030
CPU_CLK_UNHALTED.THREAD              3,903,569,890         127                          24000000
FP_ARITH_INST_RETIRED.SCALAR_DOUBLE  943,046,424           14                           20000030
INST_RETIRED.ANY                     4,536,715,682         140                          24000000
UOPS_EXECUTED.THREAD                 5,282,967,942         72                           20000030
UOPS_RETIRED.RETIRE_SLOTS            5,587,595,565         76                           20000030
Collection and Platform Info
Application Command Line: C:\samples\tachyon\vc10\analyze_locks_Win32_Release\analyze_locks.exe C:\samples\tachyon\dat\balls.dat
Operating System: Microsoft Windows 10
Computer Name: My Computer
Result Size: 13 MB
Collection start time: 12:12:52 24/07/2018 UTC
Collection stop time: 12:13:03 24/07/2018 UTC
Collector Type: Event-based sampling driver
CPU
Name: Intel(R) Processor code named Skylake ULT
Frequency: 2.496 GHz
Logical CPU Count: 4
Use the Elapsed Time metric as your performance baseline when evaluating the effect of your optimizations.
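The CPI Rate reported in Example 3 can be reproduced from the Hardware Events table. Cycles per instruction is the number of unhalted clockticks divided by the number of retired instructions; the sketch below takes the CPU_CLK_UNHALTED.THREAD and INST_RETIRED.ANY counts from the table and is meant only as a worked check of that arithmetic.

```python
def parse_count(text):
    """Convert a count printed with thousands separators into an int."""
    return int(text.replace(",", ""))


# Counts copied from the Hardware Events table in Example 3.
clockticks = parse_count("3,903,569,890")    # CPU_CLK_UNHALTED.THREAD
instructions = parse_count("4,536,715,682")  # INST_RETIRED.ANY

cpi = clockticks / instructions
print(f"CPI Rate: {cpi:.3f}")  # prints "CPI Rate: 0.860"
```

The result matches the CPI Rate of 0.860 shown at the top of the report.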
Example 4: HPC Performance Characterization Summary
This command generates the summary report for the HPC Performance Characterization analysis result and skips issue descriptions:
vtune -report summary -r r001hpc -report-knob show-issues=false
Elapsed Time: 23.182s
GFLOPS: 14.748
Effective Physical Core Utilization: 58.0%
Effective Logical Core Utilization: 13.920 Out of 24 logical CPUs
Serial Time: 0.069s (0.3%)
Parallel Region Time: 23.113s (99.7%)
Estimated Ideal Time: 14.010s (60.4%)
OpenMP Potential Gain: 9.103s (39.3%)
Memory Bound: 0.446
Cache Bound: 0.175
DRAM Bound: 0.216
NUMA: % of Remote Accesses: 38.3%
FPU Utilization: 2.7%
GFLOPS: 14.748
Scalar GFLOPS: 4.801
Packed GFLOPS: 9.947
Collection and Platform Info
Application Command Line: ./sp.B.x
User Name: vtune
Operating System: 3.10.0-327.el7.x86_64 NAME="Red Hat Enterprise Linux Server" VERSION="7.2 (Maipo)" ID="rhel" ID_LIKE="fedora" VERSION_ID="7.2" PRETTY_NAME="Red Hat Enterprise Linux Server 7.2 (Maipo)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:redhat:enterprise_linux:7.2:GA:server" HOME_URL="https://www.redhat.com/" BUG_REPORT_URL="https://bugzilla.redhat.com/" REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7" REDHAT_BUGZILLA_PRODUCT_VERSION=7.2 REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux" REDHAT_SUPPORT_PRODUCT_VERSION="7.2"
Computer Name: nntvtune235
Result Size: 1 GB
Collection start time: 19:04:30 13/06/2017 UTC
Collection stop time: 19:04:53 13/06/2017 UTC
CPU
Name: Intel(R) Xeon(R) E5/E7 v2 Processor code named Ivytown
Frequency: 2.694 GHz
Logical CPU Count: 24
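Several metrics in Example 4 are related to each other, and the relationships can be checked directly. The sketch below copies values from the output above; the groupings it tests (serial plus parallel time equals elapsed time, ideal time plus potential gain equals parallel region time, scalar plus packed GFLOPS equals total GFLOPS) are inferred from the printed numbers rather than taken from a formal definition.

```python
import math

# Values copied from the HPC Performance Characterization output.
elapsed = 23.182
serial = 0.069
parallel = 23.113
ideal = 14.010
openmp_gain = 9.103
gflops, scalar_gflops, packed_gflops = 14.748, 4.801, 9.947
logical_used, logical_total = 13.920, 24

TOL = 0.0015  # the report prints three decimals, so allow for rounding


def close(a, b):
    return math.isclose(a, b, abs_tol=TOL)


checks = {
    "serial + parallel == elapsed": close(serial + parallel, elapsed),
    "ideal + gain == parallel": close(ideal + openmp_gain, parallel),
    "scalar + packed == gflops": close(scalar_gflops + packed_gflops, gflops),
}

# Percentages shown in parentheses are relative to Elapsed Time.
gain_share = 100 * openmp_gain / elapsed
logical_utilization = 100 * logical_used / logical_total

for name, ok in checks.items():
    print(f"{name}: {ok}")
print(f"OpenMP Potential Gain: {gain_share:.1f}% of Elapsed Time")
print(f"Logical core utilization: {logical_utilization:.1f}%")
```

In this result, the OpenMP Potential Gain of about 39% of the Elapsed Time is the largest single indicator of where tuning effort could pay off.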
Example 5: Memory Access Summary
This command generates the summary report for the Memory Access analysis result collected on Windows and shows issue descriptions:
vtune -report summary -r r001macc
Elapsed Time: 7.917s
CPU Time: 6.473s
Memory Bound: 21.9% of Pipeline Slots
| The metric value is high. This may indicate that a significant fraction
| of execution pipeline slots could be stalled due to demand memory load
| and stores. Explore the metric breakdown by memory hierarchy, memory
| bandwidth information, and correlation by memory objects.
|
L1 Bound: 8.0% of Clockticks
| This metric shows how often machine was stalled without missing the
| L1 data cache. The L1 cache typically has the shortest latency.
| However, in certain cases like loads blocked on older stores, a load
| might suffer a high latency even though it is being satisfied by the
| L1.
|
L2 Bound: 3.0% of Clockticks
L3 Bound: 5.0% of Clockticks
| This metric shows how often CPU was stalled on L3 cache, or contended
| with a sibling Core. Avoiding cache misses (L2 misses/L3 hits)
| improves the latency and increases performance.
|
DRAM Bound: 4.1% of Clockticks
DRAM Bandwidth Bound: 0.4% of Elapsed Time
Memory Latency: 0.000
Loads: 10,137,704,122
Stores: 3,208,896,264
LLC Miss Count: 1,750,105
Average Latency (cycles): 11
Total Thread Count: 21
Paused Time: 0s
System Bandwidth
Max DRAM System Bandwidth: 15 GB
Bandwidth Utilization
Bandwidth Domain  Platform Maximum  Observed Maximum  Average Bandwidth  % of Elapsed Time with High BW Utilization(%)
----------------  ----------------  ----------------  -----------------  ---------------------------------------------
DRAM, GB/sec      15                11.300            2.836              0.4%
Collection and Platform Info
Application Command Line: C:\samples\tachyon\vc10\analyze_locks_Win32_Release\analyze_locks.exe "C:\samples\tachyon\dat\balls.dat"
Operating System: Microsoft Windows 10
Computer Name: My Computer
Result Size: 31 MB
Collection start time: 09:33:44 07/06/2017 UTC
Collection stop time: 09:33:52 07/06/2017 UTC
CPU
Name: Intel(R) Processor code named Skylake ULT
Frequency: 2.496 GHz
Logical CPU Count: 4
The Bandwidth Utilization section in the summary report shows the following metrics:
- Platform Maximum: Expected maximum bandwidth for the system. This value is either estimated automatically with a micro-benchmark at the start of the analysis or hard-coded based on theoretical bandwidth limits.
- Observed Maximum: Maximum bandwidth observed during the analysis. If the value is close to the Platform Maximum, your workload is probably bandwidth-limited.
- Average Bandwidth: Average bandwidth utilization during the analysis.
- % of Elapsed Time with High BW Utilization: Percentage of Elapsed Time spent heavily utilizing system bandwidth.
This information is provided for every bandwidth domain present in the result (DRAM, MCDRAM, QPI, and so on).
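The comparison between Observed Maximum and Platform Maximum can be expressed as a simple ratio. The sketch below uses the DRAM row from Example 5; the 0.9 cutoff for "close to the Platform Maximum" is an arbitrary assumption chosen for illustration, and bandwidth_headroom is a hypothetical helper name.

```python
def bandwidth_headroom(platform_max, observed_max, near_fraction=0.9):
    """Return (observed/platform ratio, whether it reaches the assumed cutoff)."""
    if platform_max <= 0:
        raise ValueError("platform_max must be positive")
    ratio = observed_max / platform_max
    return ratio, ratio >= near_fraction


# DRAM row from Example 5: Platform Maximum 15 GB/sec, Observed Maximum 11.300 GB/sec.
ratio, likely_limited = bandwidth_headroom(15.0, 11.300)
print(f"observed/platform: {ratio:.1%}, likely bandwidth-limited: {likely_limited}")
```

For this result the observed peak is about 75% of the platform maximum and high utilization accounts for only 0.4% of the Elapsed Time, so the workload does not look bandwidth-limited under this assumed cutoff.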