Hotspots Report

Use the hotspots command-line report to identify program units (for example, functions, modules, or objects) that take the most processor time (Hotspots analysis), underutilize available CPUs or have long waits (Threading analysis), and so on.
Use the hotspots report to view the hottest GPU computing tasks (or their instances) identified with a GPU analysis.
By default, the report lists the hottest program units in descending order, starting from the most performance-critical unit. The command-line reports provide the same data that is displayed in the default GUI analysis viewpoint.
To display a list of available groupings for a Hotspots report, enter:
vtune -report hotspots -r <result_dir> -group-by=?
If you do not specify a result directory, the latest result is used by default.
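When reports are generated from a script, it can help to assemble the command line in one place. The following is a minimal Python sketch, not part of the product: the helper name hotspots_report_cmd is hypothetical, and it uses only the options that appear in this topic (-report hotspots, -r, -group-by, -limit). It assumes vtune is on the PATH if the resulting list is later passed to subprocess.run.

```python
import shlex


def hotspots_report_cmd(result_dir=None, group_by=None, limit=None):
    """Build an argument list for a Hotspots report (hypothetical helper).

    Omitting result_dir leaves out -r, so the latest result is used.
    """
    cmd = ["vtune", "-report", "hotspots"]
    if result_dir is not None:
        cmd += ["-r", result_dir]
    if group_by is not None:
        cmd += ["-group-by", group_by]
    if limit is not None:
        cmd += ["-limit", str(limit)]
    return cmd


# Show the commands as they would be typed in a shell.
print(shlex.join(hotspots_report_cmd("r001hs", group_by="module")))
# prints: vtune -report hotspots -r r001hs -group-by module
print(shlex.join(hotspots_report_cmd(group_by="?")))
# prints: vtune -report hotspots -group-by '?'
```

The second command quotes the question mark because many shells treat ? as a wildcard character. Passing the list form to subprocess.run avoids the shell entirely, so no quoting is needed in that case.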
Example 1: Hotspots Report with Module Grouping
This example opens the Hotspots report for the r001hs Hotspots analysis result and groups the data by module.
vtune -report hotspots -r r001hs -group-by module
Module             CPU Time
-----------------  --------
analyze_locks      10.080s
KERNELBASE         0.679s
ntdl               0.164s
...
Example 2: Hotspots Report with Limited Items
This example displays the Hotspots report for the r001hs analysis result, limited to the two functions with the highest CPU Time values. Functions with less impact on performance are excluded from the output.
vtune -report hotspots -r r001hs -limit 2
Function          CPU Time
----------------  --------
grid_intersect    5.489s
sphere_intersect  3.590s
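Two-column reports like those in Examples 1 and 2 are easy to post-process. The sketch below is one possible approach in Python, assuming the text layout shown in this topic: each data row ends with a CPU Time value such as 5.489s. The function name parse_cpu_times is hypothetical; header, ruler, and "..." lines are skipped because they do not end with a number followed by s.

```python
def parse_cpu_times(report: str) -> dict[str, float]:
    """Map each program unit in a two-column text report to its CPU time in seconds."""
    times = {}
    for line in report.splitlines():
        # Split off the last whitespace-separated token (the time value).
        parts = line.rsplit(None, 1)
        if len(parts) != 2 or not parts[1].endswith("s"):
            continue  # header, ruler, blank, or "..." line
        try:
            times[parts[0].strip()] = float(parts[1][:-1])
        except ValueError:
            continue  # last token ended with "s" but was not a number
    return times


SAMPLE = """\
Function          CPU Time
----------------  --------
grid_intersect    5.489s
sphere_intersect  3.590s
"""

print(parse_cpu_times(SAMPLE))
# prints: {'grid_intersect': 5.489, 'sphere_intersect': 3.59}
```

This works only for reports whose last column is a time in seconds. Reports with more columns or empty cells, such as the GPU reports, need a column-aware parser.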
Example 3: Report per OpenCL Kernels
This example shows how to view the collected data for each OpenCL kernel submitted and executed on the GPU:
vtune -report hotspots -group-by=computing-task -r r000gh
Computing Task       Work Size:Global  Computing Task:Total Time  Data Transferred:Size  EU Array:Active(%)  L3 <-> GTI Total Bandwidth, GB/sec
-------------------  ----------------  -------------------------  ---------------------  ------------------  ----------------------------------
AdvancePaths         65536             13.170s                                           25.0%               22.928
Init                 65536             0.006s                                            34.4%               45.802
Intersect            65536             49.139s                                           61.5%               23.149
Sampler              65536             6.525s                                            76.4%               11.745
InitFrameBuffer      362432            0.000s                                            4.7%                17.456
clEnqueueReadBuffer                    1.045s                     3 GB                   1.5%                8.840
Example 4: Report Grouped per SYCL Task Instances
This example filters and groups the collected data by SYCL task instances:
vtune -report hotspots -group-by=computing-instance -r r000gh
Computing Task       Instance            Work Size:Global  Computing Task:Total Time  Data Transferred:Size  GPU Time
-------------------  ------------------  ----------------  -------------------------  ---------------------  --------
CopyVector2          2                   6553600           0.190s                                            0.190s
clEnqueueReadBuffer  1                                     0.034s                     400 MB                 0.034s
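The GPU reports in Examples 3 and 4 contain empty cells (for example, a kernel row has no Data Transferred:Size value), so splitting rows on whitespace would shift values into the wrong columns. The Python sketch below shows one way to handle this, under the assumption that the text report is column-aligned and that the dashed ruler line under the header marks where each column starts. The function name parse_fixed_width is hypothetical, and the real output may be aligned differently.

```python
import re


def parse_fixed_width(report: str) -> list[dict[str, str]]:
    """Parse a column-aligned text report into one dict per data row.

    Column boundaries are taken from the dashed ruler line, so empty
    cells are preserved as empty strings.
    """
    lines = [ln for ln in report.splitlines() if ln.strip()]
    # The ruler is the first line made up only of dashes and spaces.
    ruler_idx = next(
        i for i, ln in enumerate(lines) if set(ln.strip()) <= {"-", " "}
    )
    # Each run of dashes starts a column; a column ends where the next begins.
    starts = [m.start() for m in re.finditer(r"-+", lines[ruler_idx])]
    bounds = list(zip(starts, starts[1:] + [None]))

    def cut(line: str) -> list[str]:
        return [line[a:b].strip() for a, b in bounds]

    header = cut(lines[ruler_idx - 1])
    return [dict(zip(header, cut(ln))) for ln in lines[ruler_idx + 1:]]
```

Each returned row is keyed by the header text, so a value can be read as row["Data Transferred:Size"]; an empty string means the report left that cell blank.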