Examine GPU Roofline Summary
Explore Program Metrics for Code Regions Executed on GPU
- How much time your application spends on the CPU and on the GPU in relation to the total time of the application
- How much time your application spends on transferring data between the CPU and the GPU
- How well your application utilizes the floating-point units (FPUs) for parallel execution of operations
- How many threads in each execution unit your application occupies to execute compute operations
- How your application utilizes FPU pipelines and how many instructions it executes per cycle
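As a rough illustration of how the time-related metrics above relate to each other, the following minimal sketch expresses CPU time, GPU time, and data transfer time as percentages of the total application time. The function name `time_shares` and the sample values are hypothetical and are not part of the tool; the sketch also assumes that the three times are reported separately and in the same unit.

```python
def time_shares(cpu_s, gpu_s, transfer_s, total_s):
    """Return each time component as a percentage of the total time.

    All arguments are times in seconds. The names and the split into
    three components are assumptions made for illustration only.
    """
    if total_s <= 0:
        raise ValueError("total time must be positive")
    parts = {"cpu": cpu_s, "gpu": gpu_s, "transfer": transfer_s}
    return {name: 100.0 * value / total_s for name, value in parts.items()}


# Hypothetical sample values, not measured data.
shares = time_shares(cpu_s=6.0, gpu_s=3.0, transfer_s=1.0, total_s=10.0)
for name, percent in shares.items():
    print(f"{name}: {percent:.1f}% of total time")
```

A large transfer share relative to the GPU share may suggest that data movement, rather than computation, deserves attention first.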
Identify Dominating Data Types and Hotspots
- Explore the operations data and identify the dominating data type in the OP/S and Bandwidth pane. This data can be useful, for example, if the compiler generates integer operations (INTOP) or floating-point operations (FLOP) that are not obvious.
- View the list of top hotspots on the GPU in the Top Hotspots pane and examine their performance in relation to compute performance and memory bandwidth using the Roofline chart in the OP/S and Bandwidth pane. These hotspots are the best candidates for optimization because they have the greatest impact on the total time of the application. To view detailed information about the performance of each kernel and visualize it against hardware limitations, double-click a hotspot in the pane or a dot on the Roofline chart.
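To make the relationship between a hotspot, compute performance, and memory bandwidth more concrete, the sketch below applies the general roofline model: attainable performance is the smaller of the peak compute rate and the product of arithmetic intensity and memory bandwidth. This is a generic illustration and not the tool's own calculation; the function names and all hardware numbers are hypothetical.

```python
def attainable_gops(intensity, peak_gops, bandwidth_gbs):
    """Roofline bound in GOP/s.

    intensity     -- arithmetic intensity in operations per byte
    peak_gops     -- peak compute rate in GOP/s (assumed value)
    bandwidth_gbs -- memory bandwidth in GB/s (assumed value)
    """
    return min(peak_gops, intensity * bandwidth_gbs)


def classify(intensity, peak_gops, bandwidth_gbs):
    """Label a kernel by which roof limits it in the basic model."""
    # The ridge point is the intensity at which the two roofs meet.
    ridge = peak_gops / bandwidth_gbs
    return "memory-bound" if intensity < ridge else "compute-bound"


# Hypothetical hardware limits and kernel intensities, not measured data.
PEAK_GOPS = 1000.0
BANDWIDTH_GBS = 200.0
for intensity in (0.5, 10.0):
    bound = attainable_gops(intensity, PEAK_GOPS, BANDWIDTH_GBS)
    label = classify(intensity, PEAK_GOPS, BANDWIDTH_GBS)
    print(f"intensity {intensity} OP/byte: up to {bound} GOP/s ({label})")
```

In this simplified model, a dot that sits far below its roof has headroom for optimization, and the label indicates whether memory traffic or computation is the more likely limit.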