Analyze GPU Roofline
Measure and visualize the actual performance of GPU kernels against hardware-imposed performance ceilings, using benchmarks and hardware metric profiling, and determine the main limiting factor by running the GPU Roofline Insights perspective.
Use the Roofline chart to answer the following questions:
- What is the maximum achievable performance with your current hardware resources?
- Does your application work optimally on current hardware resources?
- If not, what are the best candidates for optimization?
- Is memory bandwidth or compute capacity limiting performance for each optimization candidate?
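The questions above rest on the classic roofline bound: a kernel's attainable performance is the lower of the compute roof and its arithmetic intensity (operations per byte) multiplied by memory bandwidth. The following is a minimal Python sketch of that single-roof model under stated assumptions, not Intel Advisor's implementation. The names `Ceilings`, `attainable_gops`, `ridge_point`, `limiting_factor`, and `fraction_of_roof` and all numbers are hypothetical. In the real chart, the ceilings are measured on your hardware, and there can be several of them.

```python
from dataclasses import dataclass


@dataclass
class Ceilings:
    """Hypothetical hardware ceilings; in practice these are measured on your GPU."""
    peak_gops: float      # compute roof, giga-operations per second
    bandwidth_gbs: float  # memory roof, gigabytes per second


def attainable_gops(ai: float, c: Ceilings) -> float:
    """Roofline bound: min(compute roof, arithmetic intensity x bandwidth).

    ai is in operations per byte, so ai * (GB/s) yields GOPS.
    """
    return min(c.peak_gops, ai * c.bandwidth_gbs)


def ridge_point(c: Ceilings) -> float:
    """Arithmetic intensity at which the memory roof meets the compute roof."""
    return c.peak_gops / c.bandwidth_gbs


def limiting_factor(ai: float, c: Ceilings) -> str:
    """Left of the ridge point a kernel is bandwidth-bound, otherwise compute-bound."""
    return "memory bandwidth" if ai < ridge_point(c) else "compute capacity"


def fraction_of_roof(measured_gops: float, ai: float, c: Ceilings) -> float:
    """How close a kernel gets to its roof; low values suggest optimization candidates."""
    return measured_gops / attainable_gops(ai, c)


# Made-up ceilings and kernels, for illustration only.
gpu = Ceilings(peak_gops=4000.0, bandwidth_gbs=200.0)
for name, ai, measured in [("kernel_a", 0.5, 25.0), ("kernel_b", 50.0, 3200.0)]:
    print(name, limiting_factor(ai, gpu), round(fraction_of_roof(measured, ai, gpu), 2))
```

Here `kernel_a` sits left of the ridge point (20 ops/byte), so it can reach at most 100 GOPS, and at 25 GOPS it uses only a quarter of that bound. This is the kind of kernel the questions single out as an optimization candidate.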
Run the GPU Roofline Insights perspective to measure the performance of Data Parallel C++ (DPC++), C++/Fortran with OpenMP* pragmas, Intel® oneAPI Level Zero (Level Zero), or OpenCL™ applications enabled to run on a GPU.
How It Works
The GPU Roofline Insights perspective includes the following steps:
- Collect OpenCL™ kernel timings and memory data using the Survey analysis with GPU profiling.
- Measure the hardware limitations and collect floating-point and integer operations data using the Characterization analysis with GPU profiling. Intel® Advisor calculates compute operations (FLOP and INTOP) as a weighted sum of the following groups of instructions: BASIC COMPUTE, FMA, BIT, DIV, POW, MATH. Intel Advisor automatically determines the data type of the collected operations using the dst register.
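The two steps together give each kernel its two roofline coordinates. The weighted operation count divided by bytes transferred is the arithmetic intensity. The same count divided by time is the performance. The following is a minimal Python sketch. The per-group weights are an assumption: the text names the instruction groups but not the weights Intel Advisor applies. Counting an FMA as two operations is only a common convention. The function names and counts are hypothetical.

```python
# ASSUMPTION: illustrative weights only; Intel Advisor's actual weights are not
# stated here. An FMA is counted as two operations (multiply + add) by convention.
ASSUMED_WEIGHTS = {
    "BASIC COMPUTE": 1,
    "FMA": 2,
    "BIT": 1,
    "DIV": 1,
    "POW": 1,
    "MATH": 1,
}


def weighted_ops(counts: dict, weights: dict = ASSUMED_WEIGHTS) -> int:
    """Weighted sum of executed instructions per group (FLOP or INTOP)."""
    return sum(weights[group] * n for group, n in counts.items())


def roofline_point(counts: dict, bytes_moved: int, seconds: float) -> tuple:
    """Return (arithmetic intensity in ops/byte, performance in GOPS) for one kernel."""
    ops = weighted_ops(counts)
    return ops / bytes_moved, ops / seconds / 1e9


# Made-up floating-point instruction counts for one kernel. Integer counts
# would be passed separately to get INTOP-based coordinates.
fp_counts = {"BASIC COMPUTE": 4_000_000, "FMA": 3_000_000, "DIV": 1_000_000}
ai, gflops = roofline_point(fp_counts, bytes_moved=22_000_000, seconds=0.001)
print(f"AI = {ai:.2f} FLOP/byte, performance = {gflops:.1f} GFLOPS")
```

Keeping floating-point and integer counts separate mirrors the FLOP/INTOP split described above, where the data type of each operation is determined from its destination register.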
GPU Roofline Summary
The GPU Roofline Insights perspective measures the performance of kernels executed on the GPU and of loops/functions executed on the CPU, and shows what you should optimize your application for. Examine the following performance data:
- See the application execution time on the GPU and CPU, the time spent transferring data between the CPU and GPU, and how well your application uses GPU resources.
- Review the Roofline charts for CPU and GPU parts of your application.
- View execution time details and performance metrics for the GPU- and CPU-executed parts of your application.
- View the top five time-consuming loops on the GPU and on the CPU, sorted by self time, with their performance metrics. Start with these loops when checking for performance issues.
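The selection rule in the last item ranks regions by self time and keeps the top five per device. You can reproduce it on exported data, for example to compare candidates across runs. The following is a minimal Python sketch. The `Region` record, its fields, and the sample values are hypothetical and do not reflect an actual Intel Advisor report format.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical row for one GPU kernel or CPU loop/function."""
    name: str
    device: str         # "GPU" or "CPU"
    self_time_s: float  # time spent in the region itself, excluding callees
    gops: float         # measured performance, giga-operations per second


def top_five(regions: list, device: str) -> list:
    """Return the five regions with the largest self time on the given device."""
    on_device = [r for r in regions if r.device == device]
    return sorted(on_device, key=lambda r: r.self_time_s, reverse=True)[:5]


# Made-up sample data for illustration only.
regions = [
    Region("matmul", "GPU", 1.20, 310.0),
    Region("reduce", "GPU", 0.40, 45.0),
    Region("stencil", "GPU", 0.90, 120.0),
    Region("copy", "GPU", 0.05, 20.0),
    Region("scan", "GPU", 0.30, 60.0),
    Region("transpose", "GPU", 0.60, 35.0),
    Region("init_loop", "CPU", 0.70, 2.0),
]

for r in top_five(regions, "GPU"):
    print(f"{r.name:<10} {r.self_time_s:6.2f} s {r.gops:8.1f} GOPS")
```

Sorting by self time rather than total time keeps a parent loop from outranking the hot code it calls. This is why self time is a reasonable default for picking where to look first.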

See the Summary section to examine the performance summary of your application, and continue to the GPU Roofline Insights Regions tab to examine the performance in more detail.