GPU Application Analysis on Intel® HD Graphics and Intel® Iris® Graphics
- Run the GPU Compute/Media Hotspots analysis for a detailed analysis of GPU-bound applications, with explicit support for DPC++, Intel® Media SDK, and OpenCL™ software technologies.
Analyze GPU Usage for GPU-Bound Applications
- Overview (default) group analyzes the general activity of GPU execution units, the sampler, general memory, and cache accesses;
- Compute Basic (with global/local memory accesses) group analyzes accesses to different types of GPU memory;
- Compute Extended (for Intel® Core™ M processors and higher);
- Full Compute group combines metrics from the Overview and Compute Basic presets and presents them in the same view, which helps explore why the GPU execution units were waiting. To use this event set, make sure to enable the multiple runs mode in the target properties.
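To make "waiting" concrete, the sketch below splits execution-unit time into three states and reports each as a share of all counted cycles. It is a minimal illustration, assuming per-EU cycle counts for an active, a stalled (threads loaded but not executing), and an idle (no threads loaded) state are available; the names `EuCycles` and `eu_state_breakdown` are hypothetical, and this is not how the profiler itself computes its metrics.

```python
from dataclasses import dataclass


@dataclass
class EuCycles:
    """Hypothetical per-EU cycle counts, one bucket per execution-unit state."""
    active: int   # EU was executing instructions
    stalled: int  # threads were loaded on the EU but none could execute (e.g. waiting on memory)
    idle: int     # no threads were loaded on the EU


def eu_state_breakdown(c: EuCycles) -> dict[str, float]:
    """Return each EU state as a percentage of all counted cycles."""
    total = c.active + c.stalled + c.idle
    if total == 0:
        return {"active": 0.0, "stalled": 0.0, "idle": 0.0}
    return {
        "active": 100.0 * c.active / total,
        "stalled": 100.0 * c.stalled / total,
        "idle": 100.0 * c.idle / total,
    }


# Illustrative numbers, not measured data.
print(eu_state_breakdown(EuCycles(active=600, stalled=300, idle=100)))
```

A large stalled share points at the memory hierarchy (the accesses the Compute Basic group measures), while a large idle share points at too little work being scheduled, which is what the occupancy factors below address.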
Factors responsible for low peak occupancy, with recommended fixes:
- The SLM size requested per workgroup in a computing task is too high. Fix: decrease the SLM size or increase the local size.
- The global size (the number of work items to be processed by a computing task) is too low. Fix: increase the global size.
- Barrier synchronization is used. This sync primitive can cause low occupancy because a GPU subslice has a limited number of hardware barriers. Fix: remove barrier synchronization or increase the local size.
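The three factors can be illustrated with a simplified occupancy model: each limit (hardware threads, SLM, barriers) caps how many workgroups fit on one subslice, and a small global size may not supply enough workgroups to fill the GPU. This is a minimal sketch under loudly stated assumptions: the capacities (56 hardware threads, 64 KB SLM, and 16 barriers per subslice, 3 subslices, SIMD16 so that 16 work items map to one hardware thread) are illustrative placeholders rather than the specification of any particular GPU, and `SubsliceLimits` and `peak_occupancy` are hypothetical names. The model ignores other scheduling constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsliceLimits:
    # Illustrative capacities only (assumption); real values depend on the GPU generation.
    hw_threads: int = 56        # hardware threads per subslice
    slm_bytes: int = 64 * 1024  # shared local memory per subslice
    barriers: int = 16          # hardware barriers per subslice


def peak_occupancy(local_size, global_size, slm_per_group, uses_barrier,
                   subslices=3, simd_width=16, hw=SubsliceLimits()):
    """Estimate peak occupancy as the fraction of hardware threads in use."""
    threads_per_group = -(-local_size // simd_width)  # ceiling division
    # Each resource caps the number of workgroups resident on one subslice.
    limits = [hw.hw_threads // threads_per_group]
    if slm_per_group > 0:
        limits.append(hw.slm_bytes // slm_per_group)  # SLM limit
    if uses_barrier:
        limits.append(hw.barriers)  # one hardware barrier per resident workgroup
    groups_per_subslice = min(limits)
    # A small global size may not provide enough workgroups to fill every subslice.
    total_groups = -(-global_size // local_size)
    resident_groups = min(total_groups, groups_per_subslice * subslices)
    busy_threads = resident_groups * threads_per_group
    return busy_threads / (hw.hw_threads * subslices)


# SLM-bound: 32 KB per workgroup lets only two workgroups fit on a subslice.
print(peak_occupancy(local_size=64, global_size=64_000, slm_per_group=32 * 1024, uses_barrier=False))
# Same SLM request with a larger local size puts more threads behind each SLM allocation.
print(peak_occupancy(local_size=256, global_size=256_000, slm_per_group=32 * 1024, uses_barrier=False))
```

In this model, occupancy rises from about 14% to about 57% when the local size grows from 64 to 256, and it reaches 100% when the SLM request drops to 4 KB. A barrier-using task with a local size of 16 is capped at 16 workgroups per subslice (about 29%) until the local size is increased, which mirrors the fixes listed above.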
- A tiny computing task could cause considerable overhead when compared to the task execution time.
- There may be high imbalance between the threads executing a computing task.
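Both effects can be quantified with simple ratios. The sketch below, assuming per-task overhead and execution times and per-thread busy times are already known, computes the share of a task's time spent on fixed overhead and an imbalance score. The imbalance definition (1 minus mean/max of per-thread busy times, i.e. the fraction of thread time left idle while waiting for the slowest thread, if all threads start together) is an assumption chosen for illustration, not a profiler metric, and both function names are hypothetical.

```python
from statistics import mean


def overhead_fraction(overhead_us: float, exec_us: float) -> float:
    """Share of a task's total time spent on fixed overhead rather than execution."""
    return overhead_us / (overhead_us + exec_us)


def thread_imbalance(thread_times_us: list[float]) -> float:
    """0.0 when all threads finish together; approaches 1.0 when one thread dominates.

    Assumed definition: 1 - mean/max of per-thread busy times.
    """
    longest = max(thread_times_us)
    if longest == 0:
        return 0.0
    return 1.0 - mean(thread_times_us) / longest


# Illustrative numbers, not measured data.
print(overhead_fraction(overhead_us=10.0, exec_us=5.0))    # tiny task: overhead dominates
print(overhead_fraction(overhead_us=10.0, exec_us=990.0))  # large task: overhead negligible
print(thread_imbalance([100.0, 100.0, 100.0, 100.0]))      # balanced threads
print(thread_imbalance([400.0, 100.0, 100.0, 100.0]))      # one straggler thread
```

With the same 10 µs overhead, a 5 µs task spends two thirds of its time on overhead while a 990 µs task spends 1%. A single thread running four times longer than the others leaves over half of the thread time idle.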