Finding Kernels with Register Spills
The compiler outputs a warning if a kernel is compiled ahead of time and has register spills:
$ icpx -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend "-device pvc" app.cpp
Compilation from IR - skipping loading of FCL
warning: kernel _ZTSZ4mainEUlT_E0_ compiled SIMD32 allocated 128 regs and spilled around 396
Build succeeded.
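When many kernels are built at once, it can help to collect these warnings from a saved build log. The following is a minimal sketch in Python. The regular expression is modeled only on the single warning shown above, so other compiler versions may word the message differently. The function name find_spill_warnings and the record field names are hypothetical, chosen for illustration.

```python
import re

# Pattern modeled on the one warning shown above (assumption: other compiler
# versions may format the message differently). \s+ tolerates line wrapping.
SPILL_WARNING = re.compile(
    r"warning:\s+kernel\s+(?P<kernel>\S+)\s+compiled\s+SIMD(?P<simd>\d+)\s+"
    r"allocated\s+(?P<regs>\d+)\s+regs\s+and\s+spilled\s+around\s+(?P<spill>\d+)"
)


def find_spill_warnings(build_log: str) -> list[dict]:
    """Return one record per register-spill warning found in the log text."""
    records = []
    for match in SPILL_WARNING.finditer(build_log):
        records.append({
            "kernel": match.group("kernel"),    # mangled kernel name
            "simd": int(match.group("simd")),   # SIMD width the kernel was compiled for
            "regs": int(match.group("regs")),   # registers allocated
            "spill": int(match.group("spill")), # reported spill size, as printed
        })
    return records


# Build output copied from the example above.
sample_log = """Compilation from IR - skipping loading of FCL
warning: kernel _ZTSZ4mainEUlT_E0_ compiled SIMD32 allocated 128 regs and
spilled around 396
Build succeeded.
"""

for record in find_spill_warnings(sample_log):
    print(record)
```

The mangled kernel name in each record can be passed to a demangler such as c++filt to relate it back to the source.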
However, the compiler does not report whether a kernel has one or more private variables that are not promoted to registers.
The open source tool unitrace reports both kernels that spill registers and kernels with private variables that are not promoted to registers. It also works with both ahead-of-time and just-in-time compilation:
$ icpx -fsycl app.cpp
$ unitrace -d ./a.out
...
== L0 Backend ==
Kernel, Calls, Time(ns), Time(%), Average(ns), Min(ns), Max(ns)
main::{lambda(auto:1)#4}, 100, 91349596800, 99.896, 913495968, 913304160, 913975360
main::{lambda(auto:1)#3}, 100, 77196960, 0.084, 771969, 2080, 76974560
main::{lambda(auto:1)#5}, 500, 4981120, 0.005, 9962, 1440, 42880
main::{lambda(auto:1)#7}, 500, 4600000, 0.005, 9200, 1440, 38720
main::{lambda(auto:1)#6}, 500, 4590880, 0.005, 9181, 1440, 39040
main::{lambda(auto:1)#1}, 100, 3494400, 0.003, 34944, 1760, 3305120
main::{lambda(auto:1)#2}, 100, 189280, 0.000, 1892, 1440, 31360
=== Kernel Properties ===
Kernel, Private Memory Per Thread, Spill Memory Per Thread
main::{lambda(auto:1)#4}, 16384, 0
main::{lambda(auto:1)#3}, 0, 8192
main::{lambda(auto:1)#5}, 0, 0
main::{lambda(auto:1)#7}, 0, 0
main::{lambda(auto:1)#6}, 0, 0
main::{lambda(auto:1)#1}, 0, 0
main::{lambda(auto:1)#2}, 0, 0
Both columns are reported in bytes. A non-zero Spill Memory Per Thread value indicates that the kernel spills registers. A non-zero Private Memory Per Thread value indicates that the kernel has at least one private variable that is not promoted to registers.
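The rule above can be applied mechanically to a saved report. The sketch below, in Python, assumes the Kernel Properties section has exactly the comma-separated layout shown in the example output. The function name parse_kernel_properties is hypothetical and is not part of unitrace.

```python
def parse_kernel_properties(report: str) -> dict[str, tuple[int, int]]:
    """Map kernel name -> (private bytes per thread, spill bytes per thread)."""
    props = {}
    in_section = False
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("=== Kernel Properties"):
            in_section = True
            continue
        if in_section and line.startswith("=="):
            break  # a different section begins
        if not in_section or not line:
            continue
        # Split from the right, since a kernel name may itself contain commas.
        parts = line.rsplit(",", 2)
        if len(parts) != 3:
            continue
        name, private, spill = (p.strip() for p in parts)
        if private.isdigit() and spill.isdigit():  # skips the column-header row
            props[name] = (int(private), int(spill))
    return props


# Rows copied from the example report above.
sample_report = """=== Kernel Properties ===
Kernel, Private Memory Per Thread, Spill Memory Per Thread
main::{lambda(auto:1)#4}, 16384, 0
main::{lambda(auto:1)#3}, 0, 8192
main::{lambda(auto:1)#5}, 0, 0
"""

for name, (private, spill) in parse_kernel_properties(sample_report).items():
    if spill > 0:
        print(f"{name}: spills registers ({spill} bytes per thread)")
    if private > 0:
        print(f"{name}: private variable(s) not promoted ({private} bytes per thread)")
```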
The tool also reports timing statistics for the kernels executed on the device. These statistics help developers evaluate the performance impact of register spills and decide which kernels to optimize first.
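One way to prioritize is to join the timing table with the kernel properties and rank the affected kernels by total device time. The following Python sketch assumes the report layout shown in the example output. The helper parse_table and the sample subset are illustrative only and are not part of unitrace.

```python
def parse_table(report: str, title: str, numeric_cols: int) -> dict[str, list[float]]:
    """Parse the rows of the section whose '==' title line contains `title`."""
    rows = {}
    in_section = False
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("=="):
            in_section = title in line
            continue
        if not in_section or not line:
            continue
        # Split from the right, since a kernel name may itself contain commas.
        parts = line.rsplit(",", numeric_cols)
        if len(parts) != numeric_cols + 1:
            continue
        try:
            values = [float(p) for p in parts[1:]]
        except ValueError:
            continue  # column-header row
        rows[parts[0].strip()] = values
    return rows


# Subset of the example report above.
sample = """== L0 Backend ==
Kernel, Calls, Time(ns), Time(%), Average(ns), Min(ns), Max(ns)
main::{lambda(auto:1)#4}, 100, 91349596800, 99.896, 913495968, 913304160, 913975360
main::{lambda(auto:1)#3}, 100, 77196960, 0.084, 771969, 2080, 76974560
main::{lambda(auto:1)#5}, 500, 4981120, 0.005, 9962, 1440, 42880
=== Kernel Properties ===
Kernel, Private Memory Per Thread, Spill Memory Per Thread
main::{lambda(auto:1)#4}, 16384, 0
main::{lambda(auto:1)#3}, 0, 8192
main::{lambda(auto:1)#5}, 0, 0
"""

timing = parse_table(sample, "L0 Backend", 6)
props = parse_table(sample, "Kernel Properties", 2)

# Keep only kernels with spills or unpromoted private variables.
candidates = []
for name, (private, spill) in props.items():
    if (private > 0 or spill > 0) and name in timing:
        total_ns = timing[name][1]  # Time(ns) is the second numeric column
        candidates.append((total_ns, name, int(private), int(spill)))

# Most expensive kernels first.
candidates.sort(reverse=True)
for total_ns, name, private, spill in candidates:
    print(f"{name}: {total_ns:.0f} ns total, private={private}, spill={spill}")
```

In this sample the kernel with unpromoted private variables ranks first because it accounts for nearly all of the device time, so it would be the natural first target.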