Invalid Kernel Argument Size
Occurs when the total size of kernel arguments exceeds the register space available for argument storage.
ID | Code Location | Description |
---|---|---|
1 | Allocation site | Represents the source location where arguments are passed from the host to a kernel. |
By default, kernel arguments are promoted to registers when possible. The number of registers available for kernel arguments varies from device to device (usually 25-50% of the total register space). Arguments that exceed this limit are cut off, which leads to logic errors in the kernel.
In OpenCL™, arguments are set explicitly with the clSetKernelArg function, so the problem is easier to spot. It typically appears when a kernel takes a large number of arguments or wide structures.
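As a rough sketch, the host-side code below compares a hypothetical `Params` structure passed by value with the alternative of keeping the large array in a `__global` buffer and passing only its handle. The kernel signatures and `clSetKernelArg` calls appear in comments as illustrative assumptions; only the `sizeof` arithmetic is compiled, so no OpenCL SDK is needed. Because `cl_mem` is an opaque pointer type, `void *` stands in for it.

```cpp
#include <cstddef>

// Hypothetical parameter block received by value, e.g.
//   __kernel void apply(Params p);
// Host side:
//   clSetKernelArg(kernel, 0, sizeof(Params), &p);
struct Params {
  float weights[256];  // at least 1024 bytes copied into the arguments
  int count;
};

// The arg_size passed to clSetKernelArg makes the by-value cost explicit.
constexpr std::size_t kByValueArgBytes = sizeof(Params);

// Alternative: keep the weights in a device buffer and pass its handle, e.g.
//   __kernel void apply(__global const float *weights, int count);
// Host side:
//   clSetKernelArg(kernel, 0, sizeof(cl_mem), &weightsBuffer);
//   clSetKernelArg(kernel, 1, sizeof(int), &count);
// cl_mem is an opaque pointer type, so void * stands in for it here.
constexpr std::size_t kBufferArgBytes = sizeof(void *) + sizeof(int);

static_assert(kBufferArgBytes < kByValueArgBytes,
              "a buffer handle is far smaller than the whole structure");
```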
In Data Parallel C++ (DPC++), arguments may be specified implicitly through the lambda capture list. If the capture default is [=] or [&], every variable used in the lambda body is captured. In this case, a large number of arguments, or wide structures and classes, may be passed to a kernel accidentally.
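The capture behavior can be observed without a SYCL device by looking at closure sizes in plain C++. In DPC++ the captured members of the kernel lambda become the kernel's arguments, so `sizeof` of the closure type is a rough proxy for the argument size. The functions `makeImplicitKernel` and `makeExplicitKernel` are hypothetical; in real DPC++ code the `numbers` pointer would have to refer to device-accessible memory (for example, a USM allocation), which this host-only sketch does not model.

```cpp
#include <cstddef>

const int N = 1000;
struct Data {
  double numbers[N];
};

// Capture default [=]: every entity used in the body is copied into the
// closure, so the closure holds an entire Data object (1000 doubles).
inline auto makeImplicitKernel(const Data &data, double *out) {
  return [=](std::size_t i) { out[0] += data.numbers[i]; };
}

// Explicit capture list: only two pointers are copied into the closure.
inline auto makeExplicitKernel(const Data &data, double *out) {
  // Stands in for a pointer to device-accessible memory in real DPC++ code.
  const double *numbers = data.numbers;
  return [numbers, out](std::size_t i) { out[0] += numbers[i]; };
}

using ImplicitKernel = decltype(makeImplicitKernel(Data{}, nullptr));
using ExplicitKernel = decltype(makeExplicitKernel(Data{}, nullptr));

static_assert(sizeof(ImplicitKernel) >= sizeof(Data),
              "[=] copied the whole structure into the closure");
static_assert(sizeof(ExplicitKernel) < sizeof(Data),
              "explicit capture keeps the closure small");
```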
In this diagnostic, Intel® Inspector displays the total size of the kernel arguments (in bytes) against the device limit.
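The comparison reported here is simple arithmetic: sum the sizes of all kernel arguments and compare the result with the device limit. The sketch below does this at compile time for a list of argument types. The 256-byte `kHypotheticalArgLimit` is a placeholder, not a real device value, the helper names are hypothetical, and alignment padding between arguments is ignored.

```cpp
#include <array>
#include <cstddef>

// Hypothetical argument budget in bytes; real limits are device-specific.
constexpr std::size_t kHypotheticalArgLimit = 256;

// Total size of a kernel's argument list: the sum of sizeof over all types
// (alignment padding between arguments is ignored).
template <typename... Args>
constexpr std::size_t totalArgBytes() {
  return (std::size_t{0} + ... + sizeof(Args));
}

// Bytes that would not fit within `limit`, or 0 if everything fits.
template <typename... Args>
constexpr std::size_t bytesBeyondLimit(std::size_t limit) {
  const std::size_t total = totalArgBytes<Args...>();
  return total > limit ? total - limit : 0;
}

// Example: an int, a double, and a 64-element table of doubles.
using Table = std::array<double, 64>;
constexpr std::size_t kExampleTotal = totalArgBytes<int, double, Table>();
constexpr std::size_t kExampleOverflow =
    bytesBeyondLimit<int, double, Table>(kHypotheticalArgLimit);
```

On a typical platform with a 4-byte `int` and an 8-byte `double`, `kExampleTotal` is 4 + 8 + 512 = 524 bytes, so `kExampleOverflow` is 268 bytes beyond the placeholder limit.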
DPC++ Example
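```cpp
#include <CL/sycl.hpp>

const int N = 1000;
struct Data {
  double numbers[N];
};

int main() {
  cl::sycl::queue queue;
  Data data{};
  // Result storage accessible from both host and device (USM)
  double *deviceData = cl::sycl::malloc_shared<double>(1, queue);
  deviceData[0] = 0.0;

  queue.submit([&](cl::sycl::handler &cgh) {
    cgh.parallel_for<class my_task>(cl::sycl::range<1>{N}, [=](cl::sycl::id<1> idx) {
      // Implicit usage of the Data structure from the host: [=] copies the
      // whole structure into the kernel arguments.
      // (The unsynchronized += is a data race; use a reduction in real code.)
      deviceData[0] += data.numbers[idx];
    });
  });
  queue.wait();
  // The structure contains 1000 doubles x 8 bytes = 8000 bytes of data,
  // which exceeds the available kernel argument limit.

  cl::sycl::free(deviceData, queue);
  return 0;
}
```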
Possible Correction Strategies
To avoid the problem, use the following hints:
- Reduce the number of arguments used if possible.
- Avoid passing complex structures to a kernel. Consider passing separate fields instead of a complete data structure.
- Use narrower data types, for example float instead of double or int instead of long, when the range and precision of the values allow it.
- In a DPC++ program, specify the lambda capture list explicitly instead of using a capture default ([=] or [&]).
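The hints above can be combined in a short sketch. The hypothetical `Settings` structure carries more data than the kernel needs; the second version passes only the two fields it uses, narrowed to `float` and `int`, through an explicit capture list with C++14 init-captures. Plain C++ lambdas stand in for DPC++ kernel lambdas here, the `out` pointer stands in for device-accessible memory, and narrowing is only safe when the values fit in the smaller types.

```cpp
#include <cstddef>

// Hypothetical settings: the kernel only needs `scale` and `count`.
struct Settings {
  double scale;
  long count;
  double history[128];  // not used by the kernel
};

// Before: [=] copies the whole Settings object into the closure.
inline auto wideKernel(const Settings &s, float *out) {
  return [=](std::size_t i) { out[i] = static_cast<float>(s.scale * s.count); };
}

// After: explicit capture of separate fields, narrowed to float and int.
// Only correct if scale fits in float precision and count fits in int.
inline auto slimKernel(const Settings &s, float *out) {
  return [scale = static_cast<float>(s.scale),
          count = static_cast<int>(s.count),
          out](std::size_t i) { out[i] = scale * count; };
}

using WideKernel = decltype(wideKernel(Settings{}, nullptr));
using SlimKernel = decltype(slimKernel(Settings{}, nullptr));

static_assert(sizeof(WideKernel) >= sizeof(Settings),
              "the wide closure holds the entire structure");
static_assert(sizeof(SlimKernel) < sizeof(WideKernel),
              "separate, narrowed fields shrink the closure");
```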