Specify a Work-Group Size
Specify a maximum or the required work-group size whenever possible. The Intel® oneAPI DPC++/C++ Compiler relies on this specification to optimize hardware use of the SYCL* kernel without involving excess logic.
- If you do not specify the [[intel::max_work_group_size(Z, Y, X)]] or [[sycl::reqd_work_group_size(Z, Y, X)]] attribute in your kernel, the work-group size assumes a default value that depends on compile-time and runtime constraints.
- If your kernel contains a barrier, the Intel® oneAPI DPC++/C++ Compiler sets a default maximum scalarized work-group size of 128 work-items.
- If your kernel does not query any SYCL intrinsics that allow different threads to behave differently (that is, local or global thread IDs, or the work-group ID), the Intel® oneAPI DPC++/C++ Compiler infers a single-threaded execution mode and sets the maximum work-group size to (1, 1, 1). In this case, the SYCL runtime also enforces a global enqueue size of (1, 1, 1), and loop pipelining optimizations are enabled within the Intel® oneAPI DPC++/C++ Compiler.
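The default rules above can be summarized as a small decision function. The following is only an illustrative sketch in standard C++: `KernelTraits`, `DefaultRule`, and `default_rule` are hypothetical names, not part of the compiler or the SYCL API. The text does not say which rule wins when a kernel both contains a barrier and queries no IDs, so the sketch reports that combination as not covered rather than guessing.

```cpp
// Hypothetical description of a kernel; these fields are illustrative only.
struct KernelTraits {
    bool has_size_attribute;     // max_work_group_size or reqd_work_group_size is present
    bool contains_barrier;       // kernel contains a barrier
    bool queries_work_item_ids;  // kernel reads local/global IDs or the work-group ID
};

enum class DefaultRule {
    AttributeDecides,  // an explicit attribute is present, so no default applies
    Barrier128,        // default maximum scalarized work-group size of 128
    SingleWorkItem,    // maximum work-group size (1, 1, 1), single-threaded mode
    NotCovered         // the text states no specific default for this case
};

// Maps kernel properties to the default rule described in the text.
constexpr DefaultRule default_rule(const KernelTraits& k) {
    if (k.has_size_attribute) return DefaultRule::AttributeDecides;
    const bool no_id_queries = !k.queries_work_item_ids;
    // Assumption: precedence between the two rules is not stated, so do not pick one.
    if (k.contains_barrier && no_id_queries) return DefaultRule::NotCovered;
    if (k.contains_barrier) return DefaultRule::Barrier128;
    if (no_id_queries) return DefaultRule::SingleWorkItem;
    return DefaultRule::NotCovered;
}
```

For instance, a kernel with a barrier that also reads its global ID and carries no size attribute maps to `DefaultRule::Barrier128`, which is the case where an explicit attribute is most likely to change the generated hardware.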
The [[cl::reqd_work_group_size(Z, Y, X)]] attribute is deprecated. Use the [[sycl::reqd_work_group_size(Z, Y, X)]] attribute instead.
To specify the work-group size, modify your kernel code in the following manner:
- To specify the maximum number of work-items that the compiler provisions for a work-group in a kernel, insert the [[intel::max_work_group_size(Z, Y, X)]] attribute in your kernel source code. For example:

    constexpr unsigned MAX_WG_SIZE = 4;
    ...
    cgh.parallel_for<class kernelCompute>(
        nd_range<1>(range<1>(N), range<1>(wg_size)),
        // 1-D kernel: the nd_item dimensionality matches nd_range<1>
        [=](nd_item<1> it) [[intel::max_work_group_size(1, 1, MAX_WG_SIZE)]] {
          auto gid = it.get_global_id(0);
          accessorRes[gid] = accessorIdx[gid] * 2;
        });
- To specify the required number of work-items that the Intel® oneAPI DPC++/C++ Compiler provisions for a work-group in a kernel, insert the [[sycl::reqd_work_group_size(Z, Y, X)]] attribute in your kernel source code. For example:

    constexpr unsigned REQD_WG_SIZE = 4;
    ...
    cgh.parallel_for<class kernelCompute>(
        nd_range<1>(range<1>(N), range<1>(wg_size)),
        // 1-D kernel: the nd_item dimensionality matches nd_range<1>
        [=](nd_item<1> it) [[sycl::reqd_work_group_size(1, 1, REQD_WG_SIZE)]] {
          auto gid = it.get_global_id(0);
          accessorRes[gid] = accessorIdx[gid] * 2;
        });
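Both attributes constrain the work-group size that the host passes in the nd_range, so it can help to check the launch configuration before enqueueing. The following is a minimal sketch in standard C++ for the one-dimensional case only. `Constraint` and `launch_is_valid` are hypothetical helpers, not SYCL or compiler APIs. It assumes that the global size must be evenly divisible by the work-group size, that a required size must match exactly, and that a maximum size must not be exceeded.

```cpp
#include <cstddef>

// Which attribute the kernel carries (illustrative only).
enum class Constraint { Max, Required };

// Returns true when a 1-D launch (global_size, wg_size) is compatible with
// the attribute value attr_size under the assumptions stated above.
constexpr bool launch_is_valid(std::size_t global_size, std::size_t wg_size,
                               Constraint kind, std::size_t attr_size) {
    // An nd_range needs a nonzero work-group size that divides the global size.
    if (wg_size == 0 || global_size % wg_size != 0) return false;
    // Required: exact match. Max: any size up to the limit.
    return kind == Constraint::Required ? wg_size == attr_size
                                        : wg_size <= attr_size;
}
```

With the values from the examples above (an attribute size of 4), a launch of 16 work-items in groups of 4 passes for either attribute, while groups of 8 fail both checks.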