Work-Group Size Recommendations Summary
If your kernel uses local memory and/or barriers, the number of work-groups that can run simultaneously on a single Intel® Graphics sub-slice is limited by the following key factors:
- There are 16 barrier registers per sub-slice, so no more than 16 work-groups can be executed simultaneously.
- Each sub-slice has 64KB of shared local memory. If, for example, a work-group requires 32KB of shared local memory, only 2 such work-groups can run concurrently, regardless of work-group size.
Therefore, to keep device utilization high when only a limited number of work-groups can run, use larger work-group sizes. Use power-of-two work-group sizes between 64 and 256.
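The arithmetic behind these two limits can be sketched in C. This is a minimal illustration, not part of any Intel or OpenCL API: the function names are hypothetical, the constants are the per-sub-slice figures quoted above, and the model ignores every other resource (such as hardware thread count) that may further restrict occupancy.

```c
#include <assert.h>
#include <stddef.h>

/* Per-sub-slice limits quoted in the text above. */
#define BARRIERS_PER_SUBSLICE   16u
#define SLM_BYTES_PER_SUBSLICE  (64u * 1024u)

/* Hypothetical helper: upper bound on the work-groups that can be resident
   on one sub-slice at once. Pass 0 for slm_bytes_per_group if the kernel
   uses barriers but no shared local memory. */
unsigned max_concurrent_work_groups(size_t slm_bytes_per_group)
{
    if (slm_bytes_per_group == 0)
        return BARRIERS_PER_SUBSLICE;          /* only the barrier limit applies */
    if (slm_bytes_per_group > SLM_BYTES_PER_SUBSLICE)
        return 0;                              /* one work-group does not fit */

    size_t by_slm = SLM_BYTES_PER_SUBSLICE / slm_bytes_per_group;
    return by_slm < BARRIERS_PER_SUBSLICE ? (unsigned)by_slm
                                          : BARRIERS_PER_SUBSLICE;
}

/* Hypothetical helper: work-items that can be in flight on one sub-slice
   under the two limits modeled above. */
size_t max_resident_work_items(size_t work_group_size,
                               size_t slm_bytes_per_group)
{
    assert(work_group_size > 0);  /* a work-group has at least one work-item */
    return work_group_size * max_concurrent_work_groups(slm_bytes_per_group);
}
```

For example, with 32KB of shared local memory per work-group only 2 work-groups fit, so a work-group size of 256 keeps 512 work-items in flight while a size of 64 keeps only 128. This is why larger work-groups help when the work-group count is capped.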
The number of sub-slices depends on the hardware generation and specific product. Refer to the See Also section for the details of the architecture.
If your kernel does not use local memory or barriers, these restrictions do not apply, and a work-group size of 32 work-items is optimal in most cases.
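The recommendations in this section can be condensed into a small selection helper. The sketch below is an assumption-laden illustration: `suggest_work_group_size` is a hypothetical name, starting from 256 (the top of the recommended range) is a choice made here rather than a rule from the text, and `device_max` stands for whatever limit the runtime reports for the kernel, for example through `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: suggest a power-of-two work-group size.
   - Kernels using local memory or barriers start at 256, the top of the
     recommended 64..256 range (an assumption of this sketch).
   - Other kernels start at 32.
   The size is halved until it fits device_max (assumed to be >= 1), so the
   result can fall below 64 on a very restrictive limit. */
size_t suggest_work_group_size(bool uses_slm_or_barriers, size_t device_max)
{
    size_t size = uses_slm_or_barriers ? 256 : 32;

    while (size > device_max && size > 1)
        size /= 2;

    assert((size & (size - 1)) == 0);  /* result stays a power of two */
    return size;
}
```

Treat the result as a starting point for tuning rather than a final answer; measuring a few candidate sizes on the target device remains the reliable way to pick one.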
See Also
More on the Gen7.5 and Gen8 Compute Architectures: https://software.intel.com/en-us/articles/intel-graphics-developers-guides