Terminology
In this chapter, OpenMP* and DPC++ terminology is used interchangeably
to describe the partitioning of iterations of an offloaded parallel
loop.
As described in the “DPC++ Thread Hierarchy and Mapping” chapter, the
iterations of a parallel loop (execution range) offloaded onto the GPU
are divided into work-groups, sub-groups, and work-items. The ND-range
represents the total execution range, which is divided into
work-groups of equal size. A work-group is a 1-, 2-, or 3-dimensional
set of work-items. Each work-group can be divided into sub-groups. A
sub-group represents a short range of consecutive work-items that are
processed together as a SIMD vector.
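The partitioning described above can be sketched as plain index arithmetic for a 1-dimensional range. This is an illustrative model only, not the DPC++ API: the `ItemPosition` type and the `locate` and `num_work_groups` helpers are hypothetical names introduced here, and the sketch assumes that sub-groups are formed from consecutive work-items within a work-group, as stated above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: where one work-item sits in a 1-D ND-range.
struct ItemPosition {
    std::size_t work_group;  // index of the work-group containing the item
    std::size_t local_id;    // index of the item within its work-group
    std::size_t sub_group;   // index of the sub-group within the work-group
    std::size_t lane;        // index of the item within its sub-group (SIMD lane)
};

// Number of equally sized work-groups in the ND-range.
// Assumes the work-group size divides the global size evenly.
std::size_t num_work_groups(std::size_t global_size, std::size_t wg_size) {
    assert(wg_size > 0 && global_size % wg_size == 0);
    return global_size / wg_size;
}

// Maps a global work-item index to its work-group, sub-group, and lane.
// Assumes sub-groups cover consecutive work-items of the work-group.
ItemPosition locate(std::size_t global_id, std::size_t wg_size,
                    std::size_t sg_size) {
    assert(wg_size > 0 && sg_size > 0);
    ItemPosition p;
    p.work_group = global_id / wg_size;
    p.local_id = global_id % wg_size;
    p.sub_group = p.local_id / sg_size;
    p.lane = p.local_id % sg_size;
    return p;
}
```

For example, with a work-group size of 256 and a sub-group size of 16, calling `locate(300, 256, 16)` places work-item 300 in work-group 1 at local index 44, which is lane 12 of sub-group 2.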
The following table shows how DPC++ concepts map to OpenMP and CUDA
concepts.
DPC++ | OpenMP | CUDA |
---|---|---|
Work-item | OpenMP thread or SIMD lane | CUDA thread |
Work-group | Team | Thread block |
Work-group size | Team size | Thread block size |
Number of work-groups | Number of teams | Number of thread blocks |
Sub-group | SIMD chunk (simdlen = 8, 16, 32) | Warp (size = 32) |
Maximum number of work-items per work-group | Thread limit | Maximum number of CUDA threads per thread block |
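As a rough illustration of the OpenMP column, the sketch below sets the number of teams, the thread limit, and the SIMD length on an offloaded loop through the standard `num_teams`, `thread_limit`, and `simdlen` clauses. The function name `offload_add` and the values 8, 128, and 16 are assumptions chosen for the example, not recommendations; how the clauses translate to work-groups and sub-groups on a given device depends on the compiler and runtime.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the element-wise sum of a and b.
// The loop is offloaded when the code is built with OpenMP offload support;
// if OpenMP is not enabled, the pragma is ignored and the loop runs
// serially on the host.
std::vector<float> offload_add(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<float> c(a.size(), 0.0f);
    const float* pa = a.data();
    const float* pb = b.data();
    float* pc = c.data();

    // num_teams    -> requested number of teams (number of work-groups)
    // thread_limit -> upper bound on threads per team
    //                 (maximum number of work-items per work-group)
    // simdlen      -> preferred SIMD chunk length (sub-group size)
    // The values below are illustrative assumptions.
    #pragma omp target teams distribute parallel for simd \
        num_teams(8) thread_limit(128) simdlen(16) \
        map(to: pa[0:n], pb[0:n]) map(from: pc[0:n])
    for (int i = 0; i < n; ++i) {
        pc[i] = pa[i] + pb[i];
    }
    return c;
}
```

The clauses are requests and limits rather than guarantees, so the runtime may launch fewer teams or threads than specified.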