Developer Guide

Terminology

In this chapter, OpenMP* and DPC++ terminology is used interchangeably to describe the partitioning of iterations of an offloaded parallel loop.
As described in the “DPC++ Thread Hierarchy and Mapping” chapter, the iterations of a parallel loop (execution range) offloaded onto the GPU are divided into work-groups, sub-groups, and work-items. The ND-range represents the total execution range, which is divided into work-groups of equal size. A work-group is a 1-, 2-, or 3-dimensional set of work-items. Each work-group can be divided into sub-groups. A sub-group represents a short range of consecutive work-items that are processed together as a SIMD vector.
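For example, a minimal DPC++ sketch of this hierarchy is shown below. It is not taken from a specific sample; the kernel body, the global and local sizes, and the required sub-group size of 16 are illustrative assumptions. The nd_range makes the total execution range and the work-group size explicit, and the reqd_sub_group_size attribute fixes the SIMD width of each sub-group.

#include <sycl/sycl.hpp>
#include <vector>

int main() {
  constexpr size_t global_size = 1024;  // total execution range (ND-range)
  constexpr size_t local_size  = 256;   // work-items per work-group

  std::vector<float> data(global_size, 1.0f);
  sycl::queue q;
  {
    sycl::buffer<float> buf(data.data(), sycl::range<1>(global_size));
    q.submit([&](sycl::handler &h) {
      sycl::accessor acc(buf, h, sycl::read_write);
      h.parallel_for(
          sycl::nd_range<1>(sycl::range<1>(global_size),
                            sycl::range<1>(local_size)),
          [=](sycl::nd_item<1> item) [[sycl::reqd_sub_group_size(16)]] {
            // Each work-item handles one element; work-items in the same
            // sub-group are processed together as one 16-wide SIMD vector.
            acc[item.get_global_id(0)] *= 2.0f;
          });
    });
  }  // buffer goes out of scope: results are copied back to data
  return 0;
}

With these illustrative sizes, the ND-range of 1024 work-items is divided into 4 work-groups of 256 work-items each, and each work-group is processed as sub-groups of 16 work-items.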
The following table shows how DPC++ concepts map to OpenMP and CUDA concepts.
DPC++                                        | OpenMP                            | CUDA
---------------------------------------------|-----------------------------------|------------------------------------------------
Work-item                                    | OpenMP thread or SIMD lane        | CUDA thread
Work-group                                   | Team                              | Thread block
Work-group size                              | Team size                         | Thread block size
Number of work-groups                        | Number of teams                   | Number of thread blocks
Sub-group                                    | SIMD chunk (simdlen = 8, 16, 32)  | Warp (size = 32)
Maximum number of work-items per work-group  | Thread limit                      | Maximum number of CUDA threads per thread block
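The same kind of loop can be written with OpenMP offload clauses that line up with the rows of the table. The sketch below is illustrative rather than prescriptive: the team count, thread limit, SIMD width, and loop body are assumptions chosen to mirror the DPC++ example above, with num_teams corresponding to the number of work-groups, thread_limit to the maximum work-group size, and simdlen to the sub-group width.

#include <cstdio>

int main() {
  constexpr int n = 1024;
  float data[n];
  for (int i = 0; i < n; ++i) data[i] = 1.0f;

  // Request 4 teams (work-groups) of at most 256 OpenMP threads each, with
  // each thread's iterations vectorized as 8-wide SIMD chunks (sub-groups).
  #pragma omp target teams distribute parallel for simd \
      num_teams(4) thread_limit(256) simdlen(8) map(tofrom: data[0:n])
  for (int i = 0; i < n; ++i)
    data[i] *= 2.0f;

  std::printf("data[0] = %f\n", data[0]);
  return 0;
}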
