Developer Guide

Terminology

In this chapter, OpenMP and SYCL terminology is used interchangeably to describe the partitioning of iterations of an offloaded parallel loop.
As described in the “SYCL Thread Hierarchy and Mapping” chapter, the iterations of a parallel loop (execution range) offloaded onto the GPU are divided into work-groups, sub-groups, and work-items. The ND-range represents the total execution range, which is divided into work-groups of equal size. A work-group is a 1-, 2-, or 3-dimensional set of work-items. Each work-group can be divided into sub-groups. A sub-group represents a short range of consecutive work-items that are processed together as a SIMD vector.
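To make the partitioning concrete, the following is a minimal sketch in plain C++ (not SYCL API code) of the index arithmetic for a 1-dimensional range. The names `WorkItemPosition`, `num_work_groups`, `sub_groups_per_work_group`, and `locate` are hypothetical. The sketch assumes that the global size is a multiple of the work-group size and that sub-groups are formed from consecutive local IDs; a real implementation chooses the sub-group size and the exact mapping.

```cpp
#include <cassert>
#include <cstddef>

// Position of one work-item in the hierarchy (1-dimensional case only).
struct WorkItemPosition {
    std::size_t work_group;  // index of the work-group within the ND-range
    std::size_t local_id;    // index of the work-item within its work-group
    std::size_t sub_group;   // index of the sub-group within the work-group
    std::size_t lane;        // index of the work-item within its sub-group
};

// Number of equal-sized work-groups in the ND-range.
// Assumption: global_size is a multiple of work_group_size.
inline std::size_t num_work_groups(std::size_t global_size,
                                   std::size_t work_group_size) {
    assert(work_group_size > 0 && global_size % work_group_size == 0);
    return global_size / work_group_size;
}

// Number of sub-groups in one work-group, rounded up so that a
// partially filled last sub-group is still counted.
inline std::size_t sub_groups_per_work_group(std::size_t work_group_size,
                                             std::size_t sub_group_size) {
    assert(sub_group_size > 0);
    return (work_group_size + sub_group_size - 1) / sub_group_size;
}

// Locate a work-item from its global ID.
// Assumption: consecutive local IDs fall into the same sub-group.
inline WorkItemPosition locate(std::size_t global_id,
                               std::size_t work_group_size,
                               std::size_t sub_group_size) {
    assert(work_group_size > 0 && sub_group_size > 0);
    WorkItemPosition pos{};
    pos.work_group = global_id / work_group_size;
    pos.local_id   = global_id % work_group_size;
    pos.sub_group  = pos.local_id / sub_group_size;
    pos.lane       = pos.local_id % sub_group_size;
    return pos;
}
```

For example, with a range of 1024 work-items, a work-group size of 256, and a sub-group size of 16, there are 4 work-groups of 16 sub-groups each, and the work-item with global ID 300 is work-item 44 of work-group 1, which is lane 12 of sub-group 2.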
The following table shows how SYCL concepts map to OpenMP and CUDA concepts.
| SYCL | OpenMP | CUDA |
|---|---|---|
| Work-item | OpenMP thread or SIMD lane | CUDA thread |
| Work-group | Team | Thread block |
| Work-group size | Team size | Thread block size |
| Number of work-groups | Number of teams | Number of thread blocks |
| Sub-group | SIMD chunk (simdlen = 8, 16, 32) | Warp (size = 32) |
| Maximum number of work-items per work-group | Thread limit | Maximum number of CUDA threads per thread block |
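As an illustration of the OpenMP column, the sketch below shows where the corresponding clauses appear on an offloaded loop. The function name `scale` and the values 4, 256, and 16 are arbitrary choices for this example, not recommendations. The clauses are requests: `num_teams` and `thread_limit` bound the number of teams and the threads per team, and `simdlen` is a hint, so the runtime and compiler may pick different values. If the code is built without OpenMP offload support, the loop simply runs on the host and produces the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical example: multiply every element of `in` by `factor`
// in an offloaded parallel loop.
std::vector<float> scale(const std::vector<float>& in, float factor) {
    // The loop below uses an int counter, so the input must fit in one.
    assert(in.size() <=
           static_cast<std::size_t>(std::numeric_limits<int>::max()));

    std::vector<float> out(in.size());
    const float* src = in.data();
    float* dst = out.data();
    const int n = static_cast<int>(in.size());

    // num_teams(4)      -> number of teams     (number of work-groups)
    // thread_limit(256) -> thread limit        (max work-items per work-group)
    // simdlen(16)       -> SIMD chunk length   (sub-group size)
    #pragma omp target teams distribute parallel for simd \
        num_teams(4) thread_limit(256) simdlen(16) \
        map(to: src[0:n]) map(from: dst[0:n])
    for (int i = 0; i < n; ++i) {
        dst[i] = factor * src[i];
    }
    return out;
}
```

In SYCL terms, the same request would correspond to an ND-range of 4 work-groups with at most 256 work-items each and a sub-group size of 16.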
