NDRange Kernels
If your program is naturally expressed as multiple concurrent threads operating in a data-parallel manner, write your kernel so that it runs as parallel instances over a work-item index space (NDRange).
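As a rough illustration of what "parallel instances over a work-item index space" means, the sketch below models a one-dimensional NDRange in plain host C++. It is not the SYCL or OpenCL API: the names `vector_add_item` and `launch_vector_add` are hypothetical, and the sequential loop only stands in for the runtime that would hand each kernel instance its own work-item ID.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel body: one invocation per work-item.
// In a real NDRange kernel the runtime supplies the ID
// (for example through get_global_id(0)); here it is a parameter.
inline void vector_add_item(std::size_t gid,
                            const std::vector<int>& a,
                            const std::vector<int>& b,
                            std::vector<int>& c) {
  c[gid] = a[gid] + b[gid];
}

// Host-side stand-in for launching the kernel over a 1-D index space.
// The loop only enumerates the index space; on a device the instances
// are independent and may execute concurrently.
inline std::vector<int> launch_vector_add(const std::vector<int>& a,
                                          const std::vector<int>& b) {
  assert(a.size() == b.size());
  std::vector<int> c(a.size());
  for (std::size_t gid = 0; gid < a.size(); ++gid) {
    vector_add_item(gid, a, b, c);
  }
  return c;
}
```

The point of the sketch is that the kernel body touches only the element selected by its own ID, which is what makes the work data-parallel.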
Avoid Work-Item ID-Dependent Backward Branching
The Intel® oneAPI DPC++/C++ Compiler collapses conditional statements into single bits that indicate when a particular functional unit becomes active. The Intel® oneAPI DPC++/C++ Compiler eliminates simple control flow paths that do not involve looping structures, resulting in a flat control structure and more efficient hardware use.
Avoid including any work-item ID-dependent backward branching (that is, branching that occurs in a loop) in your kernel because it degrades performance.
For example, the following code fragment illustrates branching that depends on a work-item ID query such as get_global_id or get_local_id:
for (size_t i = 0; i < get_global_id(0); i++)
{
    // statements
}
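One possible way to restructure such a loop, assuming an upper bound on the trip count is known at compile time, is to give the loop a fixed bound and move the ID dependence into a forward conditional inside the body. This is a sketch under that assumption, not guidance taken from the compiler documentation. It is written as plain host C++, with the work-item ID passed in as the parameter `gid`; `kMaxTrip` and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical compile-time upper bound on the trip count.
constexpr std::size_t kMaxTrip = 8;

// Original pattern: the loop exit condition (a backward branch)
// depends on the work-item ID.
inline int sum_below_id_dependent(std::size_t gid) {
  int acc = 0;
  for (std::size_t i = 0; i < gid; ++i) {
    acc += static_cast<int>(i);
  }
  return acc;
}

// Restructured pattern: the loop always runs kMaxTrip iterations,
// and the ID only controls a forward conditional in the body.
inline int sum_below_fixed_bound(std::size_t gid) {
  assert(gid <= kMaxTrip);  // the rewrite is only valid within the bound
  int acc = 0;
  for (std::size_t i = 0; i < kMaxTrip; ++i) {
    if (i < gid) {
      acc += static_cast<int>(i);
    }
  }
  return acc;
}
```

Both functions return the same value for any `gid` up to `kMaxTrip`. The trade-off is that every work-item now executes the full `kMaxTrip` iterations, so whether the rewrite pays off depends on how tight the bound is for the design in question.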