Developer Guide
FPGA Optimization Guide for Intel® oneAPI Toolkits
A newer version of this document is available. Customers should click here to go to the newest version.
NDRange Kernels
If your program naturally tends to describe multiple concurrent threads operating in a data-parallel manner, specify your kernel to operate in parallel instances over a work-item index-space (NDRange).
Avoid Work-Item ID-Dependent Backward Branching
The Intel® oneAPI DPC++/C++ Compiler collapses conditional statements into single bits that indicate when a particular functional unit becomes active. The Intel® oneAPI DPC++/C++ Compiler eliminates simple control flow paths that do not involve looping structures, resulting in a flat control structure and more efficient hardware use.
Avoid including any work-item ID-dependent backward branching (that is, branching that occurs in a loop) in your kernel because it degrades performance.
For example, the following code fragment illustrates branching that involves work-item ID such as get_global_id or get_local_id:
for (size_t i = 0; i < get_global_id(0); i++)
{
// statements
}