Developer Guide

  • 2021.7.1
  • 09/08/2022

Execution Policies

The implementation supports the device execution policies used to run the massively parallel computational model for heterogeneous systems. The policies are specified in the Intel® oneAPI DPC++ Library (oneDPL) section of the oneAPI Specification.
For any of the implemented algorithms, pass one of the execution policy objects as the first argument in a call to specify the desired execution behavior. The policies have the following meanings:
  • seq: Sequential execution.
  • unseq: Unsequenced SIMD execution. This policy requires that all functions provided are SIMD-safe.
  • par: Parallel execution by multiple threads.
  • par_unseq: Combined effect of unseq and par.
  • dpcpp_default: Massively parallel execution on devices using DPC++.
  • dpcpp_fpga: Massively parallel execution on FPGA devices.
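For example, the following minimal sketch runs the same algorithm first sequentially and then with multiple threads simply by changing the policy passed as the first argument (the par call has the linking requirements described below):

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <vector>

int main()
{
    std::vector<int> data(1000, 1);
    // The policy object selects the execution behavior of the same algorithm call.
    std::for_each(oneapi::dpl::execution::seq, data.begin(), data.end(),
                  [](int& x) { ++x; });   // sequential
    std::for_each(oneapi::dpl::execution::par, data.begin(), data.end(),
                  [](int& x) { ++x; });   // multi-threaded
    return 0;
}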
The implementation is based on Parallel STL from the LLVM Project.
oneDPL supports two parallel backends for execution with par and par_unseq policies:
  1. TBB backend (enabled by default) uses Intel® oneAPI Threading Building Blocks (oneTBB) or Intel® Threading Building Blocks (Intel® TBB) for parallel execution.
  2. OpenMP backend uses OpenMP* pragmas for parallel execution. See Macros for information on how to enable the OpenMP backend (a sketch follows this list).
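As a sketch, a backend is typically selected by defining the corresponding oneDPL macro before including any oneDPL headers. The macro names below are assumptions taken from the Macros page; verify them against your oneDPL version:

// Assumed macro names; see the Macros page for the authoritative list.
#define ONEDPL_USE_OPENMP_BACKEND 1   // request the OpenMP backend
#define ONEDPL_USE_TBB_BACKEND 0      // optionally disable the default TBB backend
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>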
Follow these steps to add Parallel API to your application:
  1. Add #include <oneapi/dpl/execution> to your code. Then include one or more of the following header files, depending on the algorithms you intend to use:
    1. #include <oneapi/dpl/algorithm>
    2. #include <oneapi/dpl/numeric>
    3. #include <oneapi/dpl/memory>
    For better coexistence with the C++ standard library, include oneDPL header files before the standard C++ ones.
  2. Pass a oneDPL execution policy object, defined in the oneapi::dpl::execution namespace, to a parallel algorithm.
  3. Use the C++ standard execution policies:
    1. Compile the code with options that enable OpenMP parallelism and/or vectorization pragmas.
    2. Link with the Intel® oneAPI Threading Building Blocks (oneTBB) or Intel® Threading Building Blocks (Intel® TBB) dynamic library for TBB-based parallelism.
  4. Use the device execution policies:
    1. Compile the code with options that enable support for SYCL 2020.

Use the C++ Standard Execution Policies

Example:
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <vector>

int main()
{
    std::vector<int> data( 1000 );
    std::fill(oneapi::dpl::execution::par_unseq, data.begin(), data.end(), 42);
    return 0;
}

Use the Device Execution Policies

The device execution policy specifies where a parallel algorithm runs. It encapsulates a SYCL device or queue and allows you to set an optional kernel name. Device execution policies can be used with all standard C++ algorithms that support execution policies.
To create a policy object, you may use one of the following constructor arguments:
  • A SYCL queue
  • A SYCL device
  • A SYCL device selector
  • An existing policy object with a different kernel name
A kernel name is set with a policy template argument. Providing a kernel name for a policy is optional if your compiler supports implicit names for SYCL kernel functions. The Intel® oneAPI DPC++/C++ Compiler supports it by default; for other compilers it may need to be enabled with compilation options such as -fsycl-unnamed-lambda. Refer to your compiler documentation for more information.
The oneapi::dpl::execution::dpcpp_default object is a predefined object of the device_policy class. It is created with a default kernel name and a default queue. Use it to construct customized policy objects or pass it directly when invoking an algorithm.
If dpcpp_default is passed directly to more than one algorithm, you must ensure that the compiler you use supports implicit kernel names (see above) and that this option is turned on.
The make_device_policy function templates simplify device_policy creation.

Usage Examples

The code examples below assume the using namespace oneapi::dpl::execution; and using namespace sycl; directives when referring to policy classes and functions:
auto policy_a = device_policy<class PolicyA> {};
std::for_each(policy_a, ...);

auto policy_b = device_policy<class PolicyB> {device{gpu_selector{}}};
std::for_each(policy_b, ...);

auto policy_c = device_policy<class PolicyC> {cpu_selector{}};
std::for_each(policy_c, ...);

auto policy_d = make_device_policy<class PolicyD>(dpcpp_default);
std::for_each(policy_d, ...);

auto policy_e = make_device_policy(queue{property::queue::in_order()});
std::for_each(policy_e, ...);
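For reference, a complete variant of such a call might look as follows. The sycl::buffer together with the oneapi::dpl::begin/end helpers from <oneapi/dpl/iterator> is one possible way to hand data to the device; treat this as a minimal sketch rather than the only supported pattern:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/iterator>
#include <sycl/sycl.hpp>
#include <vector>

int main()
{
    std::vector<int> v(1000);
    {
        sycl::buffer<int> buf{v.data(), sycl::range<1>(v.size())};
        // dpcpp_default submits the algorithm to the default SYCL queue.
        std::fill(oneapi::dpl::execution::dpcpp_default,
                  oneapi::dpl::begin(buf), oneapi::dpl::end(buf), 42);
    } // the buffer destructor copies the results back to v
    return 0;
}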

Use the FPGA Policy

The fpga_policy class is a device policy tailored to achieve better performance of parallel algorithms on FPGA hardware devices.
Use the policy when you run the application on an FPGA hardware device or FPGA emulation device by following these steps:
  1. Define the ONEDPL_FPGA_DEVICE macro to run on FPGA devices and the ONEDPL_FPGA_EMULATOR macro to run on FPGA emulation devices.
  2. Add #include <oneapi/dpl/execution> to your code.
  3. Create a policy object by providing an unroll factor (see the Note below), a class type for a unique kernel name as template arguments (both optional), and one of the following constructor arguments:
    1. A SYCL queue constructed for the FPGA Selector (the behavior is undefined with any other queue).
    2. An existing FPGA policy object with a different kernel name and/or unroll factor.
  4. Pass the created policy object to a parallel algorithm.
The default constructor of fpga_policy wraps a SYCL queue created for fpga_selector, or for fpga_emulator_selector if the ONEDPL_FPGA_EMULATOR macro is defined.
oneapi::dpl::execution::dpcpp_fpga is a predefined object of the fpga_policy class, created with a default unroll factor and a default kernel name. Use it to create customized policy objects or pass it directly when invoking an algorithm.
Note: Specifying the unroll factor for a policy enables loop unrolling in the implementation of your algorithms. The default value is 1. For guidance on choosing a more precise value, refer to the unroll Pragma and Loop Analysis chapters of the Intel® oneAPI DPC++ FPGA Optimization Guide.
The make_fpga_policy function templates simplify fpga_policy creation.

FPGA Policy Usage Examples

The code below assumes you have added using namespace oneapi::dpl::execution; for policies and using namespace sycl; for queues and device selectors:
constexpr auto unroll_factor = 8;
auto fpga_policy_a = fpga_policy<unroll_factor, class FPGAPolicyA>{};
auto fpga_policy_b = make_fpga_policy(queue{intel::fpga_selector{}});
auto fpga_policy_c = make_fpga_policy<unroll_factor, class FPGAPolicyC>();
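For completeness, an end-to-end sketch that targets the FPGA emulation device through the predefined dpcpp_fpga policy might look like this; the buffer-based data handling, the oneapi::dpl::begin/end helpers, and the <sycl/sycl.hpp> header are assumptions of the sketch:

// ONEDPL_FPGA_EMULATOR must be defined before any oneDPL header is included (see step 1 above).
#define ONEDPL_FPGA_EMULATOR 1
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/iterator>
#include <sycl/sycl.hpp>
#include <vector>

int main()
{
    std::vector<int> v(1000);
    {
        sycl::buffer<int> buf{v.data(), sycl::range<1>(v.size())};
        // dpcpp_fpga wraps a queue for the FPGA emulator because ONEDPL_FPGA_EMULATOR is defined.
        std::fill(oneapi::dpl::execution::dpcpp_fpga,
                  oneapi::dpl::begin(buf), oneapi::dpl::end(buf), 42);
    } // the buffer destructor copies the results back to v
    return 0;
}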

Error Handling with Device Execution Policies

The SYCL error handling model supports two types of errors. Synchronous errors cause the SYCL host runtime libraries to throw exceptions. Asynchronous errors may only be processed in a user-supplied error handler associated with a SYCL queue.
For algorithms executed with device policies, handling all errors, synchronous or asynchronous, is the responsibility of the caller. Specifically:
  • No exceptions are thrown explicitly by algorithms.
  • Exceptions thrown by runtime libraries at the host CPU, including SYCL synchronous exceptions, are passed through to the caller.
  • SYCL asynchronous errors are not handled.
To process SYCL asynchronous errors, the queue associated with a device policy must be created with an error handler object. The predefined policy objects (dpcpp_default, etc.) have no error handlers; do not use them if you need to process asynchronous errors.
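The following sketch illustrates this pattern; the handler body and the queue construction are illustrative rather than prescriptive:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <sycl/sycl.hpp>
#include <exception>
#include <iostream>
#include <vector>

int main()
{
    // Asynchronous errors are delivered to this handler instead of being thrown to the caller.
    auto handler = [](sycl::exception_list exceptions) {
        for (const std::exception_ptr& e : exceptions) {
            try {
                std::rethrow_exception(e);
            } catch (const sycl::exception& ex) {
                std::cerr << "Asynchronous SYCL error: " << ex.what() << "\n";
            }
        }
    };

    // Build the device policy from a queue that carries the error handler.
    sycl::queue q{handler};
    auto policy = oneapi::dpl::execution::make_device_policy(q);

    std::vector<int> data(1000);
    try {
        // Synchronous exceptions from the host runtime are passed through to the caller.
        std::fill(policy, data.begin(), data.end(), 42);
    } catch (const sycl::exception& ex) {
        std::cerr << "Synchronous SYCL error: " << ex.what() << "\n";
    }
    return 0;
}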
