Developer Guide

  • 2021.7.1
  • 09/08/2022

Asynchronous API Algorithms

The functions defined in the STL <algorithm> and <numeric> headers are traditionally blocking. Intel® oneAPI DPC++ Library (oneDPL) extends the functionality of the C++17 parallel algorithms by providing asynchronous algorithms with non-blocking behavior. This experimental feature enables you to express a concurrent control flow by building dependency chains, interleaving algorithm calls, and interoperating with SYCL* kernels.
The current implementation of the async algorithms is limited to device execution policies. All the functionality described below is available in the oneapi::dpl::experimental namespace.
The following async algorithms are currently supported:
  • copy_async
  • fill_async
  • for_each_async
  • reduce_async
  • sort_async
  • inclusive_scan_async
  • exclusive_scan_async
  • transform_async
  • transform_reduce_async
  • transform_inclusive_scan_async
  • transform_exclusive_scan_async
All the interfaces listed above are asynchronous versions of a subset of the C++17 STL algorithms: the suffix _async is added to the corresponding name (for example, reduce becomes reduce_async and sort becomes sort_async). Their behavior and signatures match those of the corresponding C++17 STL algorithms, with the following changes:
  • They do not block the calling thread.
  • They take an arbitrary number of events (including 0) as their last arguments, which lets you express input dependencies.
  • They return a future-like object; call .wait() on it to wait for completion and .get() to retrieve the result.
The type of the future-like object returned from an asynchronous algorithm is unspecified. The following member functions are present:
  • get() returns the result.
  • wait() waits for the result to become available.
If the returned object is the result of an algorithm with a device policy, it can be converted into a sycl::event. The lifetime of any resources the algorithm allocates (for example, temporary storage) is bound to the lifetime of the returned object.
The following utility functions are available:
  • wait_for_all(…) waits for an arbitrary number of objects that are convertible into sycl::event to become ready.

Example of Async API Usage

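```cpp
#include <oneapi/dpl/execution>
#include <oneapi/dpl/async>
#include <CL/sycl.hpp>
#include <iostream>

int main() {
    using namespace oneapi;
    {
        /* Build and compute a simple dependency chain: Fill buffer -> Transform -> Reduce */
        sycl::buffer<int> a{10};

        // Fill all 10 elements with 7; the call returns a future-like object at once.
        auto fut1 = dpl::experimental::fill_async(dpl::execution::dpcpp_default,
                                                  dpl::begin(a), dpl::end(a), 7);

        // Add 1 to each element in place; runs only after fut1 completes.
        auto fut2 = dpl::experimental::transform_async(dpl::execution::dpcpp_default,
                                                       dpl::begin(a), dpl::end(a), dpl::begin(a),
                                                       [](int x) { return x + 1; }, fut1);

        // Sum the elements after both previous steps; get() blocks until the result is ready.
        auto ret_val = dpl::experimental::reduce_async(dpl::execution::dpcpp_default,
                                                       dpl::begin(a), dpl::end(a), fut1, fut2).get();
        std::cout << ret_val << std::endl;  // 10 elements of 8, so 80
    }
    return 0;
}
```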
