Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 7/13/2023
Public

Asynchronous Parallelism Within Kernels (task_sequence)

Your kernel design might contain operations that you want to run asynchronously from the main flow of your kernel. The Intel oneAPI DPC++/C++ Compiler allows you to define these asynchronous activities in task functions and to asynchronously launch parallel invocations of these task functions through object instances of the task_sequence class. To enable the task_sequence class, include the following task_sequence header file in your source code:

#include <sycl/ext/intel/experimental/task_sequence.hpp>

task_sequence is a templated class that resides in the sycl::ext::intel::experimental namespace. Its template parameters are a reference to the task function associated with the class and, optionally, the depths of the queues that launch tasks and hold their results. Each object of a parameterized task_sequence type represents the FPGA hardware that implements the associated task function and its task queues, so the number of objects you declare controls how much of that hardware is replicated or reused.

The task_sequence class objects are helpful in situations where you want to express coarse-grained thread-level parallelism. For example:

  • Improving performance by running operations, such as loops, in parallel.
  • Reducing FPGA area utilization by sharing an expensive compute block with different parts of your kernel.

task_sequence Template Parameters

| Template Parameter | Description |
| --- | --- |
| auto &f; typename ReturnT, typename... ArgsT, ReturnT (&f)(ArgsT...) | Callable f that defines the asynchronous task to be associated with the task_sequence. f must be statically resolvable at compile time, which means it is not a function pointer, and the return type (ReturnT) and argument types (ArgsT...) of f must be resolvable and fixed. |
| uint32_t invocation_capacity | The size of the hardware queue instantiated for async() function calls. This value corresponds to the minimum number of outstanding async() function calls to be supported. When the number of outstanding async() function calls reaches this value, further calls may block until the number of outstanding calls is reduced to the invocation_capacity. The default value of this parameter is 1. |
| uint32_t response_capacity | The size of the hardware queue instantiated to hold task function results. This value corresponds to the maximum number of outstanding async() calls such that all outstanding tasks are guaranteed to make forward progress. Further async() calls may block until the number of outstanding calls is reduced to the response_capacity. The default value of this parameter is 1. |

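The capacity parameters can be reasoned about by counting outstanding calls. The following is a minimal host-only C++ sketch, not part of the Intel API: async_may_block, forward_progress_guaranteed, and min_invocation_capacity are hypothetical helper names that only restate the wording of the table. The sketch assumes a launch pattern in which a kernel issues a burst of async() calls before its first get(), and it assumes that a capacity is never smaller than the default of 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: per the table, a further async() call may block once
// the number of outstanding calls has reached invocation_capacity.
constexpr bool async_may_block(std::uint32_t outstanding,
                               std::uint32_t invocation_capacity) {
  return outstanding >= invocation_capacity;
}

// Hypothetical helper: per the table, forward progress of all outstanding
// tasks is guaranteed while their number does not exceed response_capacity.
constexpr bool forward_progress_guaranteed(std::uint32_t outstanding,
                                           std::uint32_t response_capacity) {
  return outstanding <= response_capacity;
}

// Hypothetical helper: smallest invocation_capacity for which a burst of
// `burst` back-to-back async() calls (no get() in between) never meets the
// "may block" condition. Assumes a capacity is at least 1 (the default).
constexpr std::uint32_t min_invocation_capacity(std::uint32_t burst) {
  return burst == 0 ? 1 : burst;
}

// Small self-check of the counting rules above.
void check_capacity_rules() {
  assert(!async_may_block(0, 1));  // first call with the default capacity
  assert(async_may_block(1, 1));   // second call may block at capacity 1
  // Before the 4th call of a 4-call burst, 3 calls are outstanding.
  assert(!async_may_block(3, min_invocation_capacity(4)));
  assert(forward_progress_guaranteed(2, 2));
  assert(!forward_progress_guaranteed(3, 2));
}
```

Under these assumptions, a kernel that launches four tasks before retrieving any result would choose an invocation_capacity of at least 4. Actual blocking behavior depends on the generated hardware, so treat the helpers as a planning aid only.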
task_sequence Function APIs

| Function API | Description |
| --- | --- |
| void async(ArgsT... Args) | Asynchronously calls f with arguments Args. It increments the number of outstanding tasks by 1. |
| ReturnT get() | Synchronously retrieves the result of an asynchronous call. Results are retrieved in FIFO order of their async() invocations. It decrements the number of outstanding tasks by 1. |
| ~task_sequence() | Destructor for the task_sequence class. It implicitly invokes the get() function on all outstanding invocations launched through the async() function call. |
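
The call semantics described above (async() adds an outstanding task, get() returns results in FIFO order, and the destructor drains whatever is left) can be illustrated with a small host-only model. This is a sketch under stated assumptions, not the Intel implementation: the model namespace, the outstanding() member, and the square task function are hypothetical, the model evaluates each task immediately instead of running it concurrently, and it supports only non-void return types. It requires C++17 for the auto & template parameter. In a real design you would include the task_sequence header named earlier and use the class from sycl::ext::intel::experimental inside kernel code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

namespace model {  // hypothetical namespace, NOT sycl::ext::intel::experimental

// Primary template mirrors the documented parameter list.
template <auto &f, std::uint32_t invocation_capacity = 1,
          std::uint32_t response_capacity = 1>
class task_sequence;

// Specialization that recovers ReturnT and ArgsT from the task function f.
template <typename ReturnT, typename... ArgsT, ReturnT (&f)(ArgsT...),
          std::uint32_t invocation_capacity, std::uint32_t response_capacity>
class task_sequence<f, invocation_capacity, response_capacity> {
 public:
  // Model of async(): hardware would start f concurrently; this host model
  // evaluates f right away and queues the result (one more outstanding task).
  void async(ArgsT... args) { results_.push(f(args...)); }

  // Model of get(): returns the oldest unretrieved result (FIFO order) and
  // reduces the number of outstanding tasks by one.
  ReturnT get() {
    ReturnT r = results_.front();
    results_.pop();
    return r;
  }

  // Model of the destructor: implicitly calls get() for every outstanding task.
  ~task_sequence() {
    while (!results_.empty()) get();
  }

  // Hypothetical inspection helper; the documented class has no such member.
  std::size_t outstanding() const { return results_.size(); }

 private:
  std::queue<ReturnT> results_;
};

}  // namespace model

// Hypothetical task function used only for this illustration.
int square(int x) { return x * x; }

// Launches three tasks, retrieves two, and leaves one for the destructor.
std::vector<int> fifo_demo() {
  model::task_sequence<square, 4, 4> ts;
  ts.async(2);
  ts.async(3);
  ts.async(4);
  assert(ts.outstanding() == 3);

  std::vector<int> out;
  out.push_back(ts.get());  // result of the first async() call
  out.push_back(ts.get());  // result of the second async() call
  assert(ts.outstanding() == 1);
  return out;  // ts is destroyed here; its destructor drains square(4)
}
```

Calling fifo_demo() yields the results of the first two launches in launch order, while the third result is consumed by the destructor. The capacity arguments are carried in the type but are not enforced by this model.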