Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 3/31/2023
Public

Task Functions

A task function for the task_sequence class is defined as an ordinary C++ function. A reference to that function is the first template parameter of a given parameterization of the task_sequence class. More specifically, the template parameter is an auto reference to a callable f that defines the asynchronous task associated with the task_sequence class. Requiring an auto reference means that f must be statically resolvable at compile time; for example, f cannot be a function pointer. Furthermore, the return type and argument types of f must be resolvable and fixed for each parameterization of the task_sequence class.
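To make the auto reference requirement concrete, the following is a minimal sketch in standard C++17. The static_callable template and the demo() helper are hypothetical names invented for this illustration; they are not part of the task_sequence API and only show how a function passed as an auto reference template parameter is fixed at compile time, together with its return and argument types.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for illustration only; NOT the Intel task_sequence class.
// The template parameter is an auto reference, so f must name an entity that
// the compiler can resolve statically.
template <auto &f>
struct static_callable {
  // Forward the arguments to f; the call target is known at compile time.
  template <typename... Args>
  static auto call(Args &&...args) {
    return f(std::forward<Args>(args)...);
  }
};

int mult(int a, int b) { return a * b; }

// The return type follows from the signature of mult and is fixed for this
// parameterization.
static_assert(
    std::is_same_v<decltype(static_callable<mult>::call(1, 2)), int>);

// A function pointer whose value is only known at run time cannot be used:
//   int (*fp)(int, int) = mult;
//   static_callable<*fp> bad;  // rejected: *fp is not a constant expression

// Example use: the callee is mult, selected when the template is instantiated.
inline int demo() { return static_callable<mult>::call(6, 7); }
```

Because the callee is part of the type, two parameterizations with different functions are distinct types, which is consistent with the requirement that the types of f be fixed per parameterization.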

Task Invocation Interface

You invoke the task function f with the async() function, which accepts the same arguments as f (in both type and order) and stores the return value of f in a FIFO queue when f completes. The async() call is non-blocking: it returns before the asynchronous invocation of f finishes, and potentially before f even begins executing. The return from async() therefore carries no implicit information about the execution status of f.

The get() function retrieves the oldest result from this logical FIFO queue. If no result is available when get() is called, the call blocks (waits) until one becomes available. The return type of get() is the same as the return type of the task function f.
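The ordering contract between async() and get() can be sketched with a small host-side model. This is an assumption-laden illustration, not the Intel implementation: task_sequence_model and return_of are hypothetical names, and the model is single-threaded, so it defers each invocation until get() is called instead of running it concurrently as FPGA hardware would. It only shows that async() returns without reporting status and that get() yields results in FIFO order.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <type_traits>
#include <utility>

// Extract the return type R from a function type R(Args...).
template <typename T>
struct return_of;
template <typename R, typename... Args>
struct return_of<R(Args...)> {
  using type = R;
};

// Hypothetical single-threaded model; NOT the Intel task_sequence class.
template <auto &f>
class task_sequence_model {
  using result_t =
      typename return_of<std::remove_reference_t<decltype(f)>>::type;

 public:
  // Record one invocation of f and return immediately. Nothing about the
  // execution status of f is reported to the caller.
  template <typename... Args>
  void async(Args... args) {
    pending_.push_back([args...] { return f(args...); });
  }

  // Return the result of the oldest outstanding invocation (FIFO order).
  // The real get() would block on an empty queue; this model asserts instead.
  result_t get() {
    assert(!pending_.empty() && "no outstanding invocation to collect");
    auto oldest = std::move(pending_.front());
    pending_.pop_front();
    return oldest();  // the model runs the deferred invocation here
  }

 private:
  std::deque<std::function<result_t()>> pending_;
};

int mult(int a, int b) { return a * b; }
```

With this model, calling async(2, 3) and then async(4, 5) on one object makes the first get() return the product from the first call and the second get() return the product from the second call, regardless of how long each invocation takes.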

You can invoke async() and get() only on the device on which the task_sequence object is declared. Calling either function on a different device results in undefined behavior.

In the optimization report (report.html), async() and get() function calls appear as blocking pipe write and pipe read operations.

The following example uses the task function mult to parameterize the task_sequence class and declares two objects, task1 and task2. Each object launches mult() asynchronously with one async() call, for two invocations in total, and a get() call on each object collects the results.

int mult(int a, int b) {
  return a * b;
}
int mult_and_add(int a1, int b1, int a2, int b2) {
  task_sequence<mult> task1, task2;
  task1.async(a1, b1);
  task2.async(a2, b2);
  return task1.get() + task2.get();
}
NOTE:

The arguments to the async() function match the arguments of the mult() function, that is, (int, int). The return type of the get() function matches the return type of the mult() function, that is, int.

Scope, Lifetime, and Reuse of task_sequence Objects

task_sequence objects must follow these guidelines:

  • Object declarations of a parameterized task_sequence class must be local, which means global declarations and dynamic allocations are not allowed.
  • task_sequence objects must not have their lifetime extended beyond the scope in which they are declared. It is undefined behavior if their lifetime is extended. Both move and copy constructors for the task_sequence class are therefore deleted.
  • Each task_sequence class object represents a specific instantiation of FPGA hardware to perform the task function operation.
  • Launching tasks via async() function calls on the same object results in the reuse of that object's hardware. Thus, you can control the reuse or replication of FPGA hardware by the number of task_sequence objects you declare. Since object lifetime is confined to the scope in which the task_sequence object is created, declare each object in the scope in which you intend to reuse it.
  • A task function associated with a task_sequence can itself contain task_sequence declarations. In such cases, each instantiation of the outer task_sequence object results in new instances of the contained objects. In the following example, baseTask and childTask are task functions that parameterize the task_sequence class. In the kernel code, two task_sequence<baseTask> objects are declared, which means that two hardware instances implementing baseTask are instantiated. Since baseTask includes two declarations of task_sequence<childTask>, each task_sequence<baseTask> object instantiates two task_sequence<childTask> objects, and therefore two hardware instances of childTask. Since there are two task_sequence<baseTask> objects in this code, there are a total of four task_sequence<childTask> objects and four hardware instances of childTask.
    void childTask() {
      // useful computation here
      …
    }
    void baseTask() {
      task_sequence<childTask> child1, child2;
      // useful computation here, including async() and get() on child1, child2
      …
    }
    // in kernel code
    {
      task_sequence<baseTask> base1, base2;
      …
    }
  • Before exiting scope, task_sequence objects must retire all outstanding async() function invocations. The task_sequence destructor guarantees this by calling the get() function once for each outstanding async() function call.
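The lifetime rules above, deleted copy and move constructors plus a destructor that retires outstanding invocations, can be sketched with a host-side model. Everything here is hypothetical and for illustration only: scoped_task_model, record, and run_scope are invented names, and the model runs each deferred invocation inside get() on a single thread instead of on separate hardware.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical host-side model; NOT the Intel task_sequence class.
template <auto &f>
class scoped_task_model {
 public:
  scoped_task_model() = default;
  // Copying or moving could extend the lifetime beyond the declaring scope,
  // so both operations are deleted.
  scoped_task_model(const scoped_task_model &) = delete;
  scoped_task_model(scoped_task_model &&) = delete;

  // Retire every outstanding invocation: one get() per uncollected async().
  ~scoped_task_model() {
    while (!pending_.empty()) get();
  }

  // Record one invocation of f and return immediately.
  template <typename... Args>
  void async(Args... args) {
    pending_.push_back([args...] { f(args...); });
  }

  // Collect the oldest outstanding invocation.
  void get() {
    assert(!pending_.empty());
    auto oldest = std::move(pending_.front());
    pending_.pop_front();
    oldest();  // the model runs the deferred invocation here
  }

 private:
  std::deque<std::function<void()>> pending_;
};

int completed = 0;  // number of invocations that have been retired
void record(int) { ++completed; }

// Launch three invocations, collect one explicitly, and let the destructor
// retire the remaining two when the object leaves scope.
int run_scope() {
  completed = 0;
  {
    scoped_task_model<record> task;
    task.async(1);
    task.async(2);
    task.async(3);
    task.get();
  }  // destructor runs here
  return completed;
}
```

In this model, run_scope() reports that all three invocations were retired even though get() was called explicitly only once, which mirrors the guarantee described for the task_sequence destructor.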

Adding Capacity When Launching Task Functions

Consider specifying the invocation_capacity parameter of the task_sequence class if you observe stall patterns in your simulation waveforms that indicate an imbalance between the following:

  • Any backpressure introduced by the task function.
  • How often you invoke the task function using the async() function.

Adding Capacity When Collecting Task Functions

Consider specifying the response_capacity parameter of the task_sequence class if you observe stall patterns in your design waveforms that indicate a mismatch between the following:

  • The cadence of data production in the task function.
  • The cadence of collecting task results via the get() function.
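To illustrate why extra capacity can absorb a cadence mismatch, the following is a toy cycle-by-cycle model of a bounded result FIFO between a task function and get(). It is a sketch under invented assumptions, not a description of the generated hardware: the function stall_cycles, the bursty producer pattern (two results in every four cycles), and the steady consumer pattern (one get() every other cycle) are all hypothetical. Both sides have the same average rate, so any stall comes only from the difference in cadence.

```cpp
#include <cassert>

// Toy model: count the cycles in which the producer has a result ready but
// the FIFO is full. All cadences are assumptions made for illustration.
int stall_cycles(int capacity, int total_results) {
  int fifo = 0;      // results currently buffered
  int produced = 0;  // results written into the FIFO so far
  int stalls = 0;    // cycles in which the producer could not write

  for (int cycle = 0; produced < total_results; ++cycle) {
    // Producer: bursty, wants to write on cycles 0 and 1 of every 4.
    bool wants_to_write = (cycle % 4) < 2;
    if (wants_to_write) {
      if (fifo < capacity) {
        ++fifo;
        ++produced;
      } else {
        ++stalls;  // FIFO full: the producer is backpressured this cycle
      }
    }

    // Consumer: steady, collects one result on every odd cycle if available.
    if (cycle % 2 == 1 && fifo > 0) {
      --fifo;
    }
  }
  return stalls;
}
```

Under these assumptions, a capacity of 1 stalls the producer on the second cycle of every burst, while a capacity of 2 absorbs the whole burst and removes the stalls. A similar argument applies on the launch side, where the buffered items are pending async() invocations instead of results.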