Intel® Advisor User Guide

ID 766448
Date 6/24/2024
Public

Parallelize Functions - Intel® oneAPI Threading Building Blocks (oneTBB) Tasks

The following sections describe various alternatives, depending on how the tasks fit within the surrounding parallel site.

Two or More Parallel Statements

When the outermost statements in the annotation site have been placed into tasks, as shown in this serial example, it is easy to execute them in parallel.

    ANNOTATE_SITE_BEGIN(sitename);
        ANNOTATE_TASK_BEGIN(task1);
            statement_1
        ANNOTATE_TASK_END();
        ANNOTATE_TASK_BEGIN(task2);
            statement_2
        ANNOTATE_TASK_END();
    ANNOTATE_SITE_END();

Two or More Parallel Statements - Intel® oneAPI Threading Building Blocks (oneTBB)

The easiest way to execute several sequential statements as independent tasks is to wrap each statement in a function object and pass the function objects to tbb::parallel_invoke, as follows.

The example below uses C++11 lambda expressions, so it must be compiled with C++11 support enabled, for example with the Intel® oneAPI DPC++/C++ Compiler.

  #include <tbb/tbb.h>

  ...
  // Run both statements as independent tasks; the call returns when both have finished.
  tbb::parallel_invoke(
     [&]{statement_1;},
     [&]{statement_2;}
  );

A variable used inside a lambda expression but declared outside it is said to be captured. The [&] in the example specifies capture by reference. It is also possible to capture by value with [=], or to capture different variables in different ways by listing them individually. See the compiler documentation on lambda expressions for details.
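The capture modes described above can be illustrated in plain C++, without oneTBB. The following is a minimal sketch; the names CaptureResult and capture_demo are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Hypothetical result type: records what each capture mode observed.
struct CaptureResult {
    int seen_by_value;      // value returned by the [=] lambda
    int seen_by_reference;  // value returned by the [&] lambda
    int total;              // outer variable after the mixed-capture lambda ran
};

CaptureResult capture_demo() {
    int x = 1;
    auto by_value = [=] { return x; };      // copies x when the lambda is created
    auto by_reference = [&] { return x; };  // refers to the outer x
    x = 5;                                  // changed after both lambdas exist

    int total = 0;
    int scale = 3;
    // Mixed capture: total by reference (writable), scale by value (a copy).
    auto accumulate = [&total, scale](int v) { total += v * scale; };
    accumulate(2);                          // total becomes 2 * 3

    return CaptureResult{by_value(), by_reference(), total};
}
```

When lambdas that capture by reference run as parallel tasks, they share the captured variables, so two tasks that write to the same captured variable without synchronization create a data race.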

Using C++ structs Instead of Lambda Expressions

Any code that can be written with a lambda expression can be written without one - it is just more work. All a lambda expression does is:

  1. Define a class with operator() defined to execute the body of the lambda expression.

  2. Define a class constructor that captures variables into fields of the class.

  3. Construct an instance of that class.

The constructor can capture any of the surrounding locals that are needed and save them in data members.

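  {
    // Function objects take the place of the lambda expressions.
    // operator() is const, matching the call operator a lambda generates by default.
    struct S1 { void operator()() const { statement_1; } };
    struct S2 { void operator()() const { statement_2; } };
    tbb::parallel_invoke(S1(), S2());
  }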
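The structs in the example above capture nothing. When a statement needs surrounding locals, the three steps listed earlier might look like the following sketch, written in plain C++ so that it does not depend on oneTBB. The names AddStep and run_both_forms are hypothetical.

```cpp
#include <cassert>

// Hand-written counterpart of the lambda:  [&sum, step] { sum += step; }
class AddStep {
public:
    // Step 2: the constructor captures the surrounding variables into fields.
    AddStep(int& sum, int step) : sum_(sum), step_(step) {}

    // Step 1: operator() executes what would be the body of the lambda.
    void operator()() const { sum_ += step_; }

private:
    int& sum_;   // captured by reference
    int step_;   // captured by value
};

int run_both_forms() {
    int sum = 0;
    int step = 4;

    auto as_lambda = [&sum, step] { sum += step; };
    AddStep as_struct(sum, step);   // Step 3: construct an instance of the class

    as_lambda();   // adds step through the lambda
    as_struct();   // adds step through the hand-written class
    return sum;
}
```

An instance such as AddStep(sum, step) could then be passed to tbb::parallel_invoke in the same way as S1() and S2().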