• 2021.4
  • 09/27/2021
  • Public Content
Contents

parallel_for

Suppose you want to apply a function
Foo
to each element of an array, and it is safe to process each element concurrently. Here is the sequential code to do this:
void SerialApplyFoo( float a[], size_t n ) { for( size_t i=0; i!=n; ++i ) Foo(a[i]); }
The iteration space here is of type
size_t
, and goes from
0
to
n-1
. The template function
oneapi::tbb::parallel_for
breaks this iteration space into chunks, and runs each chunk on a separate thread. The first step in parallelizing this loop is to convert the loop body into a form that operates on a chunk. The form is an STL-style function object, called the
body
object, in which
operator()
processes a chunk. The following code declares the body object.
#include "oneapi/tbb.h" using namespace oneapi::tbb; class ApplyFoo { float *const my_a; public: void operator()( const blocked_range<size_t>& r ) const { float *a = my_a; for( size_t i=r.begin(); i!=r.end(); ++i ) Foo(a[i]); } ApplyFoo( float a[] ) : my_a(a) {} };
The
using
directive in the example enables you to use the library identifiers without having to write out the namespace prefix
oneapi::tbb
before each identifier. The rest of the examples assume that such a
using
directive is present.
Note the argument to
operator()
. A
blocked_range<T>
is a template class provided by the library. It describes a one-dimensional iteration space over type
T
. Class
parallel_for
works with other kinds of iteration spaces too. The library provides
blocked_range2d
for two-dimensional spaces. You can define your own spaces as explained in Advanced Topic: Other Kinds of Iteration Spaces.
An instance of
ApplyFoo
needs member fields that remember all the local variables that were defined outside the original loop but used inside it. Usually, the constructor for the body object will initialize these fields, though
parallel_for
does not care how the body object is created. Template function
parallel_for
requires that the body object have a copy constructor, which is invoked to create a separate copy (or copies) for each worker thread. It also invokes the destructor to destroy these copies. In most cases, the implicitly generated copy constructor and destructor work correctly. If they do not, it is almost always the case (as usual in C++) that you must define
both
to be consistent.
Because the body object might be copied, its
operator()
should not modify the body. Otherwise the modification might or might not become visible to the thread that invoked
parallel_for
, depending upon whether
operator()
is acting on the original or a copy. As a reminder of this nuance,
parallel_for
requires that the body object’s
operator()
be declared
const
.
The example
operator()
loads
my_a
into a local variable
a
. Though not necessary, there are two reasons for doing this in the example:
  • Style
    . It makes the loop body look more like the original.
  • Performance
    . Sometimes putting frequently accessed values into local variables helps the compiler optimize the loop better, because local variables are often easier for the compiler to track.
Once you have the loop body written as a body object, invoke the template function
parallel_for
, as follows:
#include "oneapi/tbb.h" void ParallelApplyFoo( float a[], size_t n ) { parallel_for(blocked_range<size_t>(0,n), ApplyFoo(a)); }
The
blocked_range
constructed here represents the entire iteration space from 0 to n-1, which
parallel_for
divides into subspaces for each processor. The general form of the constructor is
blocked_range<T>(begin,end,grainsize)
. The
T
specifies the value type. The arguments
begin
and
end
specify the iteration space STL-style as a half-open interval [
begin
,
end
). The argument
grainsize
is explained in the Controlling Chunking section. The example uses the default grainsize of 1 because by default
parallel_for
applies a heuristic that works well with the default grainsize.

Product and Performance Information

1

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.