Partitioner Summary
The parallel loop templates parallel_for and parallel_reduce take an optional partitioner argument, which specifies a strategy for executing the loop. The following table summarizes the partitioners and their effect when used in conjunction with blocked_range.

Partitioner | Description | When Used with blocked_range(i,j,g)
---|---|---
simple_partitioner | Chunk size bounded by grain size. | g/2 ≤ chunksize ≤ g
auto_partitioner (default) | Automatic chunk size. | g/2 ≤ chunksize
affinity_partitioner | Automatic chunk size, cache affinity, and uniform distribution of iterations. | g/2 ≤ chunksize
static_partitioner | Deterministic chunk size, cache affinity, and uniform distribution of iterations without load balancing. | max(g/3, problem_size/num_of_resources) ≤ chunksize
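For concreteness, the following minimal sketch shows how a partitioner is passed as the last argument to parallel_for. The function name double_all, the vector of floats, and the grain size of 1000 are illustrative choices made for this example, not prescribed by the table; the classic tbb/ headers are assumed.

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/partitioner.h>
#include <cstddef>
#include <vector>

// Doubles each element of v. The grain size g of the blocked_range and the
// partitioner argument together determine how the iteration space is chunked.
void double_all(std::vector<float>& v) {
    const std::size_t g = 1000;  // grain size (illustrative value)
    auto body = [&v](const tbb::blocked_range<std::size_t>& r) {
        for (std::size_t i = r.begin(); i != r.end(); ++i)
            v[i] *= 2.0f;
    };

    // No partitioner argument: equivalent to passing an auto_partitioner.
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, v.size(), g), body);

    // Explicit partitioner choices.
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, v.size(), g), body,
                      tbb::simple_partitioner());
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, v.size(), g), body,
                      tbb::static_partitioner());
}
```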
An auto_partitioner is used when no partitioner is specified. In general, the auto_partitioner or affinity_partitioner should be used, because they tailor the number of chunks to the available execution resources. affinity_partitioner and static_partitioner may take advantage of a Range's ability to split in a given ratio (see “Advanced Topic: Other Kinds of Iteration Spaces”) to distribute iterations in nearly equal chunks among computing resources.
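As a sketch of the cache-affinity point, the example below reuses a single affinity_partitioner object across repeated sweeps over the same array, so the scheduler can replay the previous iteration-to-thread mapping. The function relax and its per-element update are invented for illustration; the affinity_partitioner is taken by non-const reference and must outlive the loops that use it.

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/partitioner.h>
#include <cstddef>
#include <vector>

// Repeatedly sweeps over the same data. Reusing one affinity_partitioner
// object across invocations encourages the same chunks to run on the same
// threads each sweep, improving cache reuse.
void relax(std::vector<double>& a, int sweeps) {
    tbb::affinity_partitioner ap;  // must persist across loop invocations
    for (int s = 0; s < sweeps; ++s) {
        tbb::parallel_for(
            tbb::blocked_range<std::size_t>(0, a.size()),
            [&a](const tbb::blocked_range<std::size_t>& r) {
                for (std::size_t i = r.begin(); i != r.end(); ++i)
                    a[i] = 0.5 * a[i];  // placeholder per-element work
            },
            ap);  // passed by reference
    }
}
```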
simple_partitioner can be useful in the following situations:

- The subrange size for operator() must not exceed a limit. That might be advantageous, for example, if your operator() needs a temporary array proportional to the size of the range. With a limited subrange size, you can use an automatic variable for the array instead of having to use dynamic memory allocation (see the sketch after this list).
- A large subrange might use cache inefficiently. For example, suppose the processing of a subrange involves repeated sweeps over the same memory locations. Keeping the subrange below a limit might enable the repeatedly referenced memory locations to fit in cache.
- You want to tune to a specific machine.
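The sketch below illustrates the first point: because simple_partitioner bounds the chunk size by the grain size (per the table above), the body can use a fixed-size automatic array as scratch space. The function process, the GRAIN value of 256, and the squaring operation are placeholders chosen for this example.

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/partitioner.h>
#include <cstddef>

// With simple_partitioner and grain size GRAIN, no chunk exceeds GRAIN
// iterations, so an automatic buffer suffices and no heap allocation is
// needed inside the body.
constexpr std::size_t GRAIN = 256;

void process(float* data, std::size_t n) {
    tbb::parallel_for(
        tbb::blocked_range<std::size_t>(0, n, GRAIN),
        [data](const tbb::blocked_range<std::size_t>& r) {
            float scratch[GRAIN];  // r.size() <= GRAIN with simple_partitioner
            std::size_t k = 0;
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                scratch[k++] = data[i] * data[i];  // stage results locally
            k = 0;
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                data[i] = scratch[k++];            // write back
        },
        tbb::simple_partitioner());
}
```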