Throughput of pipeline

Intel® oneAPI Threading Building Blocks Developer Guide and API Reference

ID 772616

Date 4/11/2022

The throughput of a pipeline is the rate at which tokens flow through it, and it is limited by two constraints. First, if a pipeline runs with N tokens, at most N items can be in flight, so no more than N operations can run in parallel. Choosing N may take some experimentation: too low a value limits parallelism, while too high a value may demand too many resources (for example, more buffers). Second, the throughput of a pipeline can be no higher than the throughput of its slowest sequential filter. This holds even for a pipeline with no parallel filters: no matter how fast the other filters are, the slowest sequential filter is the bottleneck. In general, therefore, keep the sequential filters fast and, where possible, shift work to the parallel filters.
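The two constraints above can be combined into a rough upper bound on speedup. The sketch below is an idealized model, not part of oneTBB: it assumes the average time each stage spends per item is known and that P worker threads are available (P is an assumption not discussed in this section), and it ignores scheduling overhead and resource contention. In steady state, items cannot complete more often than once per slowest-serial-stage time, once per W/P, or once per W/N, where W is the summed stage time per item, so the speedup over a one-thread run is at most min(W / t_slowest_serial, P, N). The names `Stage` and `speedup_bound` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One pipeline stage: average time per item and whether the filter is serial.
struct Stage {
    double time_per_item;
    bool serial;
};

// Idealized upper bound on the speedup of a pipeline over running every stage
// back to back on a single thread. Ignores scheduling overhead and contention.
double speedup_bound(const std::vector<Stage>& stages,
                     std::size_t threads, std::size_t tokens) {
    double work = 0.0;            // W: total time one item spends in all stages
    double slowest_serial = 0.0;  // time of the slowest serial stage
    for (const Stage& s : stages) {
        work += s.time_per_item;
        if (s.serial) slowest_serial = std::max(slowest_serial, s.time_per_item);
    }
    // At most min(threads, tokens) items can be processed at once.
    double bound = static_cast<double>(std::min(threads, tokens));
    // A serial stage completes at most one item per slowest_serial time units.
    if (slowest_serial > 0.0) bound = std::min(bound, work / slowest_serial);
    return bound;
}
```

For example, three stages of 1 ms each with serial input and output give a bound of at most 3 however many threads and tokens are used, while moving 7 ms more work into the parallel stage raises the bound to 10 when P and N are both at least 10.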

The text processing example has relatively poor speedup because its serial filters are limited by the I/O speed of the system. Even with files on a local disk, you are unlikely to see a speedup of much more than 2. To benefit substantially from a pipeline, the parallel filters must do heavy work relative to the serial filters.

The window size, that is, the sub-problem size carried by each token, can also limit throughput. Windows that are too small let per-token overhead dominate the useful work; windows that are too large may spill out of cache. A good guideline is to pick a large window that still fits in cache, and then experiment to find the best size.
