Estimating Flow Graph Performance

Intel® oneAPI Threading Building Blocks Developer Guide and API Reference


ID 772616

Date 3/31/2023





The performance and scalability of a flow graph are not easy to predict. However, a few key points can guide you in estimating the limits on performance and speedup of some graphs.

The Critical Path Limits the Scalability in a Dependence Graph

A critical path is the most time-consuming path from a node with no predecessors to a node with no successors. In a dependence graph, the nodes along a path cannot execute concurrently because they are strictly ordered. Therefore, for a dependence graph, the critical path limits scalability.

More formally, let T be the total time consumed by all of the nodes in your graph if executed sequentially. Then let C be the time consumed along the path that takes the most time. The nodes along this path cannot be overlapped even in a parallel execution. Therefore, even if all other paths are executed in parallel with C, the wall clock time for the parallel execution is at least C, and the maximum possible speedup (ignoring microarchitectural and memory effects) is T/C.
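The T/C bound can be estimated before any parallel run, given a measured cost for each node body. The following is a minimal sketch in standard C++, not part of oneTBB: the names `GraphEstimate` and `estimate_speedup` are hypothetical, the per-node costs are assumed to come from your own measurements, and the nodes are assumed to be numbered in topological order (every edge goes from a lower index to a higher one).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch; not a oneTBB type.
struct GraphEstimate {
    double total_work;     // T: sum of all node costs (sequential time)
    double critical_path;  // C: cost of the most time-consuming path
    double max_speedup;    // T / C: upper bound on speedup
};

// cost[i]       : measured time of node i's body (assumed input)
// successors[i] : indices of the nodes that depend on node i
// Assumes nodes are numbered in topological order.
GraphEstimate estimate_speedup(
        const std::vector<double>& cost,
        const std::vector<std::vector<std::size_t>>& successors) {
    assert(cost.size() == successors.size());
    const std::size_t n = cost.size();

    // Before node i is visited, finish[i] holds the longest path that
    // ends at any of its predecessors; after the visit it also
    // includes the cost of node i itself.
    std::vector<double> finish(n, 0.0);
    double total = 0.0;
    double critical = 0.0;

    for (std::size_t i = 0; i < n; ++i) {
        finish[i] += cost[i];
        total += cost[i];
        critical = std::max(critical, finish[i]);
        for (std::size_t s : successors[i]) {
            finish[s] = std::max(finish[s], finish[i]);
        }
    }
    // An empty graph has no work; report a neutral speedup of 1.
    return {total, critical, critical > 0.0 ? total / critical : 1.0};
}
```

For example, consider a diamond-shaped graph with costs {1, 4, 2, 1}, in which node 0 feeds nodes 1 and 2, and both feed node 3. Then T = 8 and C = 1 + 4 + 1 = 6, so no number of threads can deliver a speedup above 8/6, or about 1.33.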

There is Overhead in Spawning a Node’s Body as a Task

The bodies of input_nodes, function_nodes, continue_nodes and multifunction_nodes execute within spawned tasks by default. This means that you need to take into account the overhead of task scheduling when estimating the time it takes for a node to execute its body. All of the rules of thumb for determining the appropriate granularity of tasks therefore apply to node bodies as well. If you have many fine-grained nodes in your flow graph, these overheads can noticeably degrade performance. However, depending on the graph structure, you can reduce such overheads by using the lightweight policy with these nodes.
