What to do when Nested Parallelism Runs Amuck? Getting Started with Python* module for Threading Building Blocks (Intel® TBB) in Less than 30 Minutes!

Introduction and Description of Product

Intel® Threading Building Blocks (Intel® TBB) is a portable, open-source parallel programming library from the parallelism experts at Intel. A Python module for Intel® TBB is included in the Intel® Distribution for Python and provides an out-of-the-box scheduling replacement that addresses common problems arising from nested parallelism. It coordinates both intra- and inter-process concurrency. This article shows how to launch Python programs with the Python module for Intel® TBB so that math from popular Python modules such as NumPy* and SciPy* is parallelized through Intel® Math Kernel Library (Intel® MKL) thread scheduling. Intel® MKL also comes bundled free with the Intel® Distribution for Python.

Intel® TBB is the native threading library for Intel® Data Analytics Acceleration Library (Intel® DAAL), a high-performance analytics package with a fully functional Python API. Furthermore, if you are working with the full Intel® Distribution for Python package, Intel® TBB is also the native threading layer underneath Numba*, OpenCV*, and select Scikit-learn* algorithms (those that have been accelerated with Intel® DAAL).

How to Get Intel® TBB

To install the full Intel® Distribution for Python package, which includes Intel® TBB, click below for installation guides:

Anaconda* Package
YUM Repository
APT Repository
Docker* Images

To install from Anaconda cloud:


conda install -c https://software.repos.intel.com/python/conda/ tbb

(The package name will change to ‘tbb4py’ in Q1 of 2018. This article will be updated accordingly.)
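After installing, one quick way to confirm that the interpreter can see the module is to look it up without importing it. The sketch below uses only the standard library; the helper name `module_available` is hypothetical, and it assumes the importable module is named `tbb`, as the `python -m tbb` invocation in this article implies.

```python
import importlib.util


def module_available(name):
    """Return True if the interpreter can locate a top-level module called `name`."""
    # find_spec returns None when no such module is installed.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: the Python module for Intel TBB is importable as "tbb".
    if module_available("tbb"):
        print("tbb module found; 'python -m tbb script.py' should be usable")
    else:
        print("tbb module not found in this environment")
```

If the check reports that the module is missing, the active conda environment is probably not the one the package was installed into.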

Drop-in Use with Interpreter Call (no other code changes)

Simply drop in Intel® TBB and determine if it is the right solution for your problem statement!

Nested parallel calls can cause over-subscription and, with it, performance degradation, often without the user being aware of it. These sorts of “mistakes” are easy to make in a scripting environment. Intel® TBB can be turned on for out-of-the-box thread scheduling with no code changes. In keeping with the scripting culture of the Python community, this makes it quick to check how much performance Intel® TBB recovers. If you already have math code written, launch it with the “-m tbb” interpreter flag, followed by the script name and any arguments your script requires. It’s as easy as this:


python -m tbb script.py args*

NOTE: See the Interpreter Flag Reference section for the full list of available flags.
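To make the over-subscription problem described above concrete, here is a minimal sketch that uses only the Python standard library and does not involve Intel® TBB or Intel® MKL. An outer thread pool stands in for the parallelism a script creates itself, and an inner pool stands in for a library call that starts its own threads. The worker counts and function names are hypothetical and chosen only for illustration.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

OUTER_WORKERS = 4  # hypothetical: threads the script creates itself
INNER_WORKERS = 4  # hypothetical: threads a library starts per call


def run_nested(outer=OUTER_WORKERS, inner=INNER_WORKERS):
    """Return how many distinct inner worker threads were live at the same time."""
    # The barrier opens only once every inner task is waiting on it,
    # so passing it proves that outer * inner threads existed together.
    barrier = threading.Barrier(outer * inner)
    seen = set()
    lock = threading.Lock()

    def inner_task(_):
        barrier.wait(timeout=30)
        with lock:
            seen.add(threading.get_ident())

    def outer_task(_):
        # Stand-in for a library call that brings its own thread pool.
        with ThreadPoolExecutor(max_workers=inner) as pool:
            list(pool.map(inner_task, range(inner)))

    with ThreadPoolExecutor(max_workers=outer) as pool:
        list(pool.map(outer_task, range(outer)))
    return len(seen)


if __name__ == "__main__":
    live = run_nested()
    print(f"{live} inner threads were live at once; "
          f"this machine reports {os.cpu_count()} logical processors")
```

Neither level of parallelism knows about the other, so the thread count multiplies. When that product exceeds the number of logical processors, the threads compete for cores, which is the situation a shared scheduler is meant to avoid.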

Interpreter Flag Reference

Command Line Usage


python -m tbb [-h] [--ipc] [-a] [--allocator-huge-pages] [-p P] [-b] [-v] [-m] script.py args*

Get Help from Command Line


python -m tbb --help
pydoc tbb

List of the currently available interpreter flags

Interpreter Flag	Description of Instruction
-h, --help	show this help message and exit
-m	Execute the following argument as a module (default: False)
-a, --allocator	Enable TBB scalable allocator as a replacement for standard memory allocator (default: False)
--allocator-huge-pages	Enable huge pages for TBB allocator (implies: -a) (default: False)
-p P, --max-num-threads P	Initialize TBB with P max number of threads per process (default: number of available logical processors on system)
-b, --benchmark	Block TBB initialization until all the threads are created before continuing the script. This is necessary for performance benchmarks that want to exclude TBB initialization from the measurements (default: False)
-v, --verbose	Request verbose and version information (default: False)
--ipc	Enable inter-process (IPC) coordination between TBB schedulers (default: False)
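When the launch is driven from another Python program (a test harness or a benchmark runner, for example), the flags in the table above can be assembled programmatically. The following is a small sketch, not part of the Intel® TBB module: the helper name `build_tbb_command` and its keyword arguments are hypothetical, and only flags listed in the table are emitted.

```python
import sys


def build_tbb_command(script, script_args=(), max_threads=None, ipc=False,
                      allocator=False, benchmark=False, verbose=False):
    """Build an argument list equivalent to 'python -m tbb [flags] script.py args*'."""
    cmd = [sys.executable, "-m", "tbb"]
    if ipc:
        cmd.append("--ipc")              # inter-process coordination
    if allocator:
        cmd.append("-a")                 # TBB scalable allocator
    if max_threads is not None:
        cmd += ["-p", str(max_threads)]  # max number of threads per process
    if benchmark:
        cmd.append("-b")                 # wait until all threads are created
    if verbose:
        cmd.append("-v")
    cmd.append(script)
    cmd.extend(script_args)
    return cmd


if __name__ == "__main__":
    # "script.py" and its arguments are placeholders for your own program.
    command = build_tbb_command("script.py", ["arg1"], max_threads=8, ipc=True)
    print(" ".join(command))
    # To actually launch it: subprocess.run(command, check=True)
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.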

Additional Links

Intel Product Page

Short Introduction Video

SciPy 2017 proceedings

SciPy 2016 Video Presentation

DASK* with Intel® TBB Blog Post
