What to do when Nested Parallelism Runs Amok? Getting Started with the Python* Module for Intel® Threading Building Blocks (Intel® TBB) in Less than 30 Minutes!

ID 658440
Updated 12/27/2017
Version Latest



Introduction and Description of Product

Intel® Threading Building Blocks (Intel® TBB) is a portable, open-source parallel programming library from the parallelism experts at Intel. A Python module for Intel® TBB is included in the Intel® Distribution for Python and provides an out-of-the-box scheduling replacement that addresses common problems arising from nested parallelism. It coordinates both intra- and inter-process concurrency. This article shows you how to launch Python programs with the Python module for Intel® TBB to schedule the threads behind parallel math in popular Python modules like NumPy* and SciPy*, which call into the Intel® Math Kernel Library (Intel® MKL). Note that Intel® MKL also comes bundled free with the Intel® Distribution for Python.

Intel® TBB is the native threading library for the Intel® Data Analytics Acceleration Library (Intel® DAAL), a high-performance analytics package with a fully functional Python API. Furthermore, if you are working with the full Intel® Distribution for Python package, it is also the native threading layer underneath Numba*, OpenCV*, and select Scikit-learn* algorithms (which have been accelerated with Intel® DAAL).


How to Get Intel® TBB

To install the full Intel® Distribution for Python package, which includes Intel® TBB, see the installation guides below:

Anaconda* Package
YUM Repository
APT Repository
Docker* Images

To install from Anaconda cloud:

conda install -c intel tbb

(The package name will change to ‘tbb4py’ in Q1 of 2018. This article will be updated accordingly.)


Drop-in Use with Interpreter Call (no other code changes)

Simply drop in Intel® TBB and determine whether it is the right solution for your problem!

Performance degradation due to over-subscription can be caused by nested parallel calls, often unbeknownst to the user. These sorts of mistakes are easy to make in a scripting environment. Intel® TBB can be turned on easily for out-of-the-box thread scheduling with no code changes. In keeping with the scripting culture of the Python community, this allows a quick check of whether Intel® TBB recovers the lost performance. If you already have math code written, you can simply launch it with the “-m tbb” interpreter flag, followed by the script name and any arguments your script requires. It’s as easy as this:

python -m tbb script.py args*

NOTE: See the Interpreter Flag Reference Section for full list of available flags.
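As a concrete illustration, here is a minimal sketch of the kind of workload that is prone to over-subscription: several Python threads, each driving a NumPy* linear-algebra call whose BLAS backend (e.g. Intel® MKL) may spawn its own team of threads. The script and its names are hypothetical, not part of any Intel® package; run it as-is with `python script.py`, then with `python -m tbb script.py`, to compare scheduling behavior.

```python
from multiprocessing.pool import ThreadPool
import numpy as np

def qr_error(seed):
    """QR-factorize a random matrix; the BLAS backend may parallelize this internally."""
    rng = np.random.RandomState(seed)
    a = rng.rand(300, 300)
    q, r = np.linalg.qr(a)
    # Reconstruction error should be tiny if the factorization succeeded.
    return float(np.linalg.norm(q @ r - a))

# Outer parallelism: 4 Python threads, each issuing a BLAS call that may
# itself use many threads -> nested parallelism and potential over-subscription.
pool = ThreadPool(4)
errors = pool.map(qr_error, range(8))
pool.close()
pool.join()
print(max(errors) < 1e-8)
```

With Intel® TBB enabled via the interpreter flag, the inner and outer levels of parallelism are coordinated by a single scheduler instead of competing thread pools.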


Interpreter Flag Reference

Command Line Usage
python -m tbb [-h] [--ipc] [-a] [--allocator-huge-pages] [-p P] [-b] [-v] [-m] script.py args*
Get Help from Command Line
python -m tbb --help
pydoc tbb
List of the currently available interpreter flags:

Interpreter Flag           Description
-h, --help                 Show this help message and exit
-m                         Executes the following as a module (default: False)
-a                         Enable the TBB scalable allocator as a replacement for the standard memory allocator (default: False)
--allocator-huge-pages     Enable huge pages for the TBB allocator (implies -a) (default: False)
-p P, --max-num-threads P  Initialize TBB with P as the maximum number of threads per process (default: the number of available logical processors on the system)
-b                         Block TBB initialization until all the threads are created before continuing the script. This is necessary for performance benchmarks that want to exclude TBB initialization from the measurements (default: False)
-v                         Request verbose and version information (default: False)
--ipc                      Enable inter-process (IPC) coordination between TBB schedulers (default: False)


Additional Links

Intel Product Page

Short Introduction Video

SciPy 2017 proceedings

SciPy 2016 Video Presentation

DASK* with Intel® TBB Blog Post