Introduction and Description of Product
Intel® Threading Building Blocks (Intel® TBB) is a portable, open-source parallel programming library from the parallelism experts at Intel. A Python module for Intel® TBB is included in the Intel® Distribution for Python and provides an out-of-the-box scheduling replacement to address common problems arising from nested parallelism. It handles coordination of both intra- and inter-process concurrency. This article will show you how to launch Python programs using the Python module for Intel® TBB to parallelize math from popular Python modules like NumPy* and SciPy* by way of Intel® Math Kernel Library (Intel® MKL) thread scheduling. Please note that Intel® MKL also comes bundled free with the Intel® Distribution for Python. Intel® TBB is the native threading library for Intel® Data Analytics Acceleration Library (Intel® DAAL), a high-performance analytics package with a fully functional Python API. Furthermore, if you are working with the full Intel® Distribution for Python package, it is also the native threading layer underneath Numba*, OpenCV*, and select Scikit-learn* algorithms (which have been accelerated with Intel® DAAL).
How to Get Intel® TBB
To install the full Intel® Distribution for Python package, which includes Intel® TBB, use one of the installation guides linked below:
Anaconda* Package
YUM Repository
APT Repository
Docker* Images
To install from Anaconda cloud:
(The package name will change to ‘tbb4py’ in Q1 2018; this article will be updated accordingly.)
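A typical install command is sketched below, assuming the package is currently published on the intel channel as tbb (becoming tbb4py per the note above):

```
conda install -c intel tbb
```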
Drop-in Use with Interpreter Call (no other code changes)
Simply drop in Intel® TBB and determine whether it is the right solution for your problem!
Performance degradation due to over-subscription can be caused by nested parallel calls, often without the user realizing it. These sorts of mistakes are easy to make in a scripting environment. Intel® TBB can be turned on easily for out-of-the-box thread scheduling with no code changes. In keeping with the scripting culture of the Python community, this allows quick checking of how much performance Intel® TBB recovers. If you already have math code written, you can simply launch it with the “-m tbb” interpreter flag, followed by the script name and any arguments your script requires. It’s as easy as this:
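For example, for a hypothetical script named your_script.py (replace with your own script and arguments):

```
python -m tbb your_script.py [script args]
```

For context, the kind of workload this targets is sketched below: an outer Python thread pool whose tasks each call an MKL-parallelized NumPy routine, so thread counts multiply when nothing coordinates the two levels of parallelism. The matrix sizes and worker counts are illustrative only.

```python
# Sketch of a nested-parallelism workload (illustrative values):
# an outer thread pool submits NumPy calls that Intel MKL parallelizes
# internally, so thread counts can multiply without coordination.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def qr_task(_):
    # Each QR factorization spawns MKL worker threads internally.
    a = np.random.random((1000, 1000))
    q, r = np.linalg.qr(a)
    return q[0, 0]

if __name__ == "__main__":
    # Outer parallelism: 8 Python threads, each triggering inner MKL
    # parallelism: the over-subscription scenario described above.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(qr_task, range(16)))
    print("completed", len(results), "tasks")
```

Launching this script through the “-m tbb” flag requires no changes to the code itself.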
NOTE: See the Interpreter Flag Reference section below for the full list of available flags.
Interpreter Flag Reference
Command Line Usage
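The general invocation pattern, based on the flags listed in the table below, is sketched here; the exact usage string may vary by version, and your_script.py is a placeholder:

```
python -m tbb [-h] [-m] [-a] [--allocator-huge-pages] [-p P] [-b] [-v] [--ipc] your_script.py [args ...]
```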
Get Help from Command Line
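To print the module’s built-in help text:

```
python -m tbb --help
```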
List of the currently available interpreter flags
| Interpreter Flag | Description |
|---|---|
| -h, --help | Show this help message and exit. |
| -m | Executes the following argument as a module (default: False). |
| -a, --allocator | Enable the TBB scalable allocator as a replacement for the standard memory allocator (default: False). |
| --allocator-huge-pages | Enable huge pages for the TBB allocator (implies -a) (default: False). |
| -p P, --max-num-threads P | Initialize TBB with at most P threads per process (default: number of available logical processors on the system). |
| -b, --benchmark | Block TBB initialization until all threads are created before continuing the script. This is necessary for performance benchmarks that want to exclude TBB initialization from the measurements (default: False). |
| -v, --verbose | Request verbose and version information (default: False). |
| --ipc | Enable inter-process (IPC) coordination between TBB schedulers (default: False). |
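As a sketch that combines several of the flags above (the script name is a placeholder):

```
python -m tbb -p 16 --ipc -v your_script.py
```

Here TBB is capped at 16 threads per process, inter-process coordination is enabled, and verbose output is requested.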
Additional Links
DASK* with Intel® TBB Blog Post