Intel® Threading Building Blocks Code Analysis
Use the Intel® VTune™ Profiler for performance analysis of application targets using Intel® oneAPI Threading Building Blocks (oneTBB).
If your application uses the Intel® runtime libraries, you can run:
- Hotspots and Threading analyses to explore the parallelization efficiency of your application based on oneTBB parallel and synchronization constructs.
- Threading analysis to get detailed information on the oneTBB synchronization objects that limited the parallel performance of your multithreaded application.
Using the Intel® C++ Compiler is recommended to get more comprehensive diagnostics from the VTune Profiler.
Start exploring oneTBB parallelization efficiency with the Hotspots analysis. Look at the Effective CPU Utilization Histogram to see the parallelization level of your application. Note that the histogram reflects the parallelization levels of your application based on effective time, that is, CPU time minus the time spent in threading runtimes.
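As a worked illustration with hypothetical numbers: VTune Profiler usually splits CPU time into effective time, spin time, and overhead time, where spin and overhead time are the parts attributed to the threading runtime. The effective time used by the histogram is therefore what remains after both are removed:

```latex
T_{\text{effective}} = T_{\text{CPU}} - T_{\text{spin}} - T_{\text{overhead}}
```

For example, a run with 10 s of CPU time, of which 2.5 s is spin time and 0.5 s is overhead time, has 10 - 2.5 - 0.5 = 7 s of effective time.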

If a significant portion of the elapsed time is spent with Idle or Poor CPU utilization, explore the Top Hotspots table. Flagged oneTBB functions might mean that the application spends CPU time in the oneTBB runtime because of parallel inefficiencies such as scheduling overhead or load imbalance. To discover the reason, hover over the flag.

The Bottom-up tab can give you more details about synchronization or overhead in particular oneTBB constructs. Expand the Spin Time and Overhead Time columns in the grid to determine why a particular oneTBB runtime function had a higher than usual execution time.
oneTBB runtime functions are flagged when they consume more than 5% of the CPU time.
For example, a oneTBB runtime function with a high Scheduling value may indicate that your application divides its threaded work into pieces that are too small, which leads to excessive scheduling overhead as the application calls into the runtime. You can resolve this issue by increasing the threading chunk size.

If there is idle wait time, that is, time when the oneTBB runtime waits without burning the CPU on synchronization, run the Threading analysis to explore synchronization bottlenecks that can prevent effective CPU utilization.
VTune Profiler recognizes all types of oneTBB synchronization objects. If you assign a meaningful name to an object you create in the source code, VTune Profiler recognizes it and represents it in the Result tab. For performance reasons, this functionality is not enabled by default in the oneTBB headers. To make user-defined objects visible to VTune Profiler, recompile your application with TBB_USE_THREADING_TOOLS set to 1.
To display the overhead introduced by oneTBB library internals, VTune Profiler creates a pseudo synchronization object, TBB Scheduler, that includes all waits from the oneTBB runtime libraries.