OpenMP* Threaded Functions and Problems
The following function domains are threaded in Intel® oneAPI Math Kernel Library with the OpenMP* technology:
 Direct sparse solver.
 LAPACK. For a list of threaded routines, see LAPACK Routines.
 Level 1 and Level 2 BLAS. For a list of threaded routines, see BLAS Level 1 and Level 2 Routines.
 All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers.
 All Vector Mathematics functions (except service functions).
 FFT. For a list of FFT transforms that can be threaded, see Threaded FFT Problems.

LAPACK Routines
In this section, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.
The following LAPACK routines are threaded with OpenMP*:
 Linear equations, computational routines:
  Factorization: ?getrf, ?getrfnpi, ?gbtrf, ?potrf, ?pptrf, ?sytrf, ?hetrf, ?sptrf, ?hptrf
  Solving: ?dttrsb, ?gbtrs, ?gttrs, ?pptrs, ?pbtrs, ?pttrs, ?sytrs, ?sptrs, ?hptrs, ?tptrs, ?tbtrs
 Orthogonal factorization, computational routines: ?geqrf, ?ormqr, ?unmqr, ?ormlq, ?unmlq, ?ormql, ?unmql, ?ormrq, ?unmrq
 Singular Value Decomposition, computational routines: ?gebrd, ?bdsqr
 Symmetric Eigenvalue Problems, computational routines: ?sytrd, ?hetrd, ?sptrd, ?hptrd, ?steqr, ?stedc
 Generalized Nonsymmetric Eigenvalue Problems, computational routines: chgeqz/zhgeqz
A number of other LAPACK routines, which are based on threaded LAPACK or BLAS routines, make effective use of OpenMP* parallelism: ?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, cggevx/zggevx, and so on.
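To make the ? notation concrete, the following C sketch expands a template name into concrete routine names. The function expand_flavors and the MAX_NAME limit are hypothetical helpers written for illustration; they are not part of oneMKL. Because not every routine exists in all four precisions (for example, ?ormqr covers the real routines sormqr and dormqr, while the complex counterparts cunmqr and zunmqr appear separately as ?unmqr), the caller passes only the prefixes that apply.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit on routine-name length, chosen for this sketch. */
#define MAX_NAME 16

/* Replaces the leading '?' of a template such as "?getrf" with each
 * character of `prefixes` (for example "sdcz", or "sd" for real-only
 * routines). Writes at most max_out names into `out` and returns the
 * number written, or -1 if the template is not of the form "?name". */
int expand_flavors(const char *tmpl, const char *prefixes,
                   char out[][MAX_NAME], int max_out) {
    assert(tmpl != NULL && prefixes != NULL && out != NULL);
    size_t len = strlen(tmpl);
    if (len < 2 || tmpl[0] != '?' || len >= MAX_NAME)
        return -1;
    int count = 0;
    for (const char *p = prefixes; *p != '\0' && count < max_out; ++p) {
        out[count][0] = *p;                    /* precision prefix */
        memcpy(out[count] + 1, tmpl + 1, len); /* rest of name plus '\0' */
        ++count;
    }
    return count;
}
```

For example, expand_flavors("?getrf", "sdcz", names, 4) produces sgetrf, dgetrf, cgetrf, and zgetrf, while expand_flavors("?ormqr", "sd", names, 4) produces only sormqr and dormqr.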
Threaded BLAS Level 1 and Level 2 Routines
In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.
The following routines are threaded with OpenMP*:
 Level 1 BLAS: ?axpy, ?copy, ?swap, ddot/sdot, cdotc, drot/srot
 Level 2 BLAS: ?gemv, ?trsv, ?trmv, dsyr/ssyr, dsyr2/ssyr2, dsymv/ssymv
Threaded FFT Problems
The following characteristics of a specific problem determine whether your FFT computation may be threaded with OpenMP*:
 rank
 domain
 size/length
 precision (single or double)
 placement (in-place or out-of-place)
 strides
 number of transforms
 layout (for example, interleaved or split layout of complex data)
Most FFT problems are threaded. In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow.
One-dimensional (1D) transforms
1D transforms are threaded in many cases.
1D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions, depending on the architecture:
 Intel® 64: N is a power of 2, log₂(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1.
 IA-32: N is a power of 2, log₂(N) > 13, and the transform is single-precision; or N is a power of 2, log₂(N) > 14, and the transform is double-precision.
 Any architecture: N is composite, log₂(N) > 16, and input/output strides equal 1.

1D complex-to-complex transforms using split-complex layout are not threaded.
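The 1D complex-to-complex rules above can be encoded as a predicate. The following C sketch is a hypothetical helper, not a oneMKL API: the enum and struct types, their field names, and c2c_1d_listed_as_threaded are assumptions made for illustration. It covers only a single transform per call, and it relies on the fact that log₂(N) > k holds if and only if N > 2^k, so no floating-point logarithm is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical problem description; these types are NOT part of oneMKL. */
typedef enum { ARCH_INTEL64, ARCH_IA32 } arch_t;
typedef enum { PREC_SINGLE, PREC_DOUBLE } precision_t;
typedef enum { LAYOUT_INTERLEAVED, LAYOUT_SPLIT } layout_t;

typedef struct {
    arch_t arch;
    precision_t precision;
    layout_t layout;
    bool in_place;
    size_t n;          /* transform length N */
    size_t in_stride;  /* input stride */
    size_t out_stride; /* output stride */
} fft1d_c2c_problem;

static bool is_pow2(size_t n) { return n != 0 && (n & (n - 1)) == 0; }

/* A composite number is an integer greater than 1 that is not prime. */
static bool is_composite(size_t n) {
    for (size_t d = 2; d <= n / d; ++d) /* d <= n / d avoids d * d overflow */
        if (n % d == 0)
            return true;
    return false;
}

/* log2(n) > k  holds exactly when  n > 2^k. */
static bool log2_exceeds(size_t n, unsigned k) { return n > ((size_t)1 << k); }

/* Returns true if one of the listed conditions holds for a single 1D c2c
 * transform; false means none of the listed conditions holds. */
bool c2c_1d_listed_as_threaded(const fft1d_c2c_problem *p) {
    assert(p != NULL);
    if (p->layout == LAYOUT_SPLIT)
        return false; /* split-complex 1D c2c transforms are not threaded */
    bool unit_strides = p->in_stride == 1 && p->out_stride == 1;
    size_t n = p->n;
    if (p->arch == ARCH_INTEL64 && is_pow2(n) && log2_exceeds(n, 9) &&
        p->precision == PREC_DOUBLE && !p->in_place && unit_strides)
        return true;
    if (p->arch == ARCH_IA32 && is_pow2(n) &&
        ((p->precision == PREC_SINGLE && log2_exceeds(n, 13)) ||
         (p->precision == PREC_DOUBLE && log2_exceeds(n, 14))))
        return true;
    /* The "any architecture" condition applies everywhere. */
    return is_composite(n) && log2_exceeds(n, 16) && unit_strides;
}
```

For example, a double-precision out-of-place transform of length 1024 with unit strides on Intel® 64 satisfies the first condition, because log₂(1024) = 10 > 9, whereas length 512 does not, because log₂(512) = 9.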
Multidimensional transforms
All multidimensional transforms on large-volume data are threaded.