
## Developer Guide for Intel® oneAPI Math Kernel Library Windows*

ID 766692
Date 3/31/2023
Public


## OpenMP* Threaded Functions and Problems

The following Intel® oneAPI Math Kernel Library function domains are threaded with the OpenMP* technology:

• Direct sparse solver.

• LAPACK.

For a list of threaded routines, see LAPACK Routines.

• Level1 and Level2 BLAS.

For a list of threaded routines, see Threaded BLAS Level1 and Level2 Routines.

• All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers.

• All Vector Mathematics functions (except service functions).

• FFT.

For a list of FFT transforms that can be threaded, see Threaded FFT Problems.


### LAPACK Routines

In this section, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.

The following LAPACK routines are threaded with OpenMP*:

• Linear equations, computational routines:
  • Factorization: ?getrf, ?getrfnpi, ?gbtrf, ?potrf, ?pptrf, ?sytrf, ?hetrf, ?sptrf, ?hptrf
  • Solving: ?dttrsb, ?gbtrs, ?gttrs, ?pptrs, ?pbtrs, ?pttrs, ?sytrs, ?sptrs, ?hptrs, ?tptrs, ?tbtrs
• Orthogonal factorization, computational routines: ?geqrf, ?ormqr, ?unmqr, ?ormlq, ?unmlq, ?ormql, ?unmql, ?ormrq, ?unmrq
• Singular Value Decomposition, computational routines: ?gebrd, ?bdsqr
• Symmetric Eigenvalue Problems, computational routines: ?sytrd, ?hetrd, ?sptrd, ?hptrd, ?steqr, ?stedc
• Generalized Nonsymmetric Eigenvalue Problems, computational routines: chgeqz/zhgeqz

A number of other LAPACK routines, which are based on threaded LAPACK or BLAS routines, make effective use of OpenMP* parallelism:
?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, cggevx/zggevx, and so on.
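
The `?` convention used in these lists can be made concrete with a small helper. The following is a minimal sketch, not part of oneMKL: `expand_flavor` and `PREFIXES` are hypothetical names introduced only for illustration. It substitutes one of the prefixes s, d, c, or z for the leading `?`. Note that the substitution is purely textual, and not every routine exists in all four flavors (for example, `?hetrf` is provided only for complex data and `?ormqr` only for real data), so an expanded name still needs to be checked against the reference documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Precision prefixes named in the text: s, d, c, z. */
static const char PREFIXES[] = { 's', 'd', 'c', 'z' };

/*
 * Hypothetical helper (not an oneMKL function).
 * Writes the flavor of `pattern` for the given prefix into `out`,
 * e.g. "?getrf" with prefix 'd' becomes "dgetrf".
 * Returns 0 on success, or -1 if the pattern does not start with '?',
 * the prefix is not one of s/d/c/z, or the buffer is too small.
 */
int expand_flavor(const char *pattern, char prefix, char *out, size_t out_size)
{
    assert(pattern != NULL && out != NULL);  /* programmer error, not data error */

    if (pattern[0] != '?')
        return -1;
    if (memchr(PREFIXES, prefix, sizeof PREFIXES) == NULL)
        return -1;

    size_t len = strlen(pattern);            /* length including the '?' */
    if (len + 1 > out_size)                  /* need room for the name plus NUL */
        return -1;

    out[0] = prefix;
    memcpy(out + 1, pattern + 1, len);       /* copies the terminating NUL too */
    return 0;
}
```

With a 16-byte buffer `buf`, calling `expand_flavor("?getrf", 'd', buf, sizeof buf)` returns 0 and leaves `dgetrf` in `buf`.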

### Threaded BLAS Level1 and Level2 Routines

In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.

The following routines are threaded with OpenMP*:

• Level1 BLAS: ?axpy, ?copy, ?swap, ddot/sdot, cdotc, drot/srot
• Level2 BLAS: ?gemv, ?trsv, ?trmv, dsyr/ssyr, dsyr2/ssyr2, dsymv/ssymv

### Threaded FFT Problems

The following characteristics of a specific problem determine whether your FFT computation may be threaded with OpenMP*:

• rank
• domain
• size/length
• precision (single or double)
• placement (in-place or out-of-place)
• strides
• number of transforms
• layout (for example, interleaved or split layout of complex data)

Most FFT problems are threaded. In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow.

#### One-dimensional (1D) transforms

1D transforms are threaded in many cases.

1D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions depending on the architecture:

| Architecture | Conditions |
|---|---|
| Intel® 64 | N is a power of 2, log2(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1. |
| IA-32 | N is a power of 2, log2(N) > 13, and the transform is single-precision. |
| IA-32 | N is a power of 2, log2(N) > 14, and the transform is double-precision. |
| Any | N is composite, log2(N) > 16, and input/output strides equal 1. |

1D complex-to-complex transforms using split-complex layout are not threaded.
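
The conditions for 1D complex-to-complex transforms can be read as a predicate over the problem characteristics. The sketch below encodes only what is stated in this section; it does not query the library, and actual behavior may differ between versions. All type and function names (`fft_1d_c2c_problem`, `is_1d_c2c_threaded`, and so on) are hypothetical and are not part of the oneMKL API. Two interpretive assumptions are made: the listed conditions are alternatives (any one of them is sufficient), and a power of 2 greater than 2 counts as composite, so the "Any" condition also covers large powers of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptors for the problem characteristics in the text. */
typedef enum { ARCH_INTEL64, ARCH_IA32 } fft_arch;
typedef enum { PREC_SINGLE, PREC_DOUBLE } fft_precision;
typedef enum { PLACE_INPLACE, PLACE_OUTOFPLACE } fft_placement;
typedef enum { LAYOUT_INTERLEAVED, LAYOUT_SPLIT } fft_layout;

typedef struct {
    fft_arch      arch;
    uint64_t      n;          /* transform size N */
    fft_precision precision;
    fft_placement placement;
    fft_layout    layout;
    long          in_stride;
    long          out_stride;
} fft_1d_c2c_problem;

static bool is_power_of_two(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* log2(n) > k is equivalent to n > 2^k (valid for k < 64). */
static bool log2_greater_than(uint64_t n, unsigned k)
{
    return n > ((uint64_t)1 << k);
}

/* Trial division; adequate for an illustration. */
static bool is_composite(uint64_t n)
{
    if (n < 4)
        return false;
    for (uint64_t d = 2; d <= n / d; ++d)
        if (n % d == 0)
            return true;
    return false;
}

/* Returns true if the conditions stated in this section mark the problem as threaded. */
bool is_1d_c2c_threaded(const fft_1d_c2c_problem *p)
{
    assert(p != NULL);

    /* Split-complex layout is stated to be not threaded. */
    if (p->layout == LAYOUT_SPLIT)
        return false;

    bool unit_strides = (p->in_stride == 1 && p->out_stride == 1);
    bool pow2 = is_power_of_two(p->n);

    /* "Any" architecture: composite N, log2(N) > 16, unit strides. */
    if (is_composite(p->n) && log2_greater_than(p->n, 16) && unit_strides)
        return true;

    if (p->arch == ARCH_INTEL64) {
        /* Power of 2, log2(N) > 9, double precision, out-of-place, unit strides. */
        return pow2 && log2_greater_than(p->n, 9)
            && p->precision == PREC_DOUBLE
            && p->placement == PLACE_OUTOFPLACE
            && unit_strides;
    }

    /* IA-32: the size threshold depends on precision. */
    if (p->precision == PREC_SINGLE)
        return pow2 && log2_greater_than(p->n, 13);
    return pow2 && log2_greater_than(p->n, 14);
}
```

For example, a double-precision, out-of-place, interleaved transform of size 1024 with unit strides on Intel® 64 satisfies the first condition, while the same problem with size 512 does not, because log2(512) = 9 is not greater than 9.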

#### Multidimensional transforms

All multidimensional transforms on large-volume data are threaded.