Developer Reference

Intel® oneAPI Math Kernel Library Vector Mathematics Performance and Accuracy Data

ID 772989
Date 12/04/2020
Public

Vector Mathematics (VM) computes elementary functions on vector arguments. VM includes a set of highly optimized implementations of computationally expensive core mathematical functions (power, trigonometric, exponential, hyperbolic, and others) that operate on vectors. VM can improve performance in applications such as nonlinear optimization and the computation of integrals.

NOTE:
VM is an integral part of the Intel® oneAPI Math Kernel Library (oneMKL). The term VM is used here for simplicity when discussing this group of functions.

The table below describes the three VM accuracy modes and gives, for each mode, the expected performance level and the maximum accuracy error for single and double precision.

Performance / Max. Error | High Accuracy (HA) | Low Accuracy (LA)  | Enhanced Performance (EP)
Expected performance     | Default            | Better performance | Best performance available
Maximum accuracy error   | 1 ULP*             | 4 ULP*             | The lower half of the significand bits may be incorrect

* Unit in the last place (ULP)

Most VM functions have different implementations corresponding to each of these three modes.

NOTE:
VM accuracy modes have no impact on functions that return exact results, such as CopySign.

Given the reduction in accuracy described in the table, the EP mode may be adequate for applications that do not require highly accurate results, such as media applications or some Monte Carlo simulations.

Accuracy behavior is processor specific, so results might differ slightly across processor families and components of one family such as processor models or libraries. Results might also vary slightly from release to release. However, all differences are within specified error bounds. Error and special value behavior are identical for HA and LA functions regardless of the processor used to run the software. For the EP mode, correct error and special value behavior are not guaranteed.

To control the VM accuracy modes, use the vmlSetMode function. For more information, refer to the Intel® oneAPI Math Kernel Library Developer Reference.

NOTE on Performance:

Performance numbers in the tables are shown for working argument intervals. Performance behavior may be different for other intervals. For example, it is quite expensive to compute trigonometric functions accurately for huge arguments. The page for each function lists the working interval over which performance is measured. The same page contains graphs that show how the performance behavior depends on the vector length. There are two extreme cases: short and long vectors.

NOTE:
Logarithmic scale is used to show both cases.

For short vectors, functions incur fixed overheads, which are amortized as the vector length increases. For vectors longer than a few dozen elements, the performance remains quite flat until the vector becomes too long to fit in the L2 cache.

Data prefetching greatly reduces the performance penalty for vectors that do not fit in the cache.

NOTE on Accuracy:

The design requirements for the HA functions are to have an accuracy error less than 1.0 ULP, and to have all special values processed correctly. For the LA functions, the error bound is 4.0 ULP.

For the EP functions, approximately one half of the bits in the significand (the most significant ones) of the floating-point result need to be correct. For details, see the accuracy tables with ULP errors for all the functions. Any deviations from these error bounds are highlighted in the accuracy tables, and should be considered to be temporary.

For complex functions, the ULP error is the maximum of the two ULP errors calculated for the real and the imaginary parts of the result.

Special Value Processing

Special values are processed in conformance with the C99 standard (referred to as C9X while in draft). For the special value behavior of each function, see the Intel® oneAPI Math Kernel Library Developer Reference.