Obtaining Numerically Reproducible Results
Intel® oneAPI Math Kernel Library (oneMKL) provides Conditional Numerical Reproducibility (CNR) controls that enable the library's functions to return reproducible floating-point results from run to run, provided that the following conditions hold:
- Calls to Intel® oneAPI Math Kernel Library occur in a single executable
- The number of computational threads used by the library does not change in the run
For a limited set of routines, you can eliminate the second condition by using Intel® oneAPI Math Kernel Library in strict CNR mode.
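As an illustration only, the following minimal C sketch shows how an application might request a fixed code branch before making any oneMKL calls; it uses the CNR control function mkl_cbwr_set and branch constants declared in mkl.h, and the Intel AVX2 branch chosen here is just an example of one supported branch value.

```c
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Request a fixed code branch (Intel AVX2 here, as an example) before
     * any other oneMKL call, so floating-point operations are performed in
     * the same order from run to run. */
    if (mkl_cbwr_set(MKL_CBWR_AVX2) != MKL_CBWR_SUCCESS) {
        printf("The requested CNR branch is not supported on this processor.\n");
        return 1;
    }

    /* ... subsequent oneMKL calls in this executable use the fixed branch ... */
    return 0;
}
```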
It is well known that for general single and double precision IEEE floating-point numbers, the associative property does not always hold, meaning (a+b)+c may not equal a+(b+c). Let's consider a specific example. In infinite precision arithmetic 2^-63 + 1 + (-1) = 2^-63. If this same computation is done on a computer using double precision floating-point numbers, a rounding error is introduced, and the order of operations becomes important:

(2^-63 + 1) + (-1) ≃ 1 + (-1) = 0

versus

2^-63 + (1 + (-1)) ≃ 2^-63 + 0 = 2^-63
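The same effect can be reproduced in a few lines of C; this small double-precision sketch mirrors the arithmetic above (ldexp(1.0, -63) produces 2^-63):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    double tiny = ldexp(1.0, -63);            /* 2^-63 */

    double left  = (tiny + 1.0) + (-1.0);     /* 2^-63 is lost when rounded into 1.0 first */
    double right = tiny + (1.0 + (-1.0));     /* 1 and -1 cancel exactly, 2^-63 survives */

    printf("(2^-63 + 1) + (-1) = %.17g\n", left);   /* 0 */
    printf("2^-63 + (1 + (-1)) = %.17g\n", right);  /* ~1.0842e-19 */
    return 0;
}
```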
This inconsistency in results due to order of operations is precisely what the new functionality addresses.
The application-related factors that affect the order of floating-point operations within a single executable program include selection of a code path based on run-time processor dispatching, alignment of data arrays, variation in the number of threads, threaded algorithms, and internal floating-point control settings. You can control most of these factors by controlling the number of threads and floating-point settings and by taking steps to align memory when it is allocated (see the
Getting Reproducible Results with Intel® MKL knowledge base article for details). However, run-time dispatching and certain threaded algorithms do not allow users to make changes that can ensure the same order of operations from run to run.
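For instance, a minimal sketch of controlling two of these factors with oneMKL service functions might look like the following; the fixed thread count of 4 and the buffer size n are application-specific assumptions, and 64-byte alignment is used as an example:

```c
#include <stddef.h>
#include <mkl.h>

/* Keep the thread count constant across runs and allocate a 64-byte aligned
 * buffer, so these two factors cannot change the order of operations. */
double *prepare_reproducible_buffer(size_t n)
{
    mkl_set_num_threads(4);                               /* constant thread count */
    return (double *)mkl_malloc(n * sizeof(double), 64);  /* aligned allocation */
}

/* Buffers obtained this way should be released with mkl_free(). */
```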
Intel® oneAPI Math Kernel Library uses run-time processor dispatching to select internal code paths that match the features of the processor on which the application runs, and code branches are designated by the latest instruction set architecture (ISA) they use for optimizations, up to Intel® Advanced Vector Extensions 2 (Intel® AVX2). The feature-based approach introduces a challenge: if any of the internal floating-point operations are done in a different order or are re-associated, the computed results may differ.
Dispatching optimized code paths based on the capabilities of the processor on which the code is running is central to the optimization approach used by Intel® oneAPI Math Kernel Library. So it is natural that consistent results require some performance trade-offs. If limited to a particular code path, performance of Intel® oneAPI Math Kernel Library can in some circumstances degrade by more than half. To understand this, note that matrix-multiply performance nearly doubled with the introduction of new processors supporting Intel AVX2 instructions. Even if the code branch is not restricted, performance can degrade by 10-20% because the new functionality restricts algorithms to maintain the order of operations.
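One way to limit the performance trade-off, sketched below with the CNR query and control functions, is to pin the branch that the run-time dispatcher would have selected anyway on the current processor; results then reproduce from run to run on that machine (and on processors supporting the same branch) while the fastest available code path is kept. The same behavior can also be requested without code changes through the MKL_CBWR environment variable.

```c
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Ask oneMKL which code branch it would dispatch to on this processor,
     * then fix that branch so the order of operations stays the same from
     * run to run while keeping the fastest code path for this machine. */
    int branch = mkl_cbwr_get_auto_branch();
    if (mkl_cbwr_set(branch) != MKL_CBWR_SUCCESS) {
        printf("Could not fix the auto-detected branch.\n");
        return 1;
    }

    /* ... oneMKL calls ... */
    return 0;
}
```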