Managing Multi-core Performance
You can obtain the best performance on systems with multi-core processors by preventing threads from migrating from core to core. To do this, bind each thread to a CPU core by setting an affinity mask for the thread. Use one of the following options:
- OpenMP facilities (if available), for example, the KMP_AFFINITY environment variable when using the Intel OpenMP library
- A system function, as explained below
- Intel TBB facilities (if available), for example, the tbb::affinity_partitioner class (for details, see https://www.threadingbuildingblocks.org/documentation)
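An affinity mask is a bit field: when bit i is set, the thread may run on logical processor i, and a mask with exactly one bit set pins the thread to a single processor so that it cannot migrate. The following portable C sketch shows how such masks can be built and queried. The helper names mask_from_cpus and mask_allows are hypothetical, and the sketch assumes at most 64 logical processors numbered from 0.

```c
#include <stdint.h>

/* Hypothetical helper: build an affinity mask in which bit i is set for
   each logical processor i listed in cpus[0..n-1].
   Assumes every processor number is in the range 0..63. */
uint64_t mask_from_cpus(const int *cpus, int n) {
    uint64_t mask = 0;
    for (int i = 0; i < n; i++) {
        mask |= (uint64_t)1 << cpus[i];
    }
    return mask;
}

/* Hypothetical helper: returns 1 if logical processor cpu is allowed
   by mask, 0 otherwise. Assumes 0 <= cpu <= 63. */
int mask_allows(uint64_t mask, int cpu) {
    return (int)((mask >> cpu) & 1u);
}
```

For example, pinning a thread to logical processor 2 alone corresponds to mask_from_cpus((const int[]){2}, 1), which yields 0x4.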
Consider the following performance issue:
- The system has two sockets with two cores each, for a total of four cores (CPUs).
- The application sets the number of OpenMP threads to four and calls an Intel® oneAPI Math Kernel Library LAPACK routine. This call takes considerably different amounts of time from run to run.
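On the four-core system described above, a thread placement decides which core each OpenMP thread is bound to. The sketch below contrasts two placement ideas: compact (fill one socket before moving to the next) and scatter (alternate between sockets). These are similar in spirit to the compact and scatter types of KMP_AFFINITY but are not the runtime's actual algorithm. The function names are hypothetical, and the sketch assumes that logical processors 0 and 1 are on socket 0 and processors 2 and 3 are on socket 1; the real numbering must be obtained from the operating system.

```c
/* Hypothetical topology matching the scenario:
   2 sockets (packages) x 2 cores per socket x 1 thread per core.
   Assumption: logical processors are numbered socket-major,
   i.e. 0 and 1 on socket 0, 2 and 3 on socket 1. */
#define SOCKETS 2
#define CORES_PER_SOCKET 2
#define TOTAL_CORES (SOCKETS * CORES_PER_SOCKET)

/* Compact idea: consecutive threads fill socket 0 first, then socket 1.
   Threads beyond TOTAL_CORES wrap around and share cores. */
int compact_core(int tid) {
    return tid % TOTAL_CORES;
}

/* Scatter idea: consecutive threads alternate between sockets,
   so two threads land on different packages. */
int scatter_core(int tid) {
    int t = tid % TOTAL_CORES;
    int socket = t % SOCKETS;          /* which package */
    int core_in_socket = t / SOCKETS;  /* which core inside that package */
    return socket * CORES_PER_SOCKET + core_in_socket;
}
```

With two threads, scatter_core returns processors 0 and 2, one per socket; with four threads, both placements give each thread its own core. The resulting core number can be turned into a single-bit affinity mask with 1 << core.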
To resolve this issue, before calling Intel® oneAPI Math Kernel Library, set an affinity mask for each OpenMP thread using the KMP_AFFINITY environment variable or the SetThreadAffinityMask system function. The following code example shows how to resolve the issue by setting an affinity mask through the operating system, using the Intel compiler. The code calls the SetThreadAffinityMask function to bind the threads to appropriate cores, preventing migration of the threads. Then the Intel® oneAPI Math Kernel Library LAPACK routine is called:
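// Set affinity mask
#include <windows.h>
#include <omp.h>

int main(void) {
#pragma omp parallel default(shared)
    {
        int tid = omp_get_thread_num();
        // 2 packages x 2 cores/pkg x 1 thread/core (4 total cores).
        // Bind OpenMP thread tid to logical processor tid (mod 4), so each
        // of the four threads runs on its own core and cannot migrate.
        // Assumes logical processors 0-3 correspond to the four cores.
        DWORD_PTR mask = (DWORD_PTR)1 << (tid % 4);
        // SetThreadAffinityMask returns 0 on failure.
        SetThreadAffinityMask(GetCurrentThread(), mask);
    }
    // Call Intel MKL LAPACK routine here
    return 0;
}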
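Before calling the library, it can be useful to check that the masks an application is about to set really prevent migration and contention: each mask should have exactly one bit set, and no two threads should share a bit. The hypothetical helper masks_pin_distinct below performs that check on an array of 64-bit masks, one per thread. It is a portable sketch that could be added to such an application, not part of the Windows API.

```c
#include <stdint.h>

/* Returns 1 if x has exactly one bit set, 0 otherwise. */
static int single_bit(uint64_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical check: returns 1 if every mask pins its thread to exactly
   one logical processor and no two threads share a processor. */
int masks_pin_distinct(const uint64_t *masks, int nthreads) {
    uint64_t used = 0;  /* processors already claimed by earlier threads */
    for (int i = 0; i < nthreads; i++) {
        if (!single_bit(masks[i]) || (used & masks[i]) != 0)
            return 0;
        used |= masks[i];
    }
    return 1;
}
```

For the four-core example, masks {0x1, 0x2, 0x4, 0x8} pass the check, while {0x1, 0x4, 0x4, 0x4} fail it because three threads would share logical processor 2.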
Compile the application with the Intel compiler using the following command:
icl /Qopenmp test_application.c
where test_application.c is the filename for the application.
Run the application in four threads, for example, by using the OMP_NUM_THREADS environment variable to set the number of threads:
set OMP_NUM_THREADS=4
test_application.exe
See the Windows API documentation at msdn.microsoft.com/ for the restrictions on the usage of Windows API routines and the particulars of the SetThreadAffinityMask function used in the above example.
See also a similar example at en.wikipedia.org/wiki/Affinity_mask.
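The run step sets the number of threads through OMP_NUM_THREADS, which should not exceed the number of cores the affinity masks cover, or some threads will share a core. The hypothetical helper parse_num_threads below is a sketch of reading that value defensively. It reads only the first entry of a comma-separated list (the list form sets thread counts for nested parallel levels) and otherwise falls back to a caller-supplied value.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: interpret an OMP_NUM_THREADS value such as "4".
   Only the first entry of a comma-separated list such as "4,2" is read.
   Returns fallback if s is NULL or does not start with a positive
   integer that fits in an int. */
int parse_num_threads(const char *s, int fallback) {
    if (s == NULL)
        return fallback;
    char *end;
    long n = strtol(s, &end, 10);
    /* Reject empty input and trailing characters other than a list comma. */
    if (end == s || (*end != '\0' && *end != ','))
        return fallback;
    /* Reject non-positive values and values that do not fit in an int. */
    if (n <= 0 || n > INT_MAX)
        return fallback;
    return (int)n;
}
```

A program might call parse_num_threads(getenv("OMP_NUM_THREADS"), 4) and compare the result with its four cores. The fallback is left to the caller because the OpenMP default when the variable is unset is implementation-defined.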
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201