Developer Guide


Managing Multi-core Performance

You can obtain the best performance on systems with multi-core processors by preventing
threads from migrating from core to core. To do this, bind threads to CPU cores by
setting an affinity mask for each thread. Use one of the following options:
  • OpenMP facilities (if available), for example, an affinity environment variable
    supported by the Intel OpenMP library
  • A system function, as explained below
  • Intel TBB facilities (if available), for example, an affinity-related
    class (for details, see the Intel TBB documentation)
Consider the following performance issue:
  • The system has two sockets with two cores each, for a total of four cores (CPUs).
  • The application sets the number of OpenMP threads to two and calls
    Intel® oneAPI Math Kernel Library to perform a Fourier transform.
    This call takes considerably different amounts of time from run to run.
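To make the run-to-run variation in this scenario measurable, one could time the same call several times and compare the fastest and slowest runs. The following is a minimal sketch under stated assumptions: `measure_spread`, `run_spread`, and `busy_loop` are hypothetical names, and `busy_loop` is only a stand-in for the FFT call, which the original text does not show.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#endif
#include <time.h>

/* Fastest and slowest wall-clock time, in seconds, over a series of runs. */
typedef struct {
    double min_s;
    double max_s;
} run_spread;

/* Hypothetical helper: calls work(arg) `runs` times and records the spread. */
run_spread measure_spread(void (*work)(void *), void *arg, int runs)
{
    run_spread r = { 0.0, 0.0 };

    for (int i = 0; i < runs; ++i) {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        work(arg);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double s = (double)(t1.tv_sec - t0.tv_sec)
                 + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
        if (i == 0 || s < r.min_s) r.min_s = s;
        if (i == 0 || s > r.max_s) r.max_s = s;
    }
    return r;
}

/* Stand-in workload for illustration only; in the scenario above this
 * would be the library call whose timing varies. */
void busy_loop(void *arg)
{
    volatile double acc = 0.0;
    long n = *(const long *)arg;

    for (long i = 0; i < n; ++i) {
        acc += (double)i;
    }
}
```

A large gap between `min_s` and `max_s` for identical work is one hint that threads may be migrating between cores, although other sources of noise can produce the same symptom.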
To resolve this issue, before calling Intel® oneAPI Math Kernel Library, set an affinity mask for each OpenMP thread using either an OpenMP affinity environment variable or the sched_setaffinity system function. The following code example shows how to resolve the issue by setting an affinity mask through operating system means, using the Intel compiler. The code calls the sched_setaffinity function to bind the threads to cores on different sockets. Then the Intel® oneAPI Math Kernel Library FFT function is called:
#define _GNU_SOURCE // for using the GNU CPU affinity
                    // (works with the appropriate kernel and glibc)

// Set affinity mask
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main(void) {
    int NCPUs = sysconf(_SC_NPROCESSORS_CONF);

    printf("Using thread affinity on %i NCPUs\n", NCPUs);

    #pragma omp parallel default(shared)
    {
        cpu_set_t new_mask;
        cpu_set_t was_mask;
        int tid = omp_get_thread_num();

        CPU_ZERO(&new_mask);

        // 2 packages x 2 cores/pkg x 1 threads/core (4 total cores)
        // Thread 0 goes to CPU 0, any other thread to CPU 2; this assumes
        // that CPUs 0 and 2 are on different sockets.
        CPU_SET(tid==0 ? 0 : 2, &new_mask);

        // pid 0 means "the calling thread"
        if (sched_getaffinity(0, sizeof(was_mask), &was_mask) == -1) {
            printf("Error: sched_getaffinity(%d, sizeof(was_mask), &was_mask)\n", tid);
        }
        if (sched_setaffinity(0, sizeof(new_mask), &new_mask) == -1) {
            printf("Error: sched_setaffinity(%d, sizeof(new_mask), &new_mask)\n", tid);
        }

        // Prints only the first 32 bits of each mask
        printf("tid=%d new_mask=%08X was_mask=%08X\n", tid,
               *(unsigned int*)(&new_mask), *(unsigned int*)(&was_mask));
    }

    // Call Intel MKL FFT function

    return 0;
}
Compile the application with the Intel compiler using the following command:
icc test_application.c -openmp 
where test_application.c is the filename for the application.
Build the application. Run it in two threads, for example, by using the OMP_NUM_THREADS environment variable to set the number of threads:
env OMP_NUM_THREADS=2 ./a.out
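The example prints each mask by casting the `cpu_set_t` to `unsigned int`, which shows only the first 32 CPUs and depends on the in-memory layout of the set. As a hedged alternative, the sketch below lists the CPU indices in a mask using the glibc `CPU_ISSET` macro. The helper name `format_cpu_set` is hypothetical and not part of the original example; Linux with glibc is assumed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* exposes cpu_set_t and the CPU_* macros in glibc */
#endif
#include <sched.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the indices of the CPUs present in `set`
 * into `buf` as a comma-separated list. Returns the number of CPU
 * indices written; entries that do not fit in `buf` are dropped. */
int format_cpu_set(const cpu_set_t *set, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;

    if (len > 0) buf[0] = '\0';

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, set)) continue;

        int n = snprintf(buf + used, len - used, count ? ",%d" : "%d", cpu);
        if (n < 0 || (size_t)n >= len - used) {
            if (used < len) buf[used] = '\0';  /* drop the partial entry */
            break;
        }
        used += (size_t)n;
        ++count;
    }
    return count;
}
```

For the two-socket scenario above, a mask holding CPUs 0 and 2 is formatted as `0,2`.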
See the Linux Programmer's Manual (in man pages format) for particulars of the sched_setaffinity function used in the above example.
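The example hard-codes CPUs 0 and 2, which may not be available to the process on every system (for instance, when it runs under a restricted CPU set). As a minimal sketch, assuming Linux with glibc, the helper below binds the calling thread to the lowest-numbered CPU it is currently allowed to use. The name `pin_to_first_allowed_cpu` is hypothetical and does not come from the original example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* exposes sched_setaffinity and the CPU_* macros */
#endif
#include <sched.h>

/* Hypothetical helper: binds the calling thread to the lowest-numbered
 * CPU in its current affinity mask. Returns that CPU index, or -1 if
 * the mask could not be read or changed. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;
    cpu_set_t target;

    /* pid 0 refers to the calling thread */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) == -1) {
        return -1;
    }

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) continue;

        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        if (sched_setaffinity(0, sizeof(target), &target) == -1) {
            return -1;
        }
        return cpu;
    }
    return -1;  /* no CPU found in the mask */
}
```

Reading the mask back with `sched_getaffinity` after the call is a simple way to confirm that the binding took effect.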
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at
Notice revision #20201201
