Developer Guide for Intel® oneAPI Math Kernel Library Linux*

ID 766690
Date 3/31/2023

Choosing Best Configuration and Problem Sizes

The performance of the Intel Optimized HPCG depends on many system parameters, including (but not limited to) the hardware configuration of the host and the MPI implementation used. To get the best performance for a specific system configuration, tune the following parameters together:

  • The number of MPI processes per host and OpenMP* threads per process

  • Local problem size

On Intel® Xeon® processor-based clusters, use the Intel AVX, Intel AVX2, or Intel AVX-512 optimized version of the benchmark, depending on the instruction set the processor supports. Run one MPI process per CPU socket and one OpenMP* thread per physical CPU core, skipping SMT threads.

On systems based on Intel® Xeon® Phi processors, use the Intel AVX-512 optimized version with four MPI processes per processor. Turn SMT on and set the number of OpenMP threads to two for each processor core. For example, the Intel® Xeon® Phi processor 7250 has 68 cores, which gives 68 × 2 = 136 OpenMP threads in total; divided among four MPI processes, each MPI process should run 34 OpenMP threads.
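The process and thread counts recommended above follow from simple arithmetic, which the following minimal Python sketch reproduces. The function name `hpcg_layout` and its parameters are hypothetical and are not part of the benchmark or of any Intel tool; the sketch only encodes the two rules stated in this section.

```python
def hpcg_layout(sockets_per_host, cores_per_socket, xeon_phi=False):
    """Return (mpi_processes_per_host, omp_threads_per_process).

    Hypothetical helper: it encodes only the two recommendations from
    this section and does not query the hardware.
    """
    if xeon_phi:
        # Xeon Phi: four MPI processes per processor, SMT on,
        # two OpenMP threads per core.
        ranks_per_processor = 4
        threads_per_core = 2
    else:
        # Xeon: one MPI process per socket, one OpenMP thread per
        # physical core (SMT threads are skipped).
        ranks_per_processor = 1
        threads_per_core = 1

    total_threads = cores_per_socket * threads_per_core
    if total_threads % ranks_per_processor != 0:
        raise ValueError("threads do not divide evenly among MPI processes")

    return (sockets_per_host * ranks_per_processor,
            total_threads // ranks_per_processor)


# Xeon Phi processor 7250 (68 cores): 68 * 2 / 4 = 34 threads per process.
print(hpcg_layout(1, 68, xeon_phi=True))   # (4, 34)

# Assumed example host: two Xeon sockets with 28 physical cores each.
print(hpcg_layout(2, 28))                  # (2, 28)
```

The second value is the number you would typically export as `OMP_NUM_THREADS` for each MPI process; how processes are pinned to sockets depends on the MPI launcher in use and is not covered by this sketch.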

For best performance, choose a local problem size that is large enough to keep all available cores busy, but small enough that the data for all MPI processes on a host fits in the available memory.
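The upper bound on the local problem size can be estimated with a short calculation, sketched below in Python. Everything in it is an assumption made for illustration: the function name `largest_local_size`, the restriction to a cubic local grid, the requirement that the dimension be a multiple of 8, and above all `bytes_per_point`, which is not a documented constant and has to be measured on your own system (for example, by running a small problem and observing the memory footprint of one process).

```python
def largest_local_size(memory_budget_bytes, bytes_per_point, multiple=8):
    """Largest cubic local dimension n whose estimated footprint fits.

    Hypothetical helper. Returns the largest n that is a multiple of
    `multiple` and satisfies n**3 * bytes_per_point <= memory_budget_bytes.
    `bytes_per_point` is a user-measured estimate, not a documented value.
    """
    if memory_budget_bytes <= 0 or bytes_per_point <= 0 or multiple <= 0:
        raise ValueError("all arguments must be positive")

    n = multiple
    if n ** 3 * bytes_per_point > memory_budget_bytes:
        raise ValueError("memory budget is too small for the smallest grid")

    # Grow the grid one step at a time while the next size still fits.
    while (n + multiple) ** 3 * bytes_per_point <= memory_budget_bytes:
        n += multiple
    return n


# Illustration with made-up numbers: a per-process budget of 16 GiB and an
# assumed footprint of 1000 bytes per grid point.
budget = 16 * 1024 ** 3
n = largest_local_size(budget, bytes_per_point=1000)
print(n, n ** 3 * 1000 <= budget)
```

The memory budget passed in should be the share of host memory available to one MPI process, so divide the usable host memory by the number of MPI processes per host first. The sketch addresses only the "fits in memory" side of the trade-off; whether a given size keeps all cores busy still has to be checked by running the benchmark.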

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at

Notice revision #20201201