The performance of the Intel Optimized HPCG depends on many system parameters, including (but not limited to) the hardware configuration of the host and the MPI implementation used. To get the best performance for a specific system configuration, choose a combination of these parameters:
The number of MPI processes per host and OpenMP* threads per process
On Intel® Xeon® processor-based clusters, use the Intel AVX, Intel AVX2, or Intel AVX-512 optimized version of the benchmark, depending on the supported instruction set. Run one MPI process per CPU socket and one OpenMP* thread per physical CPU core, skipping SMT threads.
On systems based on Intel® Xeon Phi™ processors, use the Intel AVX-512 optimized version with four MPI processes per processor. Set the number of OpenMP threads to two for each processor core, with SMT turned on. For example, the Intel® Xeon Phi™ processor 7250 has 68 cores; two threads per core give 136 threads, so each of the four MPI processes should run 34 OpenMP threads.
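The thread counts above follow from simple arithmetic, which the following minimal sketch reproduces. The function name `omp_threads_per_process` and the 24-core socket in the second call are hypothetical and chosen only for illustration; the sketch assumes that the hardware threads are divided evenly among the MPI processes.

```python
def omp_threads_per_process(physical_cores, threads_per_core, mpi_processes):
    """Return the OpenMP thread count for each MPI process.

    Assumes all hardware threads in use are split evenly across the
    MPI processes; raises ValueError if they cannot be.
    """
    total_threads = physical_cores * threads_per_core
    if total_threads % mpi_processes != 0:
        raise ValueError("threads do not divide evenly across MPI processes")
    return total_threads // mpi_processes


# Intel Xeon Phi processor 7250: 68 cores, two threads per core,
# four MPI processes per processor.
print(omp_threads_per_process(68, 2, 4))  # 34

# Hypothetical Intel Xeon socket with 24 physical cores: one MPI process
# per socket and one thread per core (SMT threads skipped).
print(omp_threads_per_process(24, 1, 1))  # 24
```

The resulting value is what you would typically pass to the OpenMP runtime, for example through the standard `OMP_NUM_THREADS` environment variable.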
For best performance, use a problem size that is large enough to keep all available cores busy, but small enough that all tasks fit in the available memory.
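As a rough illustration of this trade-off, the sketch below picks the largest cubic local grid dimension that fits a per-process memory budget. Everything in it is an assumption rather than part of the benchmark: the function name `largest_cubic_local_dim` is hypothetical, the `bytes_per_point` value is a placeholder that you would need to measure on your own system, and restricting the dimension to multiples of 8 is an assumed constraint.

```python
def largest_cubic_local_dim(mem_budget_bytes, bytes_per_point, multiple=8):
    """Return the largest n (a multiple of `multiple`) such that an
    n x n x n local grid fits in the memory budget, or 0 if none fits.
    """
    n = 0
    # Grow n in steps of `multiple` while the next candidate still fits.
    while (n + multiple) ** 3 * bytes_per_point <= mem_budget_bytes:
        n += multiple
    return n


# Assumed values for illustration only: 16 GiB available to one MPI
# process and a placeholder cost of 1000 bytes per grid point.
budget = 16 * 2**30
print(largest_cubic_local_dim(budget, bytes_per_point=1000))  # 256
```

A value obtained this way is only an upper bound on the local dimension; leave headroom for the operating system and the MPI library, and confirm the choice by running the benchmark.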