Developer Guide for Intel® oneAPI Math Kernel Library Linux*

ID 766690
Date 3/22/2024

Choosing the Best HPCG Configuration for GPUs

The performance of the Intel GPU Optimized HPCG depends on many system parameters, including (but not limited to) the hardware configuration of the host node and of the one or more devices attached to it, and the MPI implementation used. To get the best performance for a specific system configuration, choose the best combination of the following parameters:

  • The number of MPI processes per host node (each process handles work on the host and on an attached device)

  • The number of OpenMP* threads per MPI process for the reference code used to validate the benchmark

  • The local problem size

On Intel® Data Center GPU Max Series GPUs, we recommend using one MPI process per tile with a large local problem size. On modern GPUs, the last level cache (LLC) per tile can be either extremely large or quite small, and device memory can be quite limited. To comply with current HPCG benchmark requirements, choose a local problem size (nx x ny x nz) large enough that a single benchmark vector (nx*ny*nz*sizeof(double) bytes) does not fit entirely in the LLC of the device, but not so large that the full benchmark problem no longer fits in device memory.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Notice revision #20201201