Configuring Parameters

Developer Guide for Intel® oneAPI Math Kernel Library Windows*

ID 766692

Date 3/22/2024

The most significant parameters in HPL.dat are P, Q, NB, and N. Specify them as follows:

P and Q – the number of rows and columns in the process grid, respectively:
- P*Q must equal the number of MPI processes that HPL is using.
- Choose P ≤ Q.

N – the problem size:
- For homogeneous runs, choose N divisible by NB*LCM(P,Q), where LCM is the least common multiple of the two numbers.
- For heterogeneous runs, see Heterogeneous Support in the Intel® Distribution for LINPACK* Benchmark for how to choose N.
NOTE: Increasing N usually improves performance, but N is bounded by the available memory. In general, the memory that each MPI process needs to store its share of the matrix, not counting internal buffers, is 8*N*N/(P*Q) bytes, where N is the problem size and P and Q are the process grid dimensions in HPL.dat. A general rule is to choose a problem size that fills 80% of memory.
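
The rules above for P, Q, and N can be sketched in a few lines of Python. This is a minimal illustration and not part of the benchmark: the function names are hypothetical, and the sketch assumes a homogeneous run in which `mem_bytes_per_process` is the memory available to each MPI process, so that the matrix share of 8*N*N/(P*Q) bytes should stay within 80% of it.

```python
import math


def grid_candidates(num_procs):
    """Return every (P, Q) with P * Q == num_procs and P <= Q."""
    pairs = []
    for p in range(1, math.isqrt(num_procs) + 1):
        if num_procs % p == 0:
            pairs.append((p, num_procs // p))
    return pairs


def matrix_bytes_per_process(n, p, q):
    """Matrix storage per process in bytes, internal buffers excluded."""
    return 8 * n * n / (p * q)


def choose_n(mem_bytes_per_process, p, q, nb, fill=0.8):
    """Largest N divisible by NB*LCM(P,Q) whose matrix share fits in fill * memory.

    Assumption of this sketch: mem_bytes_per_process is the memory available
    to each MPI process. Returns 0 if even one step of NB*LCM(P,Q) does not fit.
    """
    lcm = p * q // math.gcd(p, q)
    step = nb * lcm
    budget = fill * mem_bytes_per_process
    # Solve 8*N*N/(P*Q) <= budget for N, then round down to a multiple of step.
    n_max = math.isqrt(int(budget * p * q / 8))
    return (n_max // step) * step


if __name__ == "__main__":
    # Hypothetical example: 4 MPI processes with 64 GiB each, NB = 384.
    for p, q in grid_candidates(4):
        n = choose_n(64 * 2**30, p, q, nb=384)
        print(f"P={p} Q={q} N={n}")
```

For example, `grid_candidates(12)` returns the pairs (1, 12), (2, 6), and (3, 4). Because internal buffers are not counted in the formula, treat the resulting N as a starting point for tuning rather than a guaranteed fit.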

NB – the block size of the data distribution.

The table below shows the recommended values of NB and element sizes for the CPU version:

Processors	Intel® Distribution for LINPACK* Benchmark	Intel® Optimized HPL-AI* Benchmark
Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions	192	192
Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions	384	384
Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions with Intel® Deep Learning Boost and bfloat16	384	768
Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions with Intel® AMX bfloat16	384	1536
Element size	8 bytes	4 bytes

The table below shows the recommended values of NB and element sizes for the GPU version:

Processors	Intel® Distribution for LINPACK* Benchmark	Intel® Optimized HPL-AI* Benchmark
Intel® Data Center GPU Series	384	1152 or 1536
Element size	8 bytes	2 bytes
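
If you script your benchmark runs, the two tables above can be kept as a small lookup. The sketch below only transcribes the table values; the dictionary keys and the `recommended` function are hypothetical names chosen for illustration and are not identifiers used by the benchmarks or by HPL.dat.

```python
# Recommended NB values, transcribed from the CPU and GPU tables above.
# The keys ("cpu", "avx2", "linpack", ...) are hypothetical labels for this sketch.
RECOMMENDED_NB = {
    ("cpu", "avx2"): {"linpack": (192,), "hpl-ai": (192,)},
    ("cpu", "avx512"): {"linpack": (384,), "hpl-ai": (384,)},
    ("cpu", "avx512-dlboost-bf16"): {"linpack": (384,), "hpl-ai": (768,)},
    ("cpu", "avx512-amx-bf16"): {"linpack": (384,), "hpl-ai": (1536,)},
    ("gpu", "data-center-gpu"): {"linpack": (384,), "hpl-ai": (1152, 1536)},
}

# Element sizes in bytes, from the last row of each table.
ELEMENT_SIZE_BYTES = {
    ("cpu", "linpack"): 8,
    ("cpu", "hpl-ai"): 4,
    ("gpu", "linpack"): 8,
    ("gpu", "hpl-ai"): 2,
}


def recommended(device, processor, benchmark):
    """Return (tuple of recommended NB values, element size in bytes)."""
    nb_values = RECOMMENDED_NB[(device, processor)][benchmark]
    return nb_values, ELEMENT_SIZE_BYTES[(device, benchmark)]


if __name__ == "__main__":
    print(recommended("gpu", "data-center-gpu", "hpl-ai"))
```

NB values are stored as tuples because the GPU table lists two recommended values for the Intel® Optimized HPL-AI* Benchmark.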
