Developer Guide for Intel® oneAPI Math Kernel Library Windows*

ID 766692
Date 7/13/2023
Public

Configuring Parameters

The most significant parameters in HPL.dat are P, Q, NB, and N. Specify them as follows:

  • P and Q - the number of rows and columns in the process grid, respectively.

    P*Q must be the number of MPI processes that HPL is using.

    Choose P ≤ Q.

  • N – the problem size:

    NOTE:

    Increasing N usually improves performance, but N is bounded by available memory. In general, the memory that each MPI process needs to store its share of the matrix (not counting internal buffers) is 8*N*N/(P*Q) bytes, where N is the problem size and P and Q are the process grid dimensions in HPL.dat. A general rule is to choose a problem size that fills about 80% of memory.

  • NB – the block size of the data distribution.

    The table below shows the recommended values of NB and element sizes for the CPU version:

    Processors | Intel® Distribution for LINPACK* Benchmark | Intel® Optimized HPL-AI* Benchmark
    Intel® Xeon Processor supporting Intel® Advanced Vector Extensions (Intel® AVX) instructions or older architecture | 256 | 256
    Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions | 192 | 192
    Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions | 384 | 384
    Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions with Intel® Deep Learning Boost and bfloat16 | 384 | 768
    Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions with Intel® AMX bfloat16 | 384 | 1536
    Element size | 8 bytes | 4 bytes

    The table below shows the recommended values of NB and element sizes for the GPU version:

    Processors | Intel® Distribution for LINPACK* Benchmark | Intel® Optimized HPL-AI* Benchmark
    Intel® Data Center GPU Series | 384 | 1152 or 1536
    Element size | 8 bytes | 2 bytes
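
The sizing rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the benchmark: the function names are hypothetical, picking the most nearly square grid with P ≤ Q is an assumed heuristic (the guide only requires P*Q to equal the number of MPI processes), and rounding N down to a multiple of NB is an assumed convenience. The element size defaults to 8 bytes, matching the LINPACK column of the tables.

```python
import math


def process_grid(num_ranks):
    """Return (P, Q) with P * Q == num_ranks and P <= Q.

    Assumption: the most nearly square factor pair is preferred.
    """
    p = math.isqrt(num_ranks)
    while num_ranks % p != 0:
        p -= 1
    return p, num_ranks // p


def matrix_bytes_per_process(n, p, q, element_size=8):
    """Memory for one process's share of the matrix: element_size*N*N/(P*Q)."""
    return element_size * n * n / (p * q)


def problem_size(mem_bytes_per_process, p, q, nb, element_size=8, fill=0.80):
    """Largest N whose per-process matrix share fits in `fill` of memory.

    Assumption: N is rounded down to a multiple of NB.
    """
    # element_size * N^2 / (P*Q) <= fill * mem  =>  N <= sqrt(fill*mem*P*Q/element_size)
    n_max = math.isqrt(int(fill * mem_bytes_per_process * p * q / element_size))
    return (n_max // nb) * nb


if __name__ == "__main__":
    ranks = 4                 # hypothetical number of MPI processes
    mem = 64 * 2**30          # hypothetical 64 GiB available per process
    nb = 384                  # NB from the table for an AVX-512 CPU
    p, q = process_grid(ranks)
    n = problem_size(mem, p, q, nb)
    share = matrix_bytes_per_process(n, p, q) / mem
    print(f"P={p} Q={q} NB={nb} N={n} (matrix uses {share:.1%} of memory)")
```

The resulting P, Q, NB, and N values are starting points to enter in HPL.dat; because the formula does not count internal buffers, a smaller N may be needed in practice.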