- P and Q - the number of rows and columns in the process grid, respectively. P*Q must be the number of MPI processes that HPL is using. Choose P ≤ Q.
- NB - the block size of the data distribution. The table below shows recommended values of NB for different Intel® processors:

  | Processor | NB |
  | --- | --- |
  | Intel® Xeon® Processor X56*/E56*/E7-*/E7*/X7* (codenamed Nehalem or Westmere) | 256 |
  | Intel Xeon Processor E26*/E26* v2 (codenamed Sandy Bridge or Ivy Bridge) | 256 |
  | Intel Xeon Processor E26* v3/E26* v4 (codenamed Haswell or Broadwell) | 192 |
  | Intel® Core™ i3/i5/i7-6* Processor (codenamed Skylake Client) | 192 |
  | Intel® Xeon Phi™ Processor 72* (codenamed Knights Landing) | 336 |
  | Intel Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions (codenamed Skylake Server) | 384 |
- N - the problem size:
  - Increasing N usually increases performance, but N is bounded by the available memory. In general, you can compute the memory required to store the matrix (not counting internal buffers) as 8*N*N/(P*Q) bytes per MPI process, where N is the problem size and P and Q are the process grid dimensions in HPL.dat. A general rule of thumb is to choose a problem size that fills 80% of memory.
  - For homogeneous runs, choose N divisible by NB*LCM(P,Q), where LCM is the least common multiple of the two numbers.
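
The sizing rules above can be combined into a small calculation. The following Python sketch is illustrative only and is not part of HPL: the helper names `choose_grid` and `choose_problem_size` are hypothetical, and the preference for a nearly square grid is an assumption of this sketch (the text above only requires P ≤ Q). It assumes a homogeneous run, takes the total memory across all nodes in bytes, and requires Python 3.9 or later for `math.lcm`.

```python
import math


def choose_grid(num_procs: int) -> tuple[int, int]:
    """Return (P, Q) with P * Q == num_procs and P <= Q.

    Assumption of this sketch: pick the factor pair closest to square.
    """
    p = math.isqrt(num_procs)
    while num_procs % p != 0:
        p -= 1
    return p, num_procs // p


def choose_problem_size(total_mem_bytes: int, p: int, q: int, nb: int,
                        fill: float = 0.80) -> int:
    """Return the largest N divisible by NB * LCM(P, Q) whose matrix fits.

    The full matrix takes 8 * N * N bytes (8 * N * N / (P * Q) per process),
    so N is capped so that the matrix fills at most `fill` of total memory.
    Internal buffers are not counted, as in the formula in the text.
    """
    n_max = math.isqrt(int(fill * total_mem_bytes) // 8)
    step = nb * math.lcm(p, q)
    return (n_max // step) * step


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes with 64 GiB each, 16 MPI processes, NB = 192.
    total_mem = 4 * 64 * 2**30
    p, q = choose_grid(16)
    n = choose_problem_size(total_mem, p, q, nb=192)
    print(f"P={p} Q={q} N={n}")
```

Treat the resulting N as a starting point. Because internal buffers and other memory users are not counted, you may need to lower the `fill` fraction if the run starts swapping.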