Developer Guide

Developer Guide for Intel® oneAPI Math Kernel Library Linux*

ID 766690
Date 12/16/2022

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

Overview of the Intel Optimized HPCG

The Intel® Optimized High Performance Conjugate Gradient Benchmark (Intel® Optimized HPCG) provides an implementation of the HPCG benchmark ( optimized for Intel® Xeon® processors and Intel® Xeon Phi™ processors with Intel® Advanced Vector Extensions (Intel® AVX), Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512) support. The HPCG Benchmark is intended to complement the High Performance LINPACK benchmark used in the TOP500 ( system ranking by providing a metric that better aligns with a broader set of important cluster applications.

The HPCG benchmark implementation is based on a 3-dimensional (3D) regular 27-point discretization of an elliptic partial differential equation. The implementation calls a 3D domain to fill a 3D virtual process grid for all the available MPI ranks. HPCG uses the preconditioned conjugate gradient method (CG) to solve the intermediate systems of equations and incorporates a local and symmetric Gauss-Seidel preconditioning step that requires a triangular forward solve and a backward solve. A synthetic multi-grid V-cycle is used on each preconditioning step to make the benchmark better fit real-world applications. HPCG implements matrix multiplication locally, with an initial halo exchange between neighboring processes. The benchmark exhibits irregular accesses to memory and fine-grain recursive computations that dominate many scientific workloads.

The Intel® Optimized HPCG contains source code of the HPCG v3.0 reference implementation with necessary modifications to include:

  • Intel® architecture optimizations

  • Prebuilt benchmark executables that link to Intel® oneAPI Math Kernel Library (oneMKL)

    • Inspector-executor Sparse BLAS kernels for sparse matrix-vector multiplication (SpMV)
    • Sparse triangular solve (TRSV)

    • Symmetric Gauss-Seidel smoother (SYMGS)

that are optimized for Intel AVX, Intel AVX2, and Intel AVX-512 instruction sets. For the Intel AVX-512 instruction set, there are separate versions that target Intel® Xeon® Scalable processors and Intel® Xeon® Phi processors. Use this package to evaluate the performance of distributed-memory systems based on any generation of the Intel® Xeon® processor E3, Intel® Xeon® processor E5, Intel® Xeon® processor E7, Intel® Xeon® Scalable processor family, and Intel Xeon PhiTM processor families.

The Intel® oneAPI Math Kernel Library Inspector-executor Sparse BLAS kernels SpMV, TRSV, and SYMGS are implemented using an inspector-executor model. The inspection step chooses the best algorithm for the input matrix and converts the matrix to a special internal representation to achieve high performance at the execution step.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at

Notice revision #20201201