Developer Guide


Overview of the Intel Optimized HPCG

The Intel® Optimized High Performance Conjugate Gradient Benchmark (Intel® Optimized HPCG) provides an implementation of the HPCG benchmark optimized for Intel® Xeon® processors and Intel® Xeon Phi™ processors with Intel® Advanced Vector Extensions (Intel® AVX), Intel® Advanced Vector Extensions 2 (Intel® AVX2), and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) support. The HPCG benchmark is intended to complement the High Performance LINPACK benchmark used in the TOP500 system ranking by providing a metric that better aligns with a broader set of important cluster applications.
The HPCG benchmark implementation is based on a 3-dimensional (3D) regular 27-point discretization of an elliptic partial differential equation. The implementation calls for a 3D domain to fill a 3D virtual process grid for all the available MPI ranks. HPCG uses the preconditioned conjugate gradient (CG) method to solve the intermediate systems of equations and incorporates a local, symmetric Gauss-Seidel preconditioning step that requires a triangular forward solve and a backward solve. A synthetic multi-grid V-cycle is used on each preconditioning step to make the benchmark better fit real-world applications. HPCG implements sparse matrix-vector multiplication locally, with an initial halo exchange between neighboring processes. The benchmark exhibits irregular accesses to memory and fine-grain recursive computations that dominate many scientific workloads (for details, see
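The numerical core described above can be illustrated with a minimal single-process sketch. It is not the HPCG reference code or the Intel® Optimized HPCG implementation: it omits MPI, the halo exchange, and the multi-grid V-cycle, and it assumes a stencil with 26 on the diagonal and -1 for each existing neighbor. All function names (`build_27pt`, `spmv`, `symgs`, `pcg`) are hypothetical and chosen for illustration only.

```python
import math

def build_27pt(nx, ny, nz):
    """Assemble a 27-point stencil matrix as a list of {column: value} rows.

    Assumption for this sketch: diagonal 26.0, every existing neighbor -1.0.
    """
    rows = []
    for iz in range(nz):
        for iy in range(ny):
            for ix in range(nx):
                row = {}
                for dz in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            jx, jy, jz = ix + dx, iy + dy, iz + dz
                            if 0 <= jx < nx and 0 <= jy < ny and 0 <= jz < nz:
                                j = (jz * ny + jy) * nx + jx
                                center = (dx, dy, dz) == (0, 0, 0)
                                row[j] = 26.0 if center else -1.0
                rows.append(row)
    return rows

def spmv(rows, x):
    """Sparse matrix-vector product y = A x."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def symgs(rows, r):
    """One symmetric Gauss-Seidel sweep for A z = r, starting from z = 0.

    The forward pass corresponds to the lower triangular solve and the
    backward pass to the upper triangular solve.
    """
    n = len(rows)
    z = [0.0] * n
    for order in (range(n), range(n - 1, -1, -1)):
        for i in order:
            row = rows[i]
            s = r[i] - sum(v * z[j] for j, v in row.items() if j != i)
            z[i] = s / row[i]
    return z

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg(rows, b, tol=1e-10, max_iter=50):
    """Preconditioned CG with one SYMGS sweep as the preconditioner."""
    x = [0.0] * len(b)
    r = list(b)
    z = symgs(rows, r)
    p = list(z)
    rz = dot(r, z)
    norm0 = math.sqrt(dot(r, r))
    for k in range(1, max_iter + 1):
        Ap = spmv(rows, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * norm0:
            return x, k
        z = symgs(rows, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    A = build_27pt(4, 4, 4)
    b = spmv(A, [1.0] * len(A))   # right-hand side whose exact solution is all ones
    x, iters = pcg(A, b)
    print("iterations:", iters)
    print("max error:", max(abs(xi - 1.0) for xi in x))
```

The row-by-row dependency inside `symgs` is the kind of fine-grain recursive computation the paragraph refers to: each updated entry of `z` is consumed by later rows in the same sweep, which limits straightforward parallelization.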
The Intel® Optimized HPCG contains source code of the HPCG v3.0 reference implementation with necessary modifications to include:
  • Intel® architecture optimizations
  • Prebuilt benchmark executables that link to Intel® oneAPI Math Kernel Library and use the following kernels, which are optimized for the Intel AVX, Intel AVX2, and Intel AVX-512 instruction sets:
    • Inspector-executor Sparse BLAS kernels for sparse matrix-vector multiplication (SpMV)
    • Sparse triangular solve (TRSV)
    • Symmetric Gauss-Seidel smoother (SYMGS)
For the Intel AVX-512 instruction set, there are separate versions that target Intel® Xeon® Scalable processors and Intel® Xeon Phi™ processors. Use this package to evaluate the performance of distributed-memory systems based on any generation of the Intel® Xeon® processor E3, Intel® Xeon® processor E5, Intel® Xeon® processor E7, Intel® Xeon® Scalable, and Intel® Xeon Phi™ processor families.
The Intel® oneAPI Math Kernel Library Inspector-executor Sparse BLAS kernels SpMV, TRSV, and SYMGS are implemented using an inspector-executor model. The inspection step chooses the best algorithm for the input matrix and converts the matrix to a special internal representation to achieve high performance at the execution step.
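The two-phase idea behind the inspector-executor model can be sketched in a few lines. The following is a conceptual illustration only and does not reproduce the Intel® oneAPI Math Kernel Library API or its internal formats: the names `SpmvPlan`, `inspect_csr`, and `execute_spmv`, the padded-row layout, and the 1.5 padding threshold are all assumptions made for this example.

```python
class SpmvPlan:
    """Hypothetical handle returned by the inspection step."""
    def __init__(self, kind, data):
        self.kind = kind   # "ell" (padded rows) or "csr"
        self.data = data

def inspect_csr(row_ptr, col_idx, values):
    """Inspection step: analyze the matrix once and pick a storage layout.

    If padding every row to the longest row costs at most 50% extra storage
    (an arbitrary threshold for this sketch), convert to a padded layout with
    a fixed row width. Otherwise keep the original CSR arrays.
    """
    n = len(row_ptr) - 1
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(n)]
    width = max(lengths) if n else 0
    if n and width * n <= 1.5 * len(values):
        cols, vals = [], []
        for i in range(n):
            lo, hi = row_ptr[i], row_ptr[i + 1]
            pad = width - (hi - lo)
            # Padding entries point at column 0 with value 0.0, so they
            # contribute nothing to the product.
            cols.append(list(col_idx[lo:hi]) + [0] * pad)
            vals.append(list(values[lo:hi]) + [0.0] * pad)
        return SpmvPlan("ell", (width, cols, vals))
    return SpmvPlan("csr", (row_ptr, col_idx, values))

def execute_spmv(plan, x):
    """Execution step: compute y = A x using the layout chosen at inspection."""
    if plan.kind == "ell":
        width, cols, vals = plan.data
        return [sum(v[k] * x[c[k]] for k in range(width))
                for c, v in zip(cols, vals)]
    row_ptr, col_idx, values = plan.data
    return [sum(values[k] * x[col_idx[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]

if __name__ == "__main__":
    # 4x4 tridiagonal matrix with 2 on the diagonal and -1 off the diagonal.
    row_ptr = [0, 2, 5, 8, 10]
    col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
    values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    plan = inspect_csr(row_ptr, col_idx, values)   # done once
    for _ in range(3):                             # reused many times
        y = execute_spmv(plan, [1.0, 2.0, 3.0, 4.0])
    print(plan.kind, y)
```

The cost of the inspection step pays off when the same matrix is used many times, as in an iterative solver that applies SpMV and SYMGS with an unchanged matrix on every iteration.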
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at
Notice revision #20201201
