Overview of the Intel Optimized HPCG
The Intel® Optimized High Performance Conjugate Gradient Benchmark (Intel® Optimized HPCG) provides an implementation of the HPCG benchmark (http://hpcg-benchmark.org) optimized for Intel® Xeon® processors and Intel® Xeon Phi™ processors with Intel® Advanced Vector Extensions (Intel® AVX), Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512) support. The HPCG Benchmark is intended to complement the High Performance LINPACK benchmark used in the TOP500 (http://www.top500.org) system ranking by providing a metric that better aligns with a broader set of important cluster applications.
The HPCG benchmark implementation is based on a 3-dimensional (3D) regular 27-point discretization of an elliptic partial differential equation. The implementation calls a 3D domain to fill a 3D virtual process grid for all the available MPI ranks. HPCG uses the preconditioned conjugate gradient method (CG) to solve the intermediate systems of equations and incorporates a local and symmetric Gauss-Seidel preconditioning step that requires a triangular forward solve and a backward solve. A synthetic multi-grid V-cycle is used on each preconditioning step to make the benchmark better fit real-world applications. HPCG implements matrix multiplication locally, with an initial halo exchange between neighboring processes. The benchmark exhibits irregular accesses to memory and fine-grain recursive computations that dominate many scientific workloads.
The Intel® Optimized HPCG contains source code of the HPCG v3.0 reference implementation with necessary modifications to include:
Intel® architecture optimizations
Prebuilt benchmark executables that link to Intel® oneAPI Math Kernel Library (oneMKL)
- Inspector-executor Sparse BLAS kernels for sparse matrix-vector multiplication (SpMV)
Sparse triangular solve (TRSV)
Symmetric Gauss-Seidel smoother (SYMGS)
The Intel® oneAPI Math Kernel Library Inspector-executor Sparse BLAS kernels SpMV, TRSV, and SYMGS are implemented using an inspector-executor model. The inspection step chooses the best algorithm for the input matrix and converts the matrix to a special internal representation to achieve high performance at the execution step.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201