Distributed Nested Dissection algorithm for Intel® MKL Parallel Direct Sparse Solver for Cluster

Published: 07/15/2016  

Last Updated: 07/15/2016

By Gennady Fedorov

Introduction

Many problems in computational science which require solving a large system of linear equations involve a huge sparse matrix (containing on the order of 108-109 rows). Solving such matrices is possible only on a machine with distributed memory. Intel® Math Kernel Library (Intel® MKL) Parallel Direct Sparse Solver for Clusters provides the solution for such systems on machines with distributed memory using MPI. The current implementation of the Parallel Direct Sparse Solver for Clusters uses the nested dissection algorithm from the METIS package [shared memory programming (SMP) version] for graph partitioning. This requires a huge memory space on the single node that runs METIS, so it creates a bottleneck for performance and memory scalability. To resolve this issue a distributed version of the nested dissection algorithm is integrated into Intel MKL.

Product Overview

Starting from Intel MKL version 2017, Parallel Direct Sparse Solver for Clusters supports a new type of reordering. This reordering is based on a distributed nested dissection algorithm, so for this reordering the initial matrix is distributed between MPI processes and is not combined on one single node. Moreover, the nested dissection step and symbolic factorization are performed in parallel on all MPI processes. Each MPI process works with its own part of the matrix, which allows use of all available memory and computational resources on all nodes of the cluster. As result, the new reordering can handle larger matrices than the SMP version.

To use the new reordering for Intel MKL Parallel Direct Sparse Solver for Clusters:

  1. Set iparm[1] to 10 to enable the distributed nested dissection algorithm.

Note: This article uses C notation for array indices; the Fortran notation corresponding to iparm[1] is iparm(0).

  1. To distribute the initial matrix between MPI processes without intersections, store the number of the first and the last rows belonging to the corresponding current MPI process in iparm[40] and iparm[41].

Notes:

  • Currently, the distributed nested dissection algorithm is supported only for a number of MPI processes equal to a power of 2.
  • Only phases 11, 22, 33, and -1 are supported.
  • Matching, scaling, and iterative refinement are not supported.

Performance results

 

 

Contacts and references.

Gennady.Fedorov@intel.com

roman.anders@intel.com

anton.anders@intel.com

alexander.a.kalinkin@intel.com

Intel® Math Kernel Library

Optimization Notice in English

Product and Performance Information

1

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.