In this article a new export functionality introduced in the Intel® Math Kernel Library (Intel® MKL), version 2019 Update 4, for the Cluster Sparse Solver is discussed.
This functionality allows the user to export (distributed) LU factors, as well as the permutation matrices P and Q (and scaling matrix D, optionally) in CSR format. The output data will be distributed in the same way as the input matrix.
This functionality allows the user to use the Cluster Sparse Solver not only to find the solution to a linear system but also to compute the LU-like factorization, export the data for the decomposition and then work with the factors as with any other CSR matrix. For example, now the Cluster Sparse Solver can be used within an external LU decomposition functionality.
There are two main notions used below to describe the export functionality: operation name and data name (all currently available values for them can be found in mkl_cluster_sparse_solver.h and mkl_cluster_sparse_solver.f90):
- Operation name is the particular decomposition for which the data needs to be exported. For example, SPARSE_DPTLUQT is the operation name which corresponds to the decomposition A=D*P^T*L*U*Q^T, or, P*(D^-1 * A)*Q = L*U.
- Data name is the particular element of a decomposition. For example, SPARSE_PTLUQT_L corresponds to the factor L in the decomposition A=P^T*L*U*Q^T, or, P*A*Q = L*U. Notice that certain elements of the decomposition are naturally represented by CSR data (factors L and U) while others are better represented as non-CSR data (permutations P and Q and scaling D). Non CSR-data is currently exported as a vector of the corresponding type.
For the export functionality, the following four new functions have been introduced in Intel® MKL:
- cluster_sparse_solver_get_csr_size, which allows the user to get the number of rows and the number of nonzeros in the CSR data for a specified data name, e.g., SPARSE_PTLUQT_L.
- cluster_sparse_solver_set_ptr, which allows the user to provide a pointer to the data associated with a particular non-CSR element of the decomposition.
- cluster_sparse_solver_set_csr_ptrs, which allows the user to provide pointers to the CSR data associated with a particular element of the decomposition.
- cluster_sparse_solver_export, which actually fills the arrays for the corresponding export operation which the user has provided by means of cluster_sparse_solver_set_ptr and/or cluster_sparse_solver_set_csr_ptrs.
The export functionality for LU decomposition is meant to be used as described below:
- First, the user performs the factorization by means of a regular call to the cluster_sparse_solver (i.e., phase 2 should be completed, either as a call with phase = 12 or as separate calls with phase = 11 and 22).
- Second, the user calls cluster_sparse_solver_get_csr_size which returns the local number of rows and the number of nonzeros in the corresponding CSR data (e.g., for the factors L and U).
- Then, the user allocates memory for the CSR and non-CSR data required.
- Now, the user provides pointers to all the necessary elements of the decomposition by calling cluster_sparse_solver_set_ptr for the non-CSR data and cluster_sparse_solver_set_csr_ptrs for the CSR data.
- Finally, the user calls cluster_sparse_solver_export which exports the data, filling the memory regions as specified by the pointers provided in the previous step.
A typical example of using the export functionality is the following. For the sake of brevity, error checking is omitted in the code below, though each routine can return a meaningful error code (in the last argument). The snippet of C code provided below assumes that the Cluster Sparse Solver has been called successfully for phases 11 and 22 (or, equivalently, for a phase 12) for a valid input matrix provided in double precision.
/* 1. main call to the cluster sparse solver, factorization phase (assuming phase 11 has been called before) */
cluster_sparse_solver(pt, …, phase=22, nrows, …);
/* 2. Get local number of rows and nonzeros in L and U */
MKL_INT lnrows = 0, lnnz = 0;
cluster_sparse_solver_get_csr_size(pt, SPARSE_PTLUQT_L, &lnrows, &lnnz, comm, error);
MKL_INT unrows = 0, unnz = 0;
cluster_sparse_solver_get_csr_size(pt, SPARSE_PTLUQT_U, &unrows, &unnz, comm, error);
/* 3. Allocate arrays */
MKL_INT *Lrows = (MKL_INT*)mkl_malloc((lnrows+1)*sizeof(MKL_INT), 128);
MKL_INT *Lcols = (MKL_INT*)mkl_malloc(lnnz*sizeof(MKL_INT), 128);
double *Lvals = (double *)mkl_malloc(lnnz*sizeof(double ), 128);
MKL_INT *Urows = (MKL_INT*)mkl_malloc((unrows+1)*sizeof(MKL_INT), 128);
MKL_INT *Ucols = (MKL_INT*)mkl_malloc(unnz*sizeof(MKL_INT), 128);
double *Uvals = (double *)mkl_malloc(unnz*sizeof(double ), 128);
MKL_INT *Pperm = (MKL_INT*)mkl_malloc(n*sizeof(MKL_INT), 128);
MKL_INT *Qperm = (MKL_INT*)mkl_malloc(n*sizeof(MKL_INT), 128);
/* 4. Provide pointers to the CSR data for L and U and non-CSR data for P and Q. */
cluster_sparse_solver_set_csr_ptrs(pt, SPARSE_PTLUQT_L, Lrows, Lcols, Lvals, comm, error);
cluster_sparse_solver_set_csr_ptrs(pt, SPARSE_PTLUQT_U, Urows, Ucols, Uvals, comm, error);
cluster_sparse_solver_set_ptr(pt, SPARSE_PTLUQT_P, Pperm, comm, error);
cluster_sparse_solver_set_ptr(pt, SPARSE_PTLUQT_Q, Qperm, comm, error);
/* 5. Compute the output arrays as requested by the op:
LUPQ ~ [L,U,P,Q] = lu(A) */
cluster_sparse_solver_export(pt, SPARSE_PLUQ, comm, error);
/* Now (Lrows, Lcols and Lvals) and (Urows, Ucols and Uvals) store the distributed factors L and U in the CSR 3-array variation, P and Q store the permutation matrices as integer arrays. */
For a complete example please refer to the examples cl_solver_export_c.c and cl_solver_export_f90.f90 in the MKL examples directory.
In the current version of MKL, 2019 Update 4, only the following operation and data names are supported.
Operation names: SPARSE_PTLUQT, SPARSE_DPTLUQT.
Data names: SPARSE_PTLUQT_L, SPARSE_PTLUQT_U, SPARSE_PTLUQT_P, SPARSE_PTLUQT_Q, SPARSE_DPTLUQT_L, SPARSE_DPTLUQT_U, SPARSE_DPTLUQT_P, SPARSE_DPTLUQT_Q, SPARSE_DPTLUQT_D.
Some internal code paths are not supported for the export functionality. Among these are the cases when matching and scaling are turned on in the iparm parameters. When the provided iparm options are handled by an internal code path which is not supported, an error code -9 is returned by the export routines.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.