The current Intel® oneAPI Math Kernel Library (oneMKL) 2024 update 2 still haven't implement the function to compute Pseudoinverse (also known as general inverse) of a matrix directly. For customer would like to implement pseudoinverse of a matrix by oneMKL for parallel computing, please consider to use other methodologies, for instance QR decomposition and Singular value decomposition (SVD). In this article, some computationally simple and accurate ways to compute the pseudo inverse by constructing decomposition algorithm have been discussed.
Singular value decomposition (SVD)
We will focus on real, column-major matrices in this article. If the singular value of m-by-n matrix A can be calculated like A=UΣVT, the pseudoinverse of matrix A+ must satisfy A+=VΣ-1UT. oneMKL has already provided SVD functions for dense and banded matrix (LAPACK/ ScaLapack). To learn more detail info of each function, please refer to the oneMKL developer reference. To calculate general inverse by oneMKL SVD function, please refer following process:
- Use LAPACK SVD function to calculate orthogonal for matrix A (returns u with first min(m,n) columns of U and vt with first min(m,n) rows of VT) , and an array s with the length of min(m,n) contains diagonal elements of matrix Σ.
/* Executable statements */ printf( " DGESVD Example Program Results\n" ); /* Query and allocate the optimal workspace */ lwork = -1; dgesvd( "S", "S", &m, &n, a, &lda, s, u, &ldu, vt, &ldvt, &wkopt, &lwork, &info ); lwork = (MKL_INT)wkopt; work = (double*)malloc( lwork*sizeof(double) ); /* Compute SVD */ dgesvd( "S", "S", &m, &n, a, &lda, s, u, &ldu, vt, &ldvt, work, &lwork, &info ); /* Check for convergence */ if( info > 0 ) { printf( "The algorithm computing SVD failed to converge.\n" ); exit( 1 ); }
- The second step is to calculate u=UΣ-1, consider the Σ is diagonal matrix, thus the process can be implemented by a loop of calling BLAS ?scal function for computing product of a vector by a scalar. And the scalar for each column of u would be inverse of each non-zero element of s. The code is like below.
int incx=1; #pragma omp parallel for for(int i=0; i<k; i++) { double ss; if(s[i] > 1.0e-9) ss=1.0/s[i]; else ss=s[i]; dscal(&m, &ss, &u[i*m], &incx); }
- Then calculate A+=(VT)TuT, use oneMKL ?gemm function to calculate matrix multiplication. Here I use dgemm for an example:
double* inva = (double*)malloc(n*m*sizeof(double)); double alpha=1.0, beta=0.0; mkl_int ld_inva=n; dgemm( "T", "T", &n, &m, &k, &alpha, vt, &ldvt, u, &ldu, &beta, inva, &ld_inva);
Cluster Implementation
Considered oneMKL ScaLapack already provided cluster implementation, users could use p?gesvd funtion and p?gemm for solution. Since BLAS level 1 routines do not have cluster and multiple thread implementation, we recommend users to perform ?scal on parallel rank with MPI.