Implement Pseudoinverse of a matrix by Intel® MKL

ID 659578
Updated 5/14/2017
Version Latest
Public

author-image

By

The current Intel® MKL 2017 update 3 still haven't implement the function to compute Pseudoinverse (also known as general inverse) of a matrix directly. For customer would like to implement pseudoinverse of a matrix by Intel® MKL for parallel computing, please consider to use other methodologies, for instance QR decomposition and Singular value decomposition (SVD). In this article, some computationally simple and accurate ways to compute the pseudo inverse by constructing decomposition algorithm have been discussed.

Singular value decomposition (SVD)

If the singular value of m-by-n matrix A can be calculated like A=UΣV*, the pseudoinverse of matrix A+ must satisfy A+=VΣ-1U* =(V*)T-1U)T. Please note, the formula should be like A+= (V*)T(UΣ-1)T for column major processing. Intel® MKL already provide SVD function for dense and banded matrix (LAPACK/ ScaLapack), learn more detail info of each function, please refer Intel® MKL developer reference. To calculate general inverse by MKL SVD function, please refer following process:

  1. Use LAPACK SVD function to calculate orthogonal for matrix A (returns u with first min(m,n) columns of U and vt with first min(m,n) rows of V*) , and an array with the length of min(m,n) contains diagonal elements of matrix Σ.
    /* Executable statements */
    printf( " DGESVD Example Program Results\n" );
    /* Query and allocate the optimal workspace */
    lwork = -1;
    dgesvd( "S", "S", &m, &n, a, &lda, s, u, &ldu, vt, &ldvt, &wkopt, &lwork, &info );
    lwork = (MKL_INT)wkopt;
    work = (double*)malloc( lwork*sizeof(double) );
    /* Compute SVD */
    dgesvd( "S", "S", &m, &n, a, &lda, s, u, &ldu, vt, &ldvt, work, &lwork, &info );
    /* Check for convergence */
    if( info > 0 ) {
    	printf( "The algorithm computing SVD failed to converge.\n" );
    	exit( 1 );
    }
  2. The second step is to calculate u=Σ-1U, consider the Σ is diagonal matrix, thus the process can be implemented by a loop of calling BLAS ?scal function for computing product of a vector by a scalar. And the scalar for each column of u would be inverse of each non-zero element of s. The code is like below. Please note, the u is stored column-wise, and vt is row-wise here, thus the formula should be like u=UΣ-1
    int incx=1;
    #pragma omp parallel for
    for(int i=0; i<k; i++)
    {
    	double ss;
    	if(s[i] > 1.0e-9)
    		ss=1.0/s[i];
    	else
    		ss=s[i];
    	dscal(&m, &ss, &u[i*m], &incx);
    }
  3. Then calculate A+=(V*)TuT, use MKL ?GEMM function to calculate matrix multiplication. Here I use dgemm for an example:
    double* inva = (double*)malloc(n*m*sizeof(double));
    double alpha=1.0, beta=0.0;
    mkl_int ld_inva=n;
    dgemm( "T", "T", &n, &m, &k, &alpha, vt, &ldvt, u, &ldu, &beta, inva, &ld_inva);

We provided sample code "pseudoinverse_lapack.c", please access for detail implementation.

Cluster Implementation

Considered MKL ScaLapack already provided cluster implementation, users could use p?gesvd funtion and p?gemm for solution. Since BLAS 1 level routines do not have cluster and multiple thread implementation, we recommend users to perform ?scal on parallel rank with MPI. 

Please access sample code file "pseudoinverse_cluster.c" for detail implementation.