Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686
Date 12/16/2022
Public

Two-stage Algorithm in Inspector-Executor Sparse BLAS Routines

You can use a two-stage algorithm in Inspector-Executor Sparse BLAS routines that produce a sparse matrix, such as mkl_sparse_sp2m.

The two-stage algorithm allows you to split computations into stages. The main purpose of the splitting is to provide an estimate for the memory required for the output prior to allocating the largest part of the memory (for the indices and values of the non-zero elements). Additionally, the two-stage approach extends the functionality and allows more complex usage models.
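To make the memory estimate concrete, the sketch below computes the size of the two largest output arrays once the number of non-zeros is known. It is a minimal illustration, not part of oneMKL: the helper name csr_payload_bytes and its parameters are hypothetical, and the caller supplies the index and value sizes because they depend on the interface and precision in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a oneMKL routine): bytes needed for the
 * index array and the value array of an output matrix with nnz
 * non-zero elements. The row/column pointer arrays are not included,
 * because the first stage has already allocated them. */
size_t csr_payload_bytes(size_t nnz, size_t index_size, size_t value_size)
{
    assert(index_size > 0 && value_size > 0);
    return nnz * (index_size + value_size);
}
```

For example, with 4-byte indices and 8-byte double-precision values, csr_payload_bytes(1000, 4, 8) gives 12000 bytes.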

NOTE:
The multistage approach currently does not allow you to allocate memory for the output matrix outside oneMKL.

In the two-stage algorithm:

  1. The first stage allocates the data needed for the memory estimate (the arrays rows_start/rows_end or cols_start/cols_end, depending on the format; see Sparse Matrix Storage Formats) and computes the number of entries or the full structure of the matrix.
    NOTE:
    The format of the output is decided internally but can be checked using the export functionality mkl_sparse_?_export_<format>.
  2. The second stage allocates data and computes column or row indices (depending on the format) of non-zero elements and/or values of the output matrix.
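The division of work between the two stages can be illustrated with a plain C sketch that does not call oneMKL. It assumes zero-based CSR in 3-array form, and the type csr_t and the functions stage1_count and stage2_finalize are hypothetical names introduced only for this illustration. This document does not describe how oneMKL implements the stages internally, so treat this as one possible way to split a sparse product, not as the library's algorithm.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical zero-based CSR container in 3-array form. */
typedef struct {
    int nrows, ncols;
    int *row_ptr;    /* nrows + 1 entries, filled by stage 1 */
    int *col_idx;    /* nnz entries, filled by stage 2 */
    double *val;     /* nnz entries, filled by stage 2 */
} csr_t;

/* Stage 1: allocate and fill only row_ptr of C = A * B.
 * Returns the number of non-zeros of C, or -1 if allocation fails. */
int stage1_count(const csr_t *A, const csr_t *B, csr_t *C)
{
    assert(A->ncols == B->nrows);
    C->nrows = A->nrows;
    C->ncols = B->ncols;
    C->col_idx = NULL;
    C->val = NULL;
    C->row_ptr = calloc((size_t)A->nrows + 1, sizeof *C->row_ptr);
    /* mark[j] holds the last row of C in which column j was seen */
    int *mark = malloc(((size_t)B->ncols + 1) * sizeof *mark);
    if (!C->row_ptr || !mark) {
        free(C->row_ptr);
        free(mark);
        C->row_ptr = NULL;
        return -1;
    }
    for (int j = 0; j < B->ncols; ++j) mark[j] = -1;
    for (int i = 0; i < A->nrows; ++i) {
        int count = 0;
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            int k = A->col_idx[p];
            for (int q = B->row_ptr[k]; q < B->row_ptr[k + 1]; ++q) {
                int j = B->col_idx[q];
                if (mark[j] != i) { mark[j] = i; ++count; }
            }
        }
        C->row_ptr[i + 1] = C->row_ptr[i] + count;
    }
    free(mark);
    return C->row_ptr[C->nrows];
}

/* Stage 2: allocate col_idx and val from the stage 1 counts and fill them.
 * Column indices within a row appear in order of first encounter.
 * Returns 0 on success, -1 if allocation fails. */
int stage2_finalize(const csr_t *A, const csr_t *B, csr_t *C)
{
    int nnz = C->row_ptr[C->nrows];
    /* pos[j] is the slot in col_idx/val last assigned to column j */
    int *pos = malloc(((size_t)B->ncols + 1) * sizeof *pos);
    C->col_idx = malloc(((size_t)nnz + 1) * sizeof *C->col_idx);
    C->val = malloc(((size_t)nnz + 1) * sizeof *C->val);
    if (!pos || !C->col_idx || !C->val) {
        free(pos);
        free(C->col_idx);
        free(C->val);
        C->col_idx = NULL;
        C->val = NULL;
        return -1;
    }
    for (int j = 0; j < B->ncols; ++j) pos[j] = -1;
    for (int i = 0; i < A->nrows; ++i) {
        int start = C->row_ptr[i];
        int next = start;
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            int k = A->col_idx[p];
            for (int q = B->row_ptr[k]; q < B->row_ptr[k + 1]; ++q) {
                int j = B->col_idx[q];
                if (pos[j] < start) {   /* first time column j appears in row i */
                    pos[j] = next;
                    C->col_idx[next] = j;
                    C->val[next] = 0.0;
                    ++next;
                }
                C->val[pos[j]] += A->val[p] * B->val[q];
            }
        }
        assert(next == C->row_ptr[i + 1]);  /* stage 1 count must match */
    }
    free(pos);
    return 0;
}
```

Between the two calls, only row_ptr exists, so the caller can read the non-zero count and decide whether the remaining allocation is affordable before any index or value array is created.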

Specifying the stage for execution is supported through the sparse_request_t parameter in the API with the following options:

Values for the sparse_request_t parameter (each value is followed by its description):
SPARSE_STAGE_NNZ_COUNT

Allocates and computes only the rows_start/rows_end (CSR/BSR format) or cols_start/cols_end (CSC format) arrays for the output matrix. After this stage, by calling mkl_sparse_?_export_<format>, you can obtain the number of non-zeros in the output matrix and calculate the amount of memory required for the output matrix.

SPARSE_STAGE_FINALIZE_MULT_NO_VAL

Allocates and computes row/column indices provided that rows_start/rows_end or cols_start/cols_end have already been computed in a prior call with the request SPARSE_STAGE_NNZ_COUNT. The values of the output matrix are not computed.

SPARSE_STAGE_FINALIZE_MULT

Depending on the state of the output matrix C on entry to the routine, this stage does one of the following:

  • Allocates and computes row/column indices and values of non-zero elements, if only rows_start/rows_end or cols_start/cols_end are present.
  • Allocates and computes values of non-zero elements, if rows_start/rows_end or cols_start/cols_end and row/column indices of non-zero elements are present.

SPARSE_STAGE_FULL_MULT_NO_VAL

Allocates and computes the output matrix structure in a single step. The values of the output matrix are not computed.

SPARSE_STAGE_FULL_MULT

Allocates and computes the entire output matrix (structure and values) in a single step.
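The effect of each request on the output matrix can be summarized as a small state model. The sketch below is a reading aid, not oneMKL code: the enumerators REQ_* and HAS_* and the function apply_request are hypothetical names, and returning -1 for a finalize request issued before the pointer arrays exist is an assumption of this model, since the descriptions above do not say how the library reacts in that case.

```c
#include <assert.h>

/* Hypothetical stand-ins for the five request values described above;
 * these are not the oneMKL enumerators. */
enum request {
    REQ_NNZ_COUNT,
    REQ_FINALIZE_MULT_NO_VAL,
    REQ_FINALIZE_MULT,
    REQ_FULL_MULT_NO_VAL,
    REQ_FULL_MULT
};

/* Parts of the output matrix that exist. */
enum {
    HAS_PTRS    = 1,  /* rows_start/rows_end or cols_start/cols_end */
    HAS_INDICES = 2,  /* column or row indices of non-zero elements */
    HAS_VALUES  = 4   /* values of non-zero elements */
};

/* Returns the state of the output after the request, or -1 when the
 * request needs pointer arrays that are not present (model assumption). */
int apply_request(int state, enum request req)
{
    assert(state >= 0 && state <= (HAS_PTRS | HAS_INDICES | HAS_VALUES));
    switch (req) {
    case REQ_NNZ_COUNT:
        return HAS_PTRS;
    case REQ_FINALIZE_MULT_NO_VAL:
        return (state & HAS_PTRS) ? (HAS_PTRS | HAS_INDICES) : -1;
    case REQ_FINALIZE_MULT:
        /* Covers both documented entry states: pointers only, or
         * pointers plus indices. Either way everything exists afterwards. */
        return (state & HAS_PTRS) ? (HAS_PTRS | HAS_INDICES | HAS_VALUES) : -1;
    case REQ_FULL_MULT_NO_VAL:
        return HAS_PTRS | HAS_INDICES;
    case REQ_FULL_MULT:
        return HAS_PTRS | HAS_INDICES | HAS_VALUES;
    }
    return -1;
}
```

In this model, the sequence REQ_NNZ_COUNT followed by REQ_FINALIZE_MULT ends in the same state as a single REQ_FULL_MULT.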

The example below shows how you can use the two-stage approach for estimating the memory requirements for the output matrix in CSR format:

First stage (sparse_request_t = SPARSE_STAGE_NNZ_COUNT)

  1. The routine mkl_sparse_sp2m is called with the request parameter SPARSE_STAGE_NNZ_COUNT.
  2. The arrays rows_start and rows_end are exported using the mkl_sparse_?_export_csr routine.
  3. These arrays are used to calculate the number of non-zeros (nnz) of the resulting output matrix.

Note that by the end of the first stage, the arrays associated with column indices and values of the output matrix have not been allocated or computed yet.

sparse_matrix_t csrC = NULL;
status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_NNZ_COUNT, &csrC);

/* optional calculation of nnz in the output matrix for getting a memory estimate */

status = mkl_sparse_?_export_csr (csrC, &indexing, &nrows, &ncols, &rows_start, &rows_end, &col_indx, &values);

/* the indexing base cancels in the difference, so this holds for zero- and one-based output */
MKL_INT nnz = rows_end[nrows-1] - rows_start[0];
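The one-line nnz formula relies on the rows being stored back to back, so that rows_end[i] equals rows_start[i+1]. As a cross-check, the count can also be accumulated row by row, which gives the same result in that case and also handles an output with zero rows. The helper csr_nnz below is a hypothetical name, and long long is used as a stand-in for MKL_INT so that the sketch compiles without the oneMKL headers.

```c
#include <assert.h>

/* Hypothetical helper: count non-zeros from the 4-array CSR pointers by
 * summing the length of every row. Works for zero- and one-based
 * indexing because only differences are used. */
long long csr_nnz(long long nrows,
                  const long long *rows_start,
                  const long long *rows_end)
{
    long long nnz = 0;
    assert(nrows >= 0);
    for (long long i = 0; i < nrows; ++i) {
        assert(rows_end[i] >= rows_start[i]);  /* each row length is non-negative */
        nnz += rows_end[i] - rows_start[i];
    }
    return nnz;
}
```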

Second stage (sparse_request_t = SPARSE_STAGE_FINALIZE_MULT)

This stage allocates and computes the remaining output arrays (associated with column indices and values of output matrix entries) and completes the matrix-matrix multiplication.

status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FINALIZE_MULT, &csrC);

When the two-stage approach is not needed, you can perform both stages in a single call:

Single-stage operation (sparse_request_t = SPARSE_STAGE_FULL_MULT)

status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FULL_MULT, &csrC);