Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 3/31/2023
Public

Graph Operations

The graph API provides optimized kernels for the following computationally intensive routines:

Routine               Description
mkl_graph_mxv         Compute a (masked) matrix-vector product
mkl_graph_vxm         Compute a (masked) vector-matrix product
mkl_graph_mxm         Compute a (masked) matrix-matrix product
mkl_graph_transpose   Compute a (masked) transpose of a matrix
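To make the term "masked" concrete, here is a minimal sketch in plain C that does not call oneMKL. It assumes a CSR matrix with int64_t indices and double values, a dense input vector, ordinary (+, *) arithmetic, and a dense byte mask in which a nonzero entry allows the corresponding output element to be written. The function name masked_csr_mxv and this mask convention are illustrative assumptions, not part of the library; the actual mask semantics of mkl_graph_mxv are defined in that routine's reference page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a oneMKL routine): y = A * x for a CSR matrix A,
 * computed only for rows i where mask[i] is nonzero. Rows that the mask
 * excludes keep their previous value in y. A NULL mask means "no mask". */
void masked_csr_mxv(int64_t nrows,
                    const int64_t *rows_start, /* nrows + 1 entries */
                    const int64_t *col_indx,
                    const double *values,
                    const double *x,
                    const unsigned char *mask,
                    double *y)
{
    assert(nrows >= 0);
    for (int64_t i = 0; i < nrows; ++i) {
        if (mask != NULL && !mask[i])
            continue; /* masked out: skip the row entirely */
        double sum = 0.0;
        for (int64_t k = rows_start[i]; k < rows_start[i + 1]; ++k)
            sum += values[k] * x[col_indx[k]];
        y[i] = sum;
    }
}
```

Skipping masked-out rows avoids both the arithmetic and the write for those rows, which is the main reason a mask can reduce the cost of a product.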

Graph operations (except mkl_graph_transpose) support the following modes:

  • Single-stage mode. Single-stage execution computes the output object in a single call to a graph operation with an appropriate value for the parameter of type mkl_graph_request_t. See Graph API Glossary for a list of all possible options.

    If the output object is sparse and the sizes of its arrays are not known in advance, the graph operation allocates the memory for the output object internally, and you can deallocate it only by calling an appropriate mkl_graph_<object>_destroy routine. To allocate all memory for the output on the user’s side, use multistage execution instead.

  • Multistage mode. Multistage execution constructs the output object over several calls to a graph operation, with each call requesting a specific stage. Unlike the single-stage mode, multistage execution allows you to allocate all memory for the output object. Only temporary memory will be allocated internally inside the graph routine. You must pass pointers to the allocations by calling an mkl_graph_<object>_set_<format> routine before each stage. These calls also specify the format of the final output object. The stage is specified through the parameter of type mkl_graph_request_t. See Graph API Glossary for a list of all possible options.

To choose the best-performing format for the output, you can specify the method to be used for computations through the parameter of type mkl_graph_method_t. For each graph operation that supports it, a desirable output format is described for a given configuration of input arguments. If you specify a format that the graph operation does not consider the best, your specified format is still used internally.

As an example, consider computing a non-masked matrix-matrix product using mkl_graph_mxm in multistage mode. Assume also that you want the output in CSR format (the preferred choice when both input matrices are also in CSR and the method is set to the Gustavson algorithm). The workflow then looks as follows, in pseudocode:

// Prepare the input matrices A and B.
// Create an empty matrix object for the output.
    mkl_graph_matrix_create(&C)
// Allocate a rows_start buffer of chosen type for the output.
// Set the user-allocated rows_start in the output matrix object.
    mkl_graph_matrix_set_csr(C, nrows, ncols, rows_start, rows_start_type, NULL, …)
// Fill rows_start for the output.
    mkl_graph_mxm(C, …, A, B, …, MKL_GRAPH_REQUEST_FILL_NNZ, …)
// Use rows_start to deduce the number of nonzero entries nnz.
// Allocate buffers for the column indices and values to hold nnz entries of the desired types.
// Set the allocated buffers for column indices and values in the output matrix object.
    mkl_graph_matrix_set_csr(C, …, col_indx, col_indx_type, values, values_type)
// Fill buffers col_indx and values with calculated column indices and values.
    mkl_graph_mxm(C, …, A, B, …, MKL_GRAPH_REQUEST_FILL_ENTRIES, …)

For full working code using multistage mode, refer to graphc_mxm_multistage.c in the examples for graph functionality.
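The two-stage pattern in the pseudocode can also be illustrated without the library. The following is a minimal sketch in plain C of a Gustavson-style CSR product C = A * B split into a counting stage and a filling stage, with the caller owning every output buffer. It is not how oneMKL implements mkl_graph_mxm; the function names spgemm_fill_nnz and spgemm_fill_entries, the int64_t index type, the double value type, and the use of ordinary (+, *) arithmetic are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stage 1 (plays the role of a "fill nnz" request): fill c_rs, the
 * rows_start array of C, which must hold nrows + 1 entries.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int spgemm_fill_nnz(int64_t nrows, int64_t ncols_b,
                    const int64_t *a_rs, const int64_t *a_ci,
                    const int64_t *b_rs, const int64_t *b_ci,
                    int64_t *c_rs)
{
    assert(nrows >= 0 && ncols_b >= 0);
    /* marker[j] == i means column j was already counted for row i. */
    int64_t *marker = malloc((size_t)(ncols_b > 0 ? ncols_b : 1) * sizeof *marker);
    if (marker == NULL) return -1;
    for (int64_t j = 0; j < ncols_b; ++j) marker[j] = -1;
    c_rs[0] = 0;
    for (int64_t i = 0; i < nrows; ++i) {
        int64_t count = 0;
        for (int64_t ka = a_rs[i]; ka < a_rs[i + 1]; ++ka) {
            int64_t k = a_ci[ka];
            for (int64_t kb = b_rs[k]; kb < b_rs[k + 1]; ++kb) {
                int64_t j = b_ci[kb];
                if (marker[j] != i) { marker[j] = i; ++count; }
            }
        }
        c_rs[i + 1] = c_rs[i] + count;
    }
    free(marker);
    return 0;
}

/* Stage 2 (plays the role of a "fill entries" request): fill c_ci and
 * c_val, which the caller allocated with c_rs[nrows] entries each.
 * Column indices within a row are stored in discovery order, not sorted. */
int spgemm_fill_entries(int64_t nrows, int64_t ncols_b,
                        const int64_t *a_rs, const int64_t *a_ci, const double *a_val,
                        const int64_t *b_rs, const int64_t *b_ci, const double *b_val,
                        const int64_t *c_rs, int64_t *c_ci, double *c_val)
{
    assert(nrows >= 0 && ncols_b >= 0);
    /* pos[j] is the slot in C holding column j of the current row;
     * any value below c_rs[i] means "not seen yet in row i". */
    int64_t *pos = malloc((size_t)(ncols_b > 0 ? ncols_b : 1) * sizeof *pos);
    if (pos == NULL) return -1;
    for (int64_t j = 0; j < ncols_b; ++j) pos[j] = -1;
    for (int64_t i = 0; i < nrows; ++i) {
        int64_t next = c_rs[i];
        for (int64_t ka = a_rs[i]; ka < a_rs[i + 1]; ++ka) {
            int64_t k = a_ci[ka];
            double av = a_val[ka];
            for (int64_t kb = b_rs[k]; kb < b_rs[k + 1]; ++kb) {
                int64_t j = b_ci[kb];
                if (pos[j] < c_rs[i]) {        /* first contribution to C(i, j) */
                    pos[j] = next;
                    c_ci[next] = j;
                    c_val[next] = av * b_val[kb];
                    ++next;
                } else {                        /* accumulate into existing slot */
                    c_val[pos[j]] += av * b_val[kb];
                }
            }
        }
    }
    free(pos);
    return 0;
}
```

Between the two calls, the caller reads nnz = c_rs[nrows] and allocates c_ci and c_val with nnz entries each. This mirrors the step "Use rows_start to deduce the number of nonzero entries nnz" in the pseudocode: only the temporary marker and position arrays are allocated inside the functions, while all output memory stays under the caller's control.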
