Overview of Intel® oneAPI Math Kernel Library (oneMKL) SYCL Sparse BLAS
The following pages describe the oneMKL Sparse BLAS computational routines for SYCL. These routines, along with other helper routines (see Sparse BLAS Functionality for the full list), are declared in the header file oneapi/mkl/spblas.hpp.
Several conventions are used throughout this document:
All oneMKL SYCL data types and non-domain-specific functions are inside the oneapi::mkl:: namespace.
All oneMKL SYCL Sparse BLAS functions are inside the oneapi::mkl::sparse namespace.
For brevity, the sycl namespace can be omitted from SYCL object types such as buffers and queues. For example, a single-precision, 1D buffer A could be written buffer<float,1> &A instead of sycl::buffer<float,1> &A.
Computational routines are overloaded on precision. Unless otherwise specified, all oneMKL Sparse BLAS computational routines support the float, double, std::complex<float>, and std::complex<double> floating-point types, and do not yet support mixed-precision computations.
The oneMKL Sparse BLAS domain currently does not offer bitwise-reproducibility (BWR) guarantees for most of its APIs.
For sparse matrix row and column indices, oneMKL Sparse BLAS supports std::int32_t and std::int64_t integer types for all supported matrix formats. Matrix handle creation routines are overloaded on integer types.
Some APIs require user-provided temporary workspaces. In the case of sycl::buffer APIs, the temporary workspaces are of type sycl::buffer<std::uint8_t, 1> *, whereas in the case of USM APIs, they are of type void *.
For users of USM APIs, oneMKL supports all types of allocations (device, shared, and host); however, performance may differ between them. For maximum performance of Sparse BLAS APIs, we recommend using device memory allocations (sycl::malloc_device()) wherever possible, except where specified otherwise. Any explicit data movement this requires is the user's responsibility.
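To make the precision and index-type conventions above concrete, here is a minimal host-side sketch in plain C++. It is not a oneMKL routine and does not use SYCL: the name csr_gemv_reference and its parameter list are hypothetical, and it assumes zero-based CSR arrays. It only illustrates the idea of one routine covering every supported value type and both 32-bit and 64-bit indices.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference, NOT the oneMKL API.
// Computes y = alpha * A * x + beta * y for a zero-based CSR matrix A.
// ValueT: float, double, std::complex<float>, or std::complex<double>.
// IndexT: std::int32_t or std::int64_t.
template <typename ValueT, typename IndexT>
void csr_gemv_reference(IndexT nrows,
                        const std::vector<IndexT>& row_ptr,
                        const std::vector<IndexT>& col_ind,
                        const std::vector<ValueT>& values,
                        ValueT alpha,
                        const std::vector<ValueT>& x,
                        ValueT beta,
                        std::vector<ValueT>& y) {
    assert(row_ptr.size() == static_cast<std::size_t>(nrows) + 1);
    assert(y.size() == static_cast<std::size_t>(nrows));
    for (IndexT i = 0; i < nrows; ++i) {
        ValueT sum{};  // value-initialized to zero for real and complex types
        // Entries of row i live in positions [row_ptr[i], row_ptr[i + 1]).
        for (IndexT k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            sum += values[k] * x[col_ind[k]];
        }
        y[i] = alpha * sum + beta * y[i];
    }
}
```

The same template can be instantiated with double and std::int32_t or with std::complex<float> and std::int64_t. Because every value argument shares the single ValueT parameter, passing a std::vector<float> of values together with a std::vector<double> x does not compile, which loosely mirrors the lack of mixed-precision support.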
Device Support
SYCL applications can target several types of devices:
CPU device: Performs computations on a CPU using OpenCL™.
GPU device: Performs computations on a GPU using Level Zero.
Each routine details the device types that are currently supported.
In the current release of oneMKL SYCL Sparse BLAS, all listed routines support use on CPU and GPU devices with the Compressed Sparse Row (CSR) matrix format unless otherwise noted.
Limited support for the Coordinate (COO), Compressed Sparse Column (CSC), and Block Compressed Sparse Row (BSR) matrix formats is also available, as specified in the documentation of each API.
| Sparse Format | Setting data in oneapi::mkl::sparse::matrix_handle_t |
|---|---|
| CSR | |
| CSC | |
| COO | |
| BSR | |
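As a reminder of how the CSR, CSC, and COO formats listed above differ, the following sketch stores the same 3x3 matrix in each format and expands it back to a dense array. It is plain host-side C++ for illustration only: the helper names are hypothetical, zero-based indexing is assumed, and no oneMKL or SYCL calls are involved. BSR is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Index = std::int32_t;  // std::int64_t would work the same way

// Example matrix (row-major dense view):
//   [1 0 2]
//   [0 3 0]
//   [4 0 5]
// All helpers below are hypothetical illustrations, not library routines.
// Each one expands a storage format into a row-major dense array.

std::vector<double> csr_to_dense(Index nrows, Index ncols,
                                 const std::vector<Index>& row_ptr,
                                 const std::vector<Index>& col_ind,
                                 const std::vector<double>& values) {
    std::vector<double> dense(static_cast<std::size_t>(nrows) * ncols, 0.0);
    for (Index i = 0; i < nrows; ++i)
        for (Index k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            dense[static_cast<std::size_t>(i) * ncols + col_ind[k]] = values[k];
    return dense;
}

std::vector<double> csc_to_dense(Index nrows, Index ncols,
                                 const std::vector<Index>& col_ptr,
                                 const std::vector<Index>& row_ind,
                                 const std::vector<double>& values) {
    std::vector<double> dense(static_cast<std::size_t>(nrows) * ncols, 0.0);
    for (Index j = 0; j < ncols; ++j)
        for (Index k = col_ptr[j]; k < col_ptr[j + 1]; ++k)
            dense[static_cast<std::size_t>(row_ind[k]) * ncols + j] = values[k];
    return dense;
}

std::vector<double> coo_to_dense(Index nrows, Index ncols,
                                 const std::vector<Index>& row_ind,
                                 const std::vector<Index>& col_ind,
                                 const std::vector<double>& values) {
    std::vector<double> dense(static_cast<std::size_t>(nrows) * ncols, 0.0);
    for (std::size_t k = 0; k < values.size(); ++k)
        dense[static_cast<std::size_t>(row_ind[k]) * ncols + col_ind[k]] = values[k];
    return dense;
}
```

For the example matrix, CSR uses row_ptr = {0, 2, 3, 5}, col_ind = {0, 2, 1, 0, 2}, values = {1, 2, 3, 4, 5}; CSC uses col_ptr = {0, 2, 3, 5}, row_ind = {0, 2, 1, 0, 2}, values = {1, 4, 3, 2, 5}; COO uses row_ind = {0, 0, 1, 2, 2}, col_ind = {0, 2, 1, 0, 2}, values = {1, 2, 3, 4, 5}. All three expand to the same dense array.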
SYCL Sparse BLAS operations using the matrix_handle_t are as follows:
| Routine | Supported Formats | Description |
|---|---|---|
| Level 2: | | |
| | CSR, CSC, COO, BSR | General sparse matrix-dense vector product |
| | CSR, COO (CPU only) | General sparse matrix-dense vector product with fused dot product |
| | CSR | Symmetric sparse matrix-dense vector product |
| | CSR, COO (CPU only) | Triangular sparse matrix-dense vector product |
| | CSR, COO (CPU only) | Triangular solve of sparse matrix against a dense vector. |
| Level 3: | | |
| | CSR, COO | General sparse matrix-dense matrix product with dense matrix output |
| | CSR, COO (CPU only) | Triangular solve of sparse matrix against a dense matrix. |
| | CSR | General sparse matrix-sparse matrix addition with sparse matrix output. |
| | CSR | General sparse matrix-sparse matrix product with sparse matrix output. |
| | CSR | General sparse matrix-sparse matrix product with dense matrix output. |
| Matrix Modifiers: | | |
| | CSR, COO | Make an out-of-place copy of the data from one handle to another. |
| | CSR<->COO | Copy and convert data in one matrix handle from its format to another handle in a separate format. |
| | CSR | Apply sorting routine to matrix handle data according to native definition of sorting for each format. |
| | CSR | Modifies main diagonal values in the matrix handle for the existing sparsity pattern according to provided diagonal value array. |
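The format-conversion and sorting entries under Matrix Modifiers can be pictured with a small host-side sketch. The code below is plain C++ and not the oneMKL implementation or API: the CsrMatrix struct and both function names are hypothetical, indices are assumed to be zero-based, and "sorted" is assumed here to mean ascending column index within each CSR row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical container for illustration; not a oneMKL type.
struct CsrMatrix {
    std::int32_t nrows = 0;
    std::int32_t ncols = 0;
    std::vector<std::int32_t> row_ptr;
    std::vector<std::int32_t> col_ind;
    std::vector<double> values;
};

// Converts COO triplets (in any order) to CSR with a counting sort on rows.
// Entries within a row keep their input order, so columns may be unsorted.
CsrMatrix coo_to_csr_reference(std::int32_t nrows, std::int32_t ncols,
                               const std::vector<std::int32_t>& row_ind,
                               const std::vector<std::int32_t>& col_ind,
                               const std::vector<double>& values) {
    assert(row_ind.size() == values.size() && col_ind.size() == values.size());
    const std::size_t nnz = values.size();
    CsrMatrix csr;
    csr.nrows = nrows;
    csr.ncols = ncols;
    csr.row_ptr.assign(static_cast<std::size_t>(nrows) + 1, 0);
    csr.col_ind.resize(nnz);
    csr.values.resize(nnz);
    // Count the entries in each row, shifted by one for the prefix sum.
    for (std::size_t k = 0; k < nnz; ++k) ++csr.row_ptr[row_ind[k] + 1];
    // The prefix sum turns the counts into row start offsets.
    for (std::int32_t i = 0; i < nrows; ++i) csr.row_ptr[i + 1] += csr.row_ptr[i];
    // Scatter each triplet into the next free slot of its row.
    std::vector<std::int32_t> next(csr.row_ptr.begin(), csr.row_ptr.end() - 1);
    for (std::size_t k = 0; k < nnz; ++k) {
        const std::int32_t dst = next[row_ind[k]]++;
        csr.col_ind[dst] = col_ind[k];
        csr.values[dst] = values[k];
    }
    return csr;
}

// Sorts the entries of every row by ascending column index (assumed ordering).
void sort_csr_rows_reference(CsrMatrix& csr) {
    typedef std::pair<std::int32_t, double> Entry;
    for (std::int32_t i = 0; i < csr.nrows; ++i) {
        const std::int32_t begin = csr.row_ptr[i];
        const std::int32_t end = csr.row_ptr[i + 1];
        std::vector<Entry> entries;
        for (std::int32_t k = begin; k < end; ++k)
            entries.push_back(Entry(csr.col_ind[k], csr.values[k]));
        std::sort(entries.begin(), entries.end(),
                  [](const Entry& a, const Entry& b) { return a.first < b.first; });
        for (std::int32_t k = begin; k < end; ++k) {
            csr.col_ind[k] = entries[k - begin].first;
            csr.values[k] = entries[k - begin].second;
        }
    }
}
```

In this sketch the conversion alone leaves a row's columns in whatever order the COO input supplied them, which is why a separate sorting step can be useful before calling routines that expect sorted data.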