Developer Reference

Contents

omatadd_batch

Computes a group of out-of-place scaled matrix additions using general matrices.

Description

The
omatadd_batch
routines perform a series of out-of-place scaled matrix additions. They are similar to the
omatadd
routines, but the
omatadd_batch
routines perform matrix operations with a group of matrices.
The matrices are always in a strided format for this API. The operation is defined as:
for i = 0 … batch_size – 1 A is a matrix at offset i * stridea in a B is a matrix at offset i * strideb in b C is a matrix at offset i * stridec in c C = alpha * op(A) + beta * op(B) end for
where:
  • op(X)
    is one of
    op(X) = X
    ,
    op(X) = X'
    , or
    op(X) = conjg(X')
  • alpha
    and
    beta
    are scalars
  • A, B and C are matrices
The API is available with USM pointers or buffer arguments for the input and output arrays.
The input buffers or arrays
a
and
b
contain all the input matrices, and the single output buffer or array
c
contains all the output matrices. The locations of the individual matrices within the buffer or array are given by stride lengths, while the number of matrices is given by the
batch_size
parameter.

API

Syntax
USM arrays:
event omatadd_batch(queue &queue, transpose transa, transpose transb, std::int64_t m, std::int64_t n, T alpha, const T *a, std::int64_t lda, std::int64_t stride_a, T beta, T *b, std::int64_t ldb, std::int64_t stride_b, T *c, std::int64_t ldc, std::int64_t stride_c, std::int64_t batch_size, const std::vector<event> &dependencies = {});
Buffer arrays:
void omatadd_batch(queue &queue, transpose transa, transpose transb, std::int64_t m, std::int64_t n, T alpha, cl::sycl::buffer<T, 1> &a, std::int64_t lda, std::int64_t stride_a, T beta, cl::sycl::buffer<T, 1> &b, std::int64_t ldb, std::int64_t stride_b, cl::sycl::buffer<T, 1> &c, std::int64_t ldc, std::int64_t stride_c, std::int64_t batch_size);
omatadd_batch
supports the following precisions and devices:
T
Devices Supported
float
Host, CPU, and GPU
double
Host, CPU, and GPU
std::complex<float>
Host, CPU, and GPU
std::complex<double>
Host, CPU, and GPU
Input Parameters
transa
Specifies
op(A)
, the transposition operation applied to the matrices A.
transb
Specifies
op(B)
, the transposition operation applied to the matrices B.
m
Number of rows for the result matrix C. Must be at least zero.
n
Number of columns for the result matrix C. Must be at least zero.
alpha
Scaling factor for the matrices A.
a
Buffer or array holding the input matrices A. Must have size at least
stride_a*batch_size
.
lda
Leading dimension of the A matrices. If matrices are stored using column major layout,
lda
must be at least
m
if A is not transposed or
n
if A is transposed. If matrices are stored using row major layout,
lda
must be at least
n
if A is not transposed or at least
m
if A is transposed. Must be positive.
stride_a
Stride between the different A matrices. If matrices are stored using column major layout,
stride_a
must be at least
lda*n
if A is not transposed or at least
lda*m
if A is transposed. If matrices are stored using row major layout,
stride_a
must be at least
lda*m
if B is not transposed or at least
lda*n
if A is transposed.
beta
Scaling factor for the matrices B.
b
Buffer or array holding the input matrices B. Must have size at least
stride_b*batch_size
.
ldb
Leading dimension of the B matrices. If matrices are stored using column major layout,
ldb
must be at least
m
if B is not transposed or
n
if B is transposed. If matrices are stored using row major layout,
ldb
must be at least
n
if B is not transposed or at least
m
if B is transposed. Must be positive.
stride_b
Stride between the different B matrices. If matrices are stored using column major layout,
stride_b
must be at least
ldb*n
if B is not transposed or at least
ldb*m
if B is transposed. If matrices are stored using row major layout,
stride_b
must be at least
ldb*m
if B is not transposed or at least
ldb*n
if B is transposed.
ldc
Leading dimension of the A matrices. If matrices are stored using column major layout,
lda
must be at least
m
. If matrices are stored using row major layout,
lda
must be at least
n
. Must be positive.
stride_c
Stride between the different C matrices. If matrices are stored using column major layout,
stride_c
must be at least
ldc*n
. If matrices are stored using row major layout,
stride_c
must be at least
ldc*m
.
batch_size
Specifies the number of input and output matrices to add.
dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters
c
Output buffer or array, overwritten by
batch_size
matrix addition operations of the form
alpha*op(A) + beta*op(B)
. Must have size at least
stride_c*batch_size
.

Product and Performance Information

1

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.