omatadd_batch
Computes a group of out-of-place scaled matrix additions using general
matrices.
Description
The
omatadd_batch
routines perform a series of out-of-place scaled
matrix additions. They are similar to the omatadd
routines, but
the omatadd_batch
routines perform matrix operations with a group
of matrices.The matrices are always in a strided format for this API. The
operation is defined as:
for i = 0 … batch_size – 1
A is a matrix at offset i * stridea in a
B is a matrix at offset i * strideb in b
C is a matrix at offset i * stridec in c
C = alpha * op(A) + beta * op(B)
end for
where:
- op(X)is one ofop(X) = X,op(X) = X', orop(X) = conjg(X')
- alphaandbetaare scalars
- A, B and C are matrices
The API is available with USM pointers or buffer arguments for the
input and output arrays.
The input buffers or arrays
a
and b
contain all the input
matrices, and the single output buffer or array c
contains all the
output matrices. The locations of the individual matrices within the
buffer or array are given by stride lengths, while the number of
matrices is given by the batch_size
parameter.API
Syntax
USM arrays:
event omatadd_batch(queue &queue,
transpose transa,
transpose transb,
std::int64_t m,
std::int64_t n,
T alpha,
const T *a,
std::int64_t lda,
std::int64_t stride_a,
T beta,
T *b,
std::int64_t ldb,
std::int64_t stride_b,
T *c,
std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size,
const std::vector<event> &dependencies = {});
Buffer arrays:
void omatadd_batch(queue &queue, transpose transa,
transpose transb,
std::int64_t m, std::int64_t n,
T alpha, cl::sycl::buffer<T, 1> &a,
std::int64_t lda, std::int64_t stride_a,
T beta, cl::sycl::buffer<T, 1> &b,
std::int64_t ldb, std::int64_t stride_b,
cl::sycl::buffer<T, 1> &c, std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size);
omatadd_batch
supports the following precisions and devices:T | Devices Supported |
---|---|
float | Host, CPU, and GPU |
double | Host, CPU, and GPU |
std::complex<float> | Host, CPU, and GPU |
std::complex<double> | Host, CPU, and GPU |
Input Parameters
- transa
- Specifiesop(A), the transposition operation applied to the matrices A.
- transb
- Specifiesop(B), the transposition operation applied to the matrices B.
- m
- Number of rows for the result matrix C. Must be at least zero.
- n
- Number of columns for the result matrix C. Must be at least zero.
- alpha
- Scaling factor for the matrices A.
- a
- Buffer or array holding the input matrices A. Must have size at leaststride_a*batch_size.
- lda
- Leading dimension of the A matrices. If matrices are stored using column major layout,ldamust be at leastmif A is not transposed ornif A is transposed. If matrices are stored using row major layout,ldamust be at leastnif A is not transposed or at leastmif A is transposed. Must be positive.
- stride_a
- Stride between the different A matrices. If matrices are stored using column major layout,stride_amust be at leastlda*nif A is not transposed or at leastlda*mif A is transposed. If matrices are stored using row major layout,stride_amust be at leastlda*mif B is not transposed or at leastlda*nif A is transposed.
- beta
- Scaling factor for the matrices B.
- b
- Buffer or array holding the input matrices B. Must have size at leaststride_b*batch_size.
- ldb
- Leading dimension of the B matrices. If matrices are stored using column major layout,ldbmust be at leastmif B is not transposed ornif B is transposed. If matrices are stored using row major layout,ldbmust be at leastnif B is not transposed or at leastmif B is transposed. Must be positive.
- stride_b
- Stride between the different B matrices. If matrices are stored using column major layout,stride_bmust be at leastldb*nif B is not transposed or at leastldb*mif B is transposed. If matrices are stored using row major layout,stride_bmust be at leastldb*mif B is not transposed or at leastldb*nif B is transposed.
- ldc
- Leading dimension of the A matrices. If matrices are stored using column major layout,ldamust be at leastm. If matrices are stored using row major layout,ldamust be at leastn. Must be positive.
- stride_c
- Stride between the different C matrices. If matrices are stored using column major layout,stride_cmust be at leastldc*n. If matrices are stored using row major layout,stride_cmust be at leastldc*m.
- batch_size
- Specifies the number of input and output matrices to add.
- dependencies
- List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters
- c
- Output buffer or array, overwritten bybatch_sizematrix addition operations of the formalpha*op(A) + beta*op(B). Must have size at leaststride_c*batch_size.