imatcopy_batch
Computes a group of in-place scaled matrix transpose or copy operations
using general matrices.
Description
The imatcopy_batch routines perform a series of in-place scaled matrix
copies or transpositions. They are similar to the imatcopy routines, but
the imatcopy_batch routines perform their operations with groups of
matrices. The groups contain matrices with the same parameters.
The operation for the strided API is defined as:
for i = 0 … batch_size – 1
AB is a matrix at offset i * stride in ab
AB = alpha * op(AB)
end for
The operation for the group API is defined as:
idx = 0
for i = 0 … group_count – 1
m, n, alpha, lda, ldb and group_size at position i in their respective arrays
for j = 0 … group_size – 1
AB is a matrix at position idx in ab_array
AB = alpha * op(AB)
idx := idx + 1
end for
end for
where:
- op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
- alpha is a scalar
- AB is a matrix to be transformed in place
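The strided definition above can be read as the following plain C++ reference loop. This is a minimal sketch of the semantics only, not the library's implementation. It assumes column major storage, real element types (so the conjugate transpose case is omitted), and host memory. The names Op and imatcopy_batch_ref are hypothetical and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the library's transpose enum (conjtrans omitted).
enum class Op { nontrans, trans };

// Reference semantics, column major: AB = alpha * op(AB) for every matrix
// in the batch. Matrix i starts at offset i * stride inside ab.
template <typename T>
void imatcopy_batch_ref(Op op, std::int64_t m, std::int64_t n, T alpha,
                        std::vector<T> &ab, std::int64_t lda, std::int64_t ldb,
                        std::int64_t stride, std::int64_t batch_size) {
    assert(m >= 0 && n >= 0 && batch_size >= 0);
    assert(static_cast<std::int64_t>(ab.size()) >= stride * batch_size);

    // The output is m-by-n for a copy and n-by-m for a transpose.
    const std::int64_t out_rows = (op == Op::nontrans) ? m : n;
    const std::int64_t out_cols = (op == Op::nontrans) ? n : m;

    std::vector<T> tmp(static_cast<std::size_t>(m * n));
    for (std::int64_t i = 0; i < batch_size; ++i) {
        T *mat = ab.data() + i * stride;

        // Snapshot the m-by-n input (leading dimension lda) so that the
        // in-place write below cannot clobber values that are still needed.
        for (std::int64_t c = 0; c < n; ++c)
            for (std::int64_t r = 0; r < m; ++r)
                tmp[c * m + r] = mat[c * lda + r];

        // Write the scaled result with leading dimension ldb.
        for (std::int64_t c = 0; c < out_cols; ++c)
            for (std::int64_t r = 0; r < out_rows; ++r) {
                const T v = (op == Op::nontrans) ? tmp[c * m + r]
                                                 : tmp[r * m + c];
                mat[c * ldb + r] = alpha * v;
            }
    }
}
```

The temporary copy keeps the sketch easy to verify. A production routine would typically avoid the extra storage.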
The strided API is available with USM pointers or buffer arguments for the
input and output arrays, while the group API is available only with USM
pointers.
For the strided API, the single buffer or array AB contains all the matrices
to be transformed in place. The locations of the individual matrices within
the buffer or array are given by stride lengths, while the number of
matrices is given by the batch_size parameter.
For the group API, the matrices are given by arrays of pointers. AB
represents a matrix stored at the address pointed to by ab_array. The
number of entries in ab_array is total_batch_count, the sum of all the
group_size entries.
API
Syntax
Strided API
USM arrays:
event imatcopy_batch(queue &queue,
transpose trans,
std::int64_t m,
std::int64_t n,
T alpha,
T *ab,
std::int64_t lda,
std::int64_t ldb,
std::int64_t stride,
std::int64_t batch_size,
const vector_class<event> &dependencies = {});
Buffer arrays:
void imatcopy_batch(queue &queue, transpose trans,
std::int64_t m, std::int64_t n, T alpha,
cl::sycl::buffer<T, 1> &ab, std::int64_t lda,
std::int64_t ldb, std::int64_t stride,
std::int64_t batch_size);
Group API
event imatcopy_batch(queue &queue, const transpose *trans_array,
const std::int64_t *m_array,
const std::int64_t *n_array,
const T *alpha_array, T **ab_array,
const std::int64_t *lda_array,
const std::int64_t *ldb_array,
std::int64_t group_count,
const std::int64_t *group_size,
const vector_class<event> &dependencies = {});
imatcopy_batch supports the following precisions and devices:
T | Devices Supported |
---|---|
float | Host, CPU, and GPU |
double | Host, CPU, and GPU |
std::complex<float> | Host, CPU, and GPU |
std::complex<double> | Host, CPU, and GPU |
Input Parameters
Strided API
- trans
- Specifies op(AB), the transposition operation applied to the matrices AB.
- m
- Number of rows for each matrix AB on input. Must be at least 0.
- n
- Number of columns for each matrix AB on input. Must be at least 0.
- alpha
- Scaling factor for the matrix transpose or copy operation.
- ab
- Array or buffer holding the matrices AB. Must have size at least stride*batch_size.
- lda
- Leading dimension of the AB matrices on input. If matrices are stored using column major layout, lda must be at least m. If matrices are stored using row major layout, lda must be at least n. Must be positive.
- ldb
- Leading dimension of the AB matrices on output. If matrices are stored using column major layout, ldb must be at least m if AB is not transposed or n if AB is transposed. If matrices are stored using row major layout, ldb must be at least n if AB is not transposed or at least m if AB is transposed. Must be positive.
- stride
- Stride between the different AB matrices. It must be at least max(ldb,lda)*max(ka, kb), where:
- ka is m if column major layout is used or n if row major layout is used
- kb is n if column major layout is used and AB is not transposed, or m otherwise
- batch_size
- Specifies the number of matrices to transpose or copy.
- dependencies
- List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
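The leading dimension and stride constraints listed above can be checked before a strided call. The helpers below are a hedged sketch that encodes the rules exactly as stated in this section. The names Layout, min_stride, and leading_dims_ok are hypothetical and are not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical layout tag, used only for this illustration.
enum class Layout { col_major, row_major };

// Smallest stride allowed by the rule max(ldb, lda) * max(ka, kb).
std::int64_t min_stride(Layout layout, bool transposed, std::int64_t m,
                        std::int64_t n, std::int64_t lda, std::int64_t ldb) {
    assert(m >= 0 && n >= 0);
    // ka is m for column major layout, n for row major layout.
    const std::int64_t ka = (layout == Layout::col_major) ? m : n;
    // kb is n for column major layout without transposition, m otherwise.
    const std::int64_t kb =
        (layout == Layout::col_major && !transposed) ? n : m;
    return std::max(ldb, lda) * std::max(ka, kb);
}

// True when lda and ldb satisfy the documented lower bounds.
bool leading_dims_ok(Layout layout, bool transposed, std::int64_t m,
                     std::int64_t n, std::int64_t lda, std::int64_t ldb) {
    const std::int64_t min_lda = (layout == Layout::col_major) ? m : n;
    std::int64_t min_ldb;
    if (layout == Layout::col_major)
        min_ldb = transposed ? n : m;
    else
        min_ldb = transposed ? m : n;
    return lda > 0 && ldb > 0 && lda >= min_lda && ldb >= min_ldb;
}
```

For a batch of batch_size matrices, the array or buffer then needs at least stride*batch_size elements, with stride no smaller than the value returned by min_stride.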
Group API
- trans_array
- Array of size group_count. Each element i in the array specifies op(AB), the transposition operation applied to the matrices AB.
- m_array
- Array of size group_count holding the number of rows of AB on input. Each must be at least 0.
- n_array
- Array of size group_count holding the number of columns of AB on input. Each must be at least 0.
- alpha_array
- Array of size group_count containing scaling factors for the matrix transpositions or copies.
- ab_array
- Array of size total_batch_count, holding pointers to arrays used to store AB matrices.
- lda_array
- Array of size group_count. The leading dimension of the input matrix AB. If matrices are stored using column major layout, lda_array[i] must be at least m_array[i]. If matrices are stored using row major layout, lda_array[i] must be at least n_array[i]. Each must be positive.
- ldb_array
- Array of size group_count. The leading dimension of the output matrix AB. Each entry ldb_array[i] must be positive and at least:
- m_array[i] if column major layout is used and AB is not transposed
- m_array[i] if row major layout is used and AB is transposed (AB’)
- n_array[i] otherwise
- group_count
- Number of groups. Must be at least 0.
- group_size
- Array of size group_count. The element group_size[i] is the number of matrices in the group i. Each element in group_size must be at least 0.
- dependencies
- List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
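The relationship between group_size, total_batch_count, and the running index idx used in the group API definition can be sketched in plain C++. This is an illustration of the indexing only, under the assumption that groups are laid out back to back in ab_array as the definition describes. The helper names total_batch_count and group_of_each_matrix are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// total_batch_count is the sum of all group_size entries.
std::int64_t total_batch_count(const std::vector<std::int64_t> &group_size) {
    std::int64_t total = 0;
    for (std::int64_t g : group_size) {
        assert(g >= 0);  // each group may be empty but not negative
        total += g;
    }
    return total;
}

// For each position idx in ab_array, return the index i of the group whose
// parameters (trans, m, n, alpha, lda, ldb) apply to that matrix.
std::vector<std::int64_t> group_of_each_matrix(
    const std::vector<std::int64_t> &group_size) {
    std::vector<std::int64_t> owner;
    owner.reserve(static_cast<std::size_t>(total_batch_count(group_size)));
    const std::int64_t group_count =
        static_cast<std::int64_t>(group_size.size());
    for (std::int64_t i = 0; i < group_count; ++i)
        for (std::int64_t j = 0; j < group_size[i]; ++j)
            owner.push_back(i);  // idx advances by one per matrix
    return owner;
}
```

A caller can use the first helper to size ab_array and the second to confirm which per-group parameters each pointer will be processed with.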
Output Parameters
Strided API
- ab
- Output array or buffer, overwritten by batch_size matrix transpose or copy operations of the form alpha*op(AB).
Group API
- ab_array
- Output array of pointers to AB matrices, overwritten by total_batch_count matrix transpose or copy operations of the form alpha*op(AB).