cblas_?gemm3m_batch

Developer Reference for Intel® oneAPI Math Kernel Library for C


ID 766684

Date 12/16/2022


cblas_?gemm3m_batch

Computes scalar-matrix-matrix products and adds the results to scalar matrix products for groups of general matrices.

Syntax

void cblas_cgemm3m_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);

void cblas_zgemm3m_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);

Include Files

mkl.h

Description

The ?gemm3m_batch routines perform a series of matrix-matrix operations with general matrices. They are similar to the ?gemm3m routine counterparts, but the ?gemm3m_batch routines perform matrix-matrix operations with groups of matrices, processing a number of groups at once. The groups contain matrices with the same parameters. The ?gemm3m_batch routines use fewer matrix multiplications than the ?gemm_batch routines, as described in the Application Notes.

The operation is defined as

idx = 0
for i = 0..group_count - 1
    alpha and beta in alpha_array[i] and beta_array[i]
    for j = 0..group_size[i] - 1
        A, B, and C matrices in a_array[idx], b_array[idx], and c_array[idx]
        C := alpha*op(A)*op(B) + beta*C
        idx = idx + 1
    end for
end for

where:

op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,

alpha and beta are scalar elements of alpha_array and beta_array,

A, B and C are matrices such that for m, n, and k which are elements of m_array, n_array, and k_array:

op(A) is an m-by-k matrix,

op(B) is a k-by-n matrix,

C is an m-by-n matrix.

A, B, and C represent matrices stored at addresses pointed to by a_array, b_array, and c_array, respectively. The number of entries in a_array, b_array, and c_array is total_batch_count = the sum of all the group_size entries.
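
The traversal of groups and of the flat pointer arrays can be sketched in plain C. This is a minimal reference loop under stated assumptions, not the library implementation: the function name `ref_gemm_batch_sketch` is hypothetical, `int` stands in for `MKL_INT`, `float complex` stands in for the single-precision complex element type, storage is column-major, only op(X) = X is handled, and the product is formed with conventional complex arithmetic rather than the 3M scheme.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference loop: column-major, op(A) = A, op(B) = B only. */
static void ref_gemm_batch_sketch(
    const int *m_array, const int *n_array, const int *k_array,
    const float complex *alpha_array,
    const float complex **a_array, const int *lda_array,
    const float complex **b_array, const int *ldb_array,
    const float complex *beta_array,
    float complex **c_array, const int *ldc_array,
    int group_count, const int *group_size)
{
    size_t idx = 0; /* flat index into a_array, b_array, c_array */
    for (int i = 0; i < group_count; ++i) {
        /* All matrices in group i share these parameters. */
        const int m = m_array[i], n = n_array[i], k = k_array[i];
        const float complex alpha = alpha_array[i], beta = beta_array[i];
        assert(m >= 0 && n >= 0 && k >= 0 && group_size[i] >= 0);
        for (int j = 0; j < group_size[i]; ++j, ++idx) {
            const float complex *A = a_array[idx];
            const float complex *B = b_array[idx];
            float complex *C = c_array[idx];
            for (int col = 0; col < n; ++col) {
                for (int row = 0; row < m; ++row) {
                    float complex acc = 0.0f;
                    for (int p = 0; p < k; ++p)
                        acc += A[row + (size_t)p * lda_array[i]]
                             * B[p + (size_t)col * ldb_array[i]];
                    float complex *cij = &C[row + (size_t)col * ldc_array[i]];
                    /* When beta is zero, C is not read on input. */
                    *cij = (beta == 0.0f) ? alpha * acc
                                          : alpha * acc + beta * (*cij);
                }
            }
        }
    }
}
```

Because `idx` keeps increasing across groups, the pointers for group 1 start immediately after the last pointer of group 0 in each of the three pointer arrays.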

See also gemm for a detailed description of multiplication for general matrices, and gemm_batch, a BLAS-like extension routine, for similar matrix-matrix operations.

NOTE:

Error checking is not performed for Intel® oneAPI Math Kernel Library Windows* single dynamic libraries for the ?gemm3m_batch routines.

Input Parameters

Layout

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa_array

Array of size group_count. For the group i, transa_i = transa_array[i] specifies the form of op(A) used in the matrix multiplication:

if transa_i = CblasNoTrans, then op(A) = A;

if transa_i = CblasTrans, then op(A) = A^T;

if transa_i = CblasConjTrans, then op(A) = A^H.

transb_array

Array of size group_count. For the group i, transb_i = transb_array[i] specifies the form of op(B) used in the matrix multiplication:

if transb_i = CblasNoTrans, then op(B) = B;

if transb_i = CblasTrans, then op(B) = B^T;

if transb_i = CblasConjTrans, then op(B) = B^H.

m_array

Array of size group_count. For the group i, m_i = m_array[i] specifies the number of rows of the matrix op(A) and of the matrix C.

The value of each element of m_array must be at least zero.

n_array

Array of size group_count. For the group i, n_i = n_array[i] specifies the number of columns of the matrix op(B) and the number of columns of the matrix C.

The value of each element of n_array must be at least zero.

k_array

Array of size group_count. For the group i, k_i = k_array[i] specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).

The value of each element of k_array must be at least zero.

alpha_array

Array of size group_count. For the group i, alpha_array[i] specifies the scalar alpha_i.

a_array

Array, size total_batch_count, of pointers to arrays used to store A matrices.

lda_array

Array of size group_count. For the group i, lda_i = lda_array[i] specifies the leading dimension of the array storing matrix A as declared in the calling (sub)program.

	transa_i=CblasNoTrans	transa_i=CblasTrans or transa_i=CblasConjTrans
Layout = CblasColMajor	lda_i must be at least `max(1, m_i)`.	lda_i must be at least `max(1, k_i)`.
Layout = CblasRowMajor	lda_i must be at least `max(1, k_i)`.	lda_i must be at least `max(1, m_i)`.

b_array

Array, size total_batch_count, of pointers to arrays used to store B matrices.

ldb_array

Array of size group_count. For the group i, ldb_i = ldb_array[i] specifies the leading dimension of the array storing matrix B as declared in the calling (sub)program.

	transb_i=CblasNoTrans	transb_i=CblasTrans or transb_i=CblasConjTrans
Layout = CblasColMajor	ldb_i must be at least `max(1, k_i)`.	ldb_i must be at least `max(1, n_i)`.
Layout = CblasRowMajor	ldb_i must be at least `max(1, n_i)`.	ldb_i must be at least `max(1, k_i)`.
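
The two leading-dimension tables (for lda_array and ldb_array) follow one rule: the minimum is the number of stored rows for column-major layout and the number of stored columns for row-major layout, where a transposed operand is stored with its rows and columns swapped. The helper below is a sketch of that rule. The enum and function names are hypothetical stand-ins chosen so the snippet compiles without mkl.h, and `int` stands in for `MKL_INT`.

```c
#include <assert.h>

/* Hypothetical stand-ins for the CBLAS layout and transpose enums. */
typedef enum { SK_ROW_MAJOR, SK_COL_MAJOR } sk_layout;
typedef enum { SK_NO_TRANS, SK_TRANS, SK_CONJ_TRANS } sk_trans;

static int sk_max_int(int a, int b) { return a > b ? a : b; }

/* Minimum leading dimension for a matrix X when op(X) is
   op_rows-by-op_cols. For A pass (m, k); for B pass (k, n). */
static int sk_min_leading_dim(sk_layout layout, sk_trans trans,
                              int op_rows, int op_cols)
{
    assert(op_rows >= 0 && op_cols >= 0);
    /* A transposed operand is stored as op_cols-by-op_rows. */
    const int stored_rows = (trans == SK_NO_TRANS) ? op_rows : op_cols;
    const int stored_cols = (trans == SK_NO_TRANS) ? op_cols : op_rows;
    /* Column-major strides over rows; row-major strides over columns. */
    return sk_max_int(1, layout == SK_COL_MAJOR ? stored_rows : stored_cols);
}
```

For example, with m_i = 3 and k_i = 5, a column-major, non-transposed A needs lda_i of at least 3, while the same A in row-major layout needs at least 5.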

beta_array

Array of size group_count. For the group i, beta_array[i] specifies the scalar beta_i.

When beta_i is equal to zero, then C matrices in group i need not be set on input.

c_array

Array, size total_batch_count, of pointers to arrays used to store C matrices.

ldc_array

Array of size group_count. For the group i, ldc_i = ldc_array[i] specifies the leading dimension of all arrays storing matrix C in group i as declared in the calling (sub)program.

When Layout = CblasColMajor, ldc_i must be at least max(1, m_i).

When Layout = CblasRowMajor, ldc_i must be at least max(1, n_i).

group_count

Specifies the number of groups. Must be at least 0.

group_size

Array of size group_count. The element group_size[i] specifies the number of matrices in group i. Each element in group_size must be at least 0.
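
Since a_array, b_array, and c_array each hold total_batch_count pointers, it can help to compute that count and check the non-negativity constraints before allocating the pointer arrays. The helper below is a small sketch of that check. Its name and its -1 error convention are hypothetical, and `int` stands in for `MKL_INT`.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the sum of all group_size entries, or -1 if group_count or
   any group_size[i] is negative (hypothetical error convention). */
static long long sk_total_batch_count(int group_count, const int *group_size)
{
    if (group_count < 0)
        return -1;
    assert(group_size != NULL || group_count == 0);
    long long total = 0;
    for (int i = 0; i < group_count; ++i) {
        if (group_size[i] < 0)
            return -1;
        total += group_size[i];
    }
    return total;
}
```

A group with group_size[i] equal to 0 is valid under the stated constraints and simply contributes no matrices.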

Output Parameters

c_array

Overwritten by the m_i-by-n_i matrix (alpha_i*op(A)*op(B) + beta_i*C) for group i.

Application Notes

These routines perform a complex matrix multiplication by working separately with the real and imaginary parts of the input matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. Using three real matrix multiplications instead of four reduces the time spent in matrix multiplications by 25%, which can give significant savings in compute time for large matrices, where multiplications dominate the cost of the extra additions.
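
The page does not spell out the arithmetic, so the sketch below uses the commonly cited 3M formulation as an assumption: with A = A1 + iA2 and B = B1 + iB2, form T1 = A1*B1 and T2 = A2*B2, then C1 = T1 - T2 and C2 = (A1 + A2)*(B1 + B2) - T1 - T2. That is three real products and five real additions, matching the counts above. The function names, the row-major real storage, and the size cap `SK_MAX_N` are all hypothetical choices made to keep the example short. It is not the library's internal code.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_N 8 /* hypothetical cap so temporaries fit on the stack */

/* C = A * B for real n-by-n row-major matrices. */
static void sk_real_matmul(size_t n, const double *A, const double *B,
                           double *C)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < n; ++p)
                acc += A[i * n + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
}

/* (A1 + i*A2)(B1 + i*B2) = C1 + i*C2 using three real products.
   C1 and C2 must not alias any input. */
static void sk_complex_matmul_3m(size_t n,
                                 const double *A1, const double *A2,
                                 const double *B1, const double *B2,
                                 double *C1, double *C2)
{
    assert(n <= SK_MAX_N);
    double T1[SK_MAX_N * SK_MAX_N], T2[SK_MAX_N * SK_MAX_N];
    double SA[SK_MAX_N * SK_MAX_N], SB[SK_MAX_N * SK_MAX_N];
    const size_t nn = n * n;
    for (size_t e = 0; e < nn; ++e) {
        SA[e] = A1[e] + A2[e];            /* addition 1 */
        SB[e] = B1[e] + B2[e];            /* addition 2 */
    }
    sk_real_matmul(n, A1, B1, T1);        /* product 1 */
    sk_real_matmul(n, A2, B2, T2);        /* product 2 */
    sk_real_matmul(n, SA, SB, C2);        /* product 3 */
    for (size_t e = 0; e < nn; ++e) {
        C1[e] = T1[e] - T2[e];            /* addition 3 */
        C2[e] = C2[e] - T1[e] - T2[e];    /* additions 4 and 5 */
    }
}
```

The imaginary part is obtained by subtraction from a product of sums, which is why its error bound in the analysis that follows is looser than the bound for the real part.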

If the errors in the floating point calculations satisfy the following conditions:

fl(x op y) = (x op y)(1 + δ), |δ| ≤ u, op = ×, /,

fl(x ± y) = x(1 + α) ± y(1 + β), |α|, |β| ≤ u,

then for an n-by-n matrix Ĉ = fl(C₁ + iC₂) = fl((A₁ + iA₂)(B₁ + iB₂)) = Ĉ₁ + iĈ₂, the following bounds are satisfied:

║Ĉ₁ - C₁║ ≤ 2(n + 1)u║A║_∞║B║_∞ + O(u²),

║Ĉ₂ - C₂║ ≤ 4(n + 4)u║A║_∞║B║_∞ + O(u²),

where ║A║_∞ = max(║A₁║_∞, ║A₂║_∞), and ║B║_∞ = max(║B₁║_∞, ║B₂║_∞).

Thus the corresponding matrix multiplications are stable.
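
To get a feel for the size of these bounds, the first-order terms can be evaluated directly. The sketch below drops the O(u²) remainder and takes u and the two norms as caller-supplied inputs. The function names are hypothetical. For IEEE single precision with round-to-nearest, a typical choice for u would be FLT_EPSILON / 2 from float.h, which is an assumption about the arithmetic in use rather than something stated on this page.

```c
#include <assert.h>

/* First-order bound on the real part: 2(n+1) u ||A|| ||B||. */
static double sk_bound_real_part(int n, double u,
                                 double norm_a, double norm_b)
{
    assert(n >= 0 && u >= 0.0);
    return 2.0 * (n + 1) * u * norm_a * norm_b;
}

/* First-order bound on the imaginary part: 4(n+4) u ||A|| ||B||. */
static double sk_bound_imag_part(int n, double u,
                                 double norm_a, double norm_b)
{
    assert(n >= 0 && u >= 0.0);
    return 4.0 * (n + 4) * u * norm_a * norm_b;
}
```

For large n the ratio of the two bounds approaches 2, so under this analysis the imaginary part carries roughly twice the worst-case error of the real part.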

Parent topic: BLAS-like Extensions
