cblas_?gemm_pack

Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684

Date 7/13/2023

cblas_?gemm_pack

Performs scaling and packing of the matrix into the previously allocated buffer.

Syntax

void cblas_hgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_F16 alpha, const MKL_F16 *src, const MKL_INT ld, MKL_F16 *dest);

void cblas_sgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const float *src, const MKL_INT ld, float *dest);

void cblas_dgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double alpha, const double *src, const MKL_INT ld, double *dest);

Include Files

mkl.h

Description

The cblas_?gemm_pack routine is one of a set of related routines that enable the use of internal packed storage. Call cblas_?gemm_pack after you allocate a buffer whose size is given by cblas_?gemm_pack_get_size. The routine scales the identified matrix by alpha and packs it into that buffer.

NOTE:

Do not copy the packed matrix to a different address, because the internal implementation depends on the alignment of internally stored metadata.

The cblas_?gemm_pack routine performs this operation:

dest := alpha*op(src) as part of the computation C := alpha*op(A)*op(B) + beta*C

where:

op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,
alpha and beta are scalars,
src is a matrix,
A, B, and C are matrices,
op(src) is an m-by-k matrix if identifier = CblasAMatrix,
op(src) is a k-by-n matrix if identifier = CblasBMatrix,
dest is an internal packed storage buffer.
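The operation above can be illustrated with a small reference model in plain C. This is only a sketch under stated assumptions: the names ref_trans_t, ref_scale_op, and ref_compute are hypothetical and are not part of the library, storage is assumed to be column-major, and the output is a dense array rather than the library's internal packed format, which is not documented here. The point it shows is that alpha is applied once when the matrix is packed, so the later product only needs beta.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CBLAS_TRANSPOSE; not the library enum. */
typedef enum { REF_NO_TRANS, REF_TRANS } ref_trans_t;

/* Reference model of dest := alpha*op(src) for column-major storage.
 * op(src) is rows-by-cols. dest is written as a dense rows-by-cols
 * column-major array, NOT in the library's internal packed format. */
void ref_scale_op(ref_trans_t trans, size_t rows, size_t cols,
                  double alpha, const double *src, size_t ld, double *dest)
{
    /* The stored matrix has `rows` rows (no transpose) or `cols` rows. */
    assert(ld >= (trans == REF_NO_TRANS ? rows : cols));
    for (size_t j = 0; j < cols; ++j) {
        for (size_t i = 0; i < rows; ++i) {
            /* Element (i, j) of op(src) is src(i, j) or src(j, i). */
            double v = (trans == REF_NO_TRANS) ? src[i + j * ld]
                                               : src[j + i * ld];
            dest[i + j * rows] = alpha * v;
        }
    }
}

/* Reference model of the later product C := P*B + beta*C, where P is the
 * already scaled m-by-k matrix, B is k-by-n, and C is m-by-n, all dense
 * column-major. No alpha appears here: it was applied in ref_scale_op. */
void ref_compute(size_t m, size_t n, size_t k, const double *p,
                 const double *b, double beta, double *c)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            for (size_t l = 0; l < k; ++l)
                acc += p[i + l * m] * b[l + j * k];
            c[i + j * m] = acc + beta * c[i + j * m];
        }
    }
}
```

Calling ref_scale_op once and ref_compute several times with different B matrices mirrors the intended usage pattern: the scaling and reordering cost of the reused matrix is paid a single time.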

NOTE:

You must use the same value of the Layout parameter for the entire sequence of related cblas_?gemm_pack and cblas_?gemm_compute calls.

For best performance, use the same number of threads for packing and for computing.

If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.

Input Parameters

Layout

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

identifier

Specifies which matrix is to be packed:

If identifier = CblasAMatrix, the routine packs matrix A.

If identifier = CblasBMatrix, the routine packs matrix B.

trans

Specifies the form of op(src) used in the packing:

If trans = CblasNoTrans, op(src) = src.

If trans = CblasTrans, op(src) = src^T.

If trans = CblasConjTrans, op(src) = src^H.
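How Layout and trans together select an element of src can be sketched as an index calculation. The helper below is illustrative only: idx_layout_t, idx_trans_t, and op_src_offset are hypothetical names, not library symbols, and indices are zero-based. Because the routines on this page operate on real data types, the conjugate transpose coincides with the plain transpose, so the sketch models only two cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE. */
typedef enum { IDX_ROW_MAJOR, IDX_COL_MAJOR } idx_layout_t;
typedef enum { IDX_NO_TRANS, IDX_TRANS } idx_trans_t;

/* Offset within src of element (i, j) of op(src), zero-based,
 * for a source array with leading dimension ld. */
size_t op_src_offset(idx_layout_t layout, idx_trans_t trans,
                     size_t i, size_t j, size_t ld)
{
    assert(ld >= 1);
    /* Transposition swaps the row and column index into src. */
    size_t r = (trans == IDX_NO_TRANS) ? i : j;
    size_t c = (trans == IDX_NO_TRANS) ? j : i;
    /* Column-major: rows of one column are adjacent in memory.
     * Row-major: columns of one row are adjacent in memory. */
    return (layout == IDX_COL_MAJOR) ? r + c * ld : r * ld + c;
}
```

For instance, with column-major storage and ld = 3, element (1, 2) of op(src) is read from src[7] without transposition and from src[5] with it.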

m

Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.

k

Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha

Specifies the scalar alpha.

src

Array:

	identifier = CblasAMatrix		identifier = CblasBMatrix
	trans = CblasNoTrans	trans = CblasTrans or trans = CblasConjTrans	trans = CblasNoTrans	trans = CblasTrans or trans = CblasConjTrans
Layout = CblasColMajor	Size `ld*k`. Before entry, the leading m-by-k part of the array src must contain the matrix `A`.	Size `ld*m`. Before entry, the leading k-by-m part of the array src must contain the matrix `A`.	Size `ld*n`. Before entry, the leading k-by-n part of the array src must contain the matrix `B`.	Size `ld*k`. Before entry, the leading n-by-k part of the array src must contain the matrix `B`.
Layout = CblasRowMajor	Size `ld*m`. Before entry, the leading k-by-m part of the array src must contain the matrix `A`.	Size `ld*k`. Before entry, the leading m-by-k part of the array src must contain the matrix `A`.	Size `ld*k`. Before entry, the leading n-by-k part of the array src must contain the matrix `B`.	Size `ld*n`. Before entry, the leading k-by-n part of the array src must contain the matrix `B`.

ld

Specifies the leading dimension of src as declared in the calling (sub)program.

	identifier = CblasAMatrix		identifier = CblasBMatrix
	trans = CblasNoTrans	trans = CblasTrans or trans = CblasConjTrans	trans = CblasNoTrans	trans = CblasTrans or trans = CblasConjTrans
Layout = CblasColMajor	ld must be at least `max(1, m)`.	ld must be at least `max(1, k)`.	ld must be at least `max(1, k)`.	ld must be at least `max(1, n)`.
Layout = CblasRowMajor	ld must be at least `max(1, k)`.	ld must be at least `max(1, m)`.	ld must be at least `max(1, n)`.	ld must be at least `max(1, k)`.
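The size and leading-dimension rules for src can be condensed into a short calculation, which may help when validating arguments before a call. The sketch below is not library code: the enum, struct, and function names are hypothetical, and it simply encodes the entries of the two tables for src and ld.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enums; not library symbols. */
typedef enum { LD_ROW_MAJOR, LD_COL_MAJOR } ld_layout_t;
typedef enum { LD_A_MATRIX, LD_B_MATRIX } ld_identifier_t;
typedef enum { LD_NO_TRANS, LD_TRANS } ld_trans_t;

typedef struct {
    size_t min_ld; /* smallest valid ld: max(1, contiguous extent) */
    size_t count;  /* src holds ld * count elements */
} src_shape_t;

src_shape_t src_shape(ld_layout_t layout, ld_identifier_t id,
                      ld_trans_t trans, size_t m, size_t n, size_t k)
{
    /* op(src) is m-by-k for A and k-by-n for B. */
    size_t rows = (id == LD_A_MATRIX) ? m : k;
    size_t cols = (id == LD_A_MATRIX) ? k : n;
    /* The stored matrix has swapped extents when it is transposed. */
    if (trans == LD_TRANS) {
        size_t tmp = rows;
        rows = cols;
        cols = tmp;
    }
    /* Column-major stores columns contiguously, row-major stores rows. */
    size_t contiguous = (layout == LD_COL_MAJOR) ? rows : cols;
    size_t strided = (layout == LD_COL_MAJOR) ? cols : rows;
    src_shape_t shape;
    shape.min_ld = contiguous > 0 ? contiguous : 1;
    shape.count = strided;
    return shape;
}

/* Number of elements src must hold for a given leading dimension. */
size_t src_elems(src_shape_t shape, size_t ld)
{
    assert(ld >= shape.min_ld);
    return ld * shape.count;
}
```

With m = 3, n = 4, k = 5, a column-major, non-transposed A gives a minimum ld of 3 and an array of ld*5 elements, matching the first cell of each table.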

dest

Previously allocated internal storage buffer that receives the scaled and packed matrix.

Output Parameters

dest

Overwritten by the matrix alpha*op(src), stored in the internal packed format.
