cblas_?gemm_pack
Performs scaling and packing of the matrix into the previously allocated buffer.
Syntax
void cblas_hgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_F16 alpha, const MKL_F16 *src, const MKL_INT ld, MKL_F16 *dest);
void cblas_sgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const float *src, const MKL_INT ld, float *dest);
void cblas_dgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double alpha, const double *src, const MKL_INT ld, double *dest);
Include Files
- mkl.h
Description
The cblas_?gemm_pack routine is one of a set of related routines that enable the use of internal packed storage. Call cblas_?gemm_pack after you allocate a buffer whose size is given by cblas_?gemm_pack_get_size. The routine scales the identified matrix by alpha and packs it into that previously allocated buffer.
Do not copy the packed matrix to a different address because the internal implementation depends on the alignment of internally-stored metadata.
The cblas_?gemm_pack routine performs this operation:
dest := alpha*op(src) as part of the computation C := alpha*op(A)*op(B) + beta*C
where:
- op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,
- alpha and beta are scalars,
- src is a matrix,
- A, B, and C are matrices,
- op(src) is an m-by-k matrix if identifier = CblasAMatrix,
- op(src) is a k-by-n matrix if identifier = CblasBMatrix,
- dest is an internal packed storage buffer.
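To make the operation concrete, here is a minimal sketch of what dest := alpha*op(src) means mathematically for the column-major, single-precision case. It writes a plain dense result, whereas the real routine stores the values in an internal packed layout, so treat it as a reference for the values only and not for the buffer format. The name ref_scale_op and its parameters are hypothetical and not part of the library.

```c
#include <stddef.h>

/* Hypothetical reference helper, not part of the library.
 * Computes out := alpha * op(src) for column-major storage, where op(src)
 * has `rows` rows and `cols` columns.
 *   transposed == 0: op(src) = src    (src is rows-by-cols, ld >= rows)
 *   transposed != 0: op(src) = src^T  (src is cols-by-rows, ld >= cols)
 * `out` is a dense column-major rows-by-cols array with leading dimension
 * `rows`. For real data src^H equals src^T, so one flag covers both. */
void ref_scale_op(int transposed, size_t rows, size_t cols,
                  float alpha, const float *src, size_t ld, float *out)
{
    for (size_t j = 0; j < cols; ++j) {
        for (size_t i = 0; i < rows; ++i) {
            /* Element (i, j) of op(src) in column-major indexing. */
            float v = transposed ? src[j + i * ld] : src[i + j * ld];
            out[i + j * rows] = alpha * v;
        }
    }
}
```

For identifier = CblasAMatrix, op(src) is m-by-k, so the sketch would be called with rows = m and cols = k; for identifier = CblasBMatrix it would be called with rows = k and cols = n.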
You must use the same value of the Layout parameter for the entire sequence of related cblas_?gemm_pack and cblas_?gemm_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
Input Parameters
- Layout
-
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
- identifier
-
Specifies which matrix is to be packed:
If identifier = CblasAMatrix, the routine packs matrix A.
If identifier = CblasBMatrix, the routine packs matrix B.
- trans
-
Specifies the form of op(src) used in the packing:
If trans = CblasNoTrans, op(src) = src.
If trans = CblasTrans, op(src) = src^T.
If trans = CblasConjTrans, op(src) = src^H.
- m
-
Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
- n
-
Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
- k
-
Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
- alpha
-
Specifies the scalar alpha.
- src
-
Array. The required size and contents depend on Layout, identifier, and trans:

| Layout | identifier | trans | Size | Before entry |
|---|---|---|---|---|
| CblasColMajor | CblasAMatrix | CblasNoTrans | ld*k | The leading m-by-k part of the array src must contain the matrix A. |
| CblasColMajor | CblasAMatrix | CblasTrans or CblasConjTrans | ld*m | The leading k-by-m part of the array src must contain the matrix A. |
| CblasColMajor | CblasBMatrix | CblasNoTrans | ld*n | The leading k-by-n part of the array src must contain the matrix B. |
| CblasColMajor | CblasBMatrix | CblasTrans or CblasConjTrans | ld*k | The leading n-by-k part of the array src must contain the matrix B. |
| CblasRowMajor | CblasAMatrix | CblasNoTrans | ld*m | The leading k-by-m part of the array src must contain the matrix A. |
| CblasRowMajor | CblasAMatrix | CblasTrans or CblasConjTrans | ld*k | The leading m-by-k part of the array src must contain the matrix A. |
| CblasRowMajor | CblasBMatrix | CblasNoTrans | ld*k | The leading n-by-k part of the array src must contain the matrix B. |
| CblasRowMajor | CblasBMatrix | CblasTrans or CblasConjTrans | ld*n | The leading k-by-n part of the array src must contain the matrix B. |
- ld
-
Specifies the leading dimension of src as declared in the calling (sub)program.

| Layout | identifier | trans | Minimum ld |
|---|---|---|---|
| CblasColMajor | CblasAMatrix | CblasNoTrans | max(1, m) |
| CblasColMajor | CblasAMatrix | CblasTrans or CblasConjTrans | max(1, k) |
| CblasColMajor | CblasBMatrix | CblasNoTrans | max(1, k) |
| CblasColMajor | CblasBMatrix | CblasTrans or CblasConjTrans | max(1, n) |
| CblasRowMajor | CblasAMatrix | CblasNoTrans | max(1, k) |
| CblasRowMajor | CblasAMatrix | CblasTrans or CblasConjTrans | max(1, m) |
| CblasRowMajor | CblasBMatrix | CblasNoTrans | max(1, n) |
| CblasRowMajor | CblasBMatrix | CblasTrans or CblasConjTrans | max(1, k) |
- dest
-
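Internal storage buffer that receives the scaled and packed matrix. Allocate it before the call, with the size given by cblas_?gemm_pack_get_size.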
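The size and ld requirements for src listed above follow one pattern: take the dimensions of src as stored (before op is applied), then let the layout decide which dimension ld must cover. The sketch below encodes that pattern so it can be checked against the individual cases. The enum, struct, and function names are hypothetical stand-ins chosen so the example compiles without mkl.h; real code would use the CBLAS enum values directly.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enums (not library types). */
typedef enum { EX_ROW_MAJOR, EX_COL_MAJOR } ex_layout;
typedef enum { EX_A_MATRIX, EX_B_MATRIX } ex_identifier;
typedef enum { EX_NO_TRANS, EX_TRANS } ex_trans; /* EX_TRANS also covers ConjTrans */

typedef struct {
    size_t min_ld;  /* smallest valid ld, i.e. max(1, leading dimension) */
    size_t factor;  /* src must hold at least ld * factor elements */
} ex_src_shape;

ex_src_shape ex_src_requirements(ex_layout layout, ex_identifier id,
                                 ex_trans trans, size_t m, size_t n, size_t k)
{
    /* op(src) is m-by-k for A and k-by-n for B. */
    size_t rows = (id == EX_A_MATRIX) ? m : k;
    size_t cols = (id == EX_A_MATRIX) ? k : n;

    /* With a transpose, src as stored has the swapped dimensions. */
    if (trans == EX_TRANS) {
        size_t t = rows;
        rows = cols;
        cols = t;
    }

    /* Column-major: ld is the stride between columns, so it must cover the
     * rows. Row-major: ld is the stride between rows, so it must cover the
     * columns. */
    size_t lead  = (layout == EX_COL_MAJOR) ? rows : cols;
    size_t other = (layout == EX_COL_MAJOR) ? cols : rows;

    ex_src_shape s;
    s.min_ld = (lead > 0) ? lead : 1;
    s.factor = other;
    return s;
}
```

For example, with Layout = CblasColMajor, identifier = CblasAMatrix, and trans = CblasNoTrans, the sketch yields a minimum ld of max(1, m) and a size of ld*k.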
Output Parameters
- dest
-
Overwritten by the matrix alpha*op(src).