Developer Reference for Intel® oneAPI Math Kernel Library for Fortran
gemm_*_pack
Packs the matrix into a previously allocated buffer.
Syntax
call gemm_s8u8s32_pack (identifier, trans, m, n, k, src, ld, dest)
call gemm_s16s16s32_pack (identifier, trans, m, n, k, src, ld, dest)
void cblas_gemm_s8u8s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void *src, const MKL_INT ld, void *dest);
void cblas_gemm_s16s16s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_INT16 *src, const MKL_INT ld, MKL_INT16 *dest);
void cblas_gemm_bf16bf16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_BF16 *src, const MKL_INT ld, MKL_BF16 *dest);
void cblas_gemm_f16f16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_F16 *src, const MKL_INT ld, MKL_F16 *dest);
void cblas_gemm_e5m2e5m2f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_E5M2 *src, const MKL_INT ld, MKL_E5M2 *dest);
void cblas_gemm_e4m3e4m3f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_E4M3 *src, const MKL_INT ld, MKL_E4M3 *dest);
Include Files
- mkl.fi
Description
The gemm_*_pack routine is one of a set of related routines that enable the use of an internal packed storage format. Call gemm_*_pack after you allocate a buffer whose size is given by gemm_*_pack_get_size. The routine packs the identified matrix into that buffer.
The gemm_*_pack routine performs this operation:
dest := op(src)
as part of the computation
C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset for integer types,
C := alpha*op(A)*op(B) + beta*C for the bfloat16 type,
where:
- op(X) is one of the operations op(X) = X or op(X) = X^T,
- alpha and beta are scalars,
- src is a matrix,
- A, A_offset, B, B_offset, C, and C_offset are matrices,
- op(src) is an m-by-k matrix if identifier = 'A' or 'a',
- op(src) is a k-by-n matrix if identifier = 'B' or 'b',
- dest is the buffer previously allocated to store the matrix packed into an internal format
- A_offset is an m-by-k matrix.
- B_offset is a k-by-n matrix.
- C_offset is an m-by-n matrix.
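The integer formula above can be checked against a small reference evaluation. The following is a minimal sketch, not a oneMKL routine: `ref_int_gemm` is a hypothetical name, it handles only op(A) = A and op(B) = B, it takes the offsets as full matrices as described in the list above, and it assumes every matrix is column-major with a leading dimension equal to its row count. Operand values are passed already widened to 32 bits so the sketch does not depend on which operand is signed, and the result is written as double, so any rounding or saturation that oneMKL applies when storing 32-bit results is not modeled. The bfloat16 formula is not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not part of oneMKL).
 * Evaluates C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset
 * for op(A) = A and op(B) = B. All matrices are column-major and the
 * leading dimension of each one equals its number of rows (assumption).
 * Rounding/saturation to 32-bit integers is NOT modeled: the result is
 * stored as double in `result`. */
void ref_int_gemm(size_t m, size_t n, size_t k,
                  double alpha,
                  const int32_t *a, const int32_t *a_offset, /* m-by-k */
                  const int32_t *b, const int32_t *b_offset, /* k-by-n */
                  double beta,
                  const int32_t *c, const int32_t *c_offset, /* m-by-n */
                  double *result)                            /* m-by-n */
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            int64_t acc = 0; /* exact integer dot product of row i and column j */
            for (size_t p = 0; p < k; ++p) {
                int64_t av = (int64_t)a[i + p * m] + a_offset[i + p * m];
                int64_t bv = (int64_t)b[p + j * k] + b_offset[p + j * k];
                acc += av * bv;
            }
            result[i + j * m] = alpha * (double)acc
                              + beta * (double)c[i + j * m]
                              + (double)c_offset[i + j * m];
        }
    }
}
```

A reference of this kind is mainly useful for validating small test cases before switching to the packed routines.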
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
Input Parameters
- identifier
-  CHARACTER*1. 
     Specifies which matrix is to be packed: If identifier = 'A' or 'a', the A matrix is packed. If identifier = 'B' or 'b', the B matrix is packed. 
- trans
-  CHARACTER*1. 
     Specifies the form of op(src) used in the packing: If trans = 'N' or 'n', op(src) = src. If trans = 'T' or 't', op(src) = src^T. 
- m
-  INTEGER. 
     Specifies the number of rows of matrix op(A) and of the matrix C. The value of m must be at least zero. 
- n
-  INTEGER. 
     Specifies the number of columns of matrix op(B) and the number of columns of matrix C. The value of n must be at least zero. 
- k
-  INTEGER. 
     Specifies the number of columns of matrix op(A) and the number of rows of matrix op(B). The value of k must be at least zero. 
- src
-  
     INTEGER*1 for gemm_s8u8s32_pack and INTEGER*2 for gemm_s16s16s32_pack.
     | | trans = 'N' or 'n' | trans = 'T' or 't' |
     | identifier = 'A' or 'a' | Size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A. | Size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A. |
     | identifier = 'B' or 'b' | Size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B. | Size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B. |
- ld
-  
     INTEGER. Specifies the leading dimension of src as declared in the calling (sub)program.
     | | trans = 'N' or 'n' | trans = 'T' or 't' |
     | identifier = 'A' or 'a' | ld must be at least max(1, m). | ld must be at least max(1, k). |
     | identifier = 'B' or 'b' | ld must be at least max(1, k). | ld must be at least max(1, n). |
- dest
- 
      INTEGER*1 for gemm_s8u8s32_pack or INTEGER*2 for gemm_s16s16s32_pack 
     Buffer for the packed matrix. 
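The size and leading-dimension rules for src and ld above follow one pattern: ld must cover the number of rows of src as stored, and the array holds ld times the number of stored columns. The sketch below encodes that pattern so the four identifier/trans combinations can be checked before a call. The names `src_shape`, `pack_src_shape`, `pack_min_ld`, and `pack_src_elems` are hypothetical helpers introduced for illustration; they are not part of oneMKL.

```c
#include <ctype.h>
#include <stddef.h>

/* Shape of src as stored in memory (column-major), not of op(src). */
typedef struct {
    size_t rows;
    size_t cols;
} src_shape;

/* Hypothetical helper: fills *shape for the given identifier and trans.
 * Returns 0 on success, -1 if identifier is not A/a/B/b or trans is not
 * N/n/T/t. */
int pack_src_shape(char identifier, char trans,
                   size_t m, size_t n, size_t k, src_shape *shape)
{
    int id = toupper((unsigned char)identifier);
    int tr = toupper((unsigned char)trans);
    if ((id != 'A' && id != 'B') || (tr != 'N' && tr != 'T'))
        return -1;

    /* op(src) is m-by-k for A and k-by-n for B. */
    size_t op_rows = (id == 'A') ? m : k;
    size_t op_cols = (id == 'A') ? k : n;

    /* With trans = 'T', src is stored as the transpose of op(src). */
    shape->rows = (tr == 'N') ? op_rows : op_cols;
    shape->cols = (tr == 'N') ? op_cols : op_rows;
    return 0;
}

/* Smallest valid ld: max(1, stored rows). */
size_t pack_min_ld(const src_shape *shape)
{
    return shape->rows > 1 ? shape->rows : 1;
}

/* Number of elements the src array must hold for a given ld. */
size_t pack_src_elems(const src_shape *shape, size_t ld)
{
    return ld * shape->cols;
}
```

For example, with m = 3, n = 5, k = 4, packing B with trans = 'T' gives a stored shape of 5-by-4, a minimum ld of 5, and an array of ld*4 elements, matching the tables above.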
Output Parameters
| dest | INTEGER*1 for gemm_s8u8s32_pack or INTEGER*2 for gemm_s16s16s32_pack. Overwritten by the matrix op(src) stored in a format internal to Intel® oneAPI Math Kernel Library (oneMKL). | 
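The internal format of dest is not documented here, so its layout should be treated as opaque. To make the logical content op(src) concrete, the sketch below copies op(src) from a column-major array with leading dimension ld into a dense column-major array. `dense_op_copy` is a hypothetical illustration for 8-bit data only; it does not reproduce the oneMKL packed layout and its output cannot be passed to the compute routines as a packed matrix.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, NOT the oneMKL packing format.
 * Writes op(src) into `dense`, a column-major rows-by-cols array with
 * leading dimension `rows`, where rows-by-cols is the shape of op(src).
 * transposed == 0 selects op(src) = src; any other value selects
 * op(src) = src^T. `ld` is the leading dimension of src as stored. */
void dense_op_copy(int transposed, size_t rows, size_t cols,
                   const int8_t *src, size_t ld, int8_t *dense)
{
    for (size_t j = 0; j < cols; ++j) {
        for (size_t i = 0; i < rows; ++i) {
            /* Element (i, j) of op(src) is src(i, j) or src(j, i). */
            dense[i + j * rows] = transposed ? src[j + i * ld]
                                             : src[i + j * ld];
        }
    }
}
```

With a 2-by-3 src stored with ld = 3, the non-transposed copy skips the padding row in each column, and the transposed copy yields the 3-by-2 matrix src^T.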
Examples
See the following examples in the oneMKL installation directory to understand the use of these routines:
gemm_s8u8s32_pack: examples\blas\source\gemm_s8u8s32_computex.f
gemm_s16s16s32_pack: examples\blas\source\gemm_s16s16s32_computex.f