gemm_*_compute
Computes a matrix-matrix product with general integer matrices (where one or both input matrices are stored in a packed data structure) and adds the result to a scalar-matrix product.
call gemm_s8u8s32_compute (transa, transb, offsetc, m, n, k, alpha, a, lda, oa, b, ldb, ob, beta, c, ldc, oc)
call gemm_s16s16s32_compute (transa, transb, offsetc, m, n, k, alpha, a, lda, oa, b, ldb, ob, beta, c, ldc, oc)
- mkl.fi
The gemm_*_compute routine is one of a set of related routines that enable the use of internal packed storage. After calling gemm_*_pack, call gemm_*_compute to compute
C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset,
where:
- op(X) is either op(X) = X or op(X) = X^T
- alpha and beta are scalars
- A, B, and C are matrices:
  - op(A) is an m-by-k matrix,
  - op(B) is a k-by-n matrix,
  - C is an m-by-n matrix.
- A_offset is an m-by-k matrix with every element equal to the value oa.
- B_offset is a k-by-n matrix with every element equal to the value ob.
- C_offset is an m-by-n matrix defined by the oc array as described in the description of the offsetc parameter.
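To make the formula concrete, the following plain-Python sketch evaluates it element by element on small nested lists. It is only a reference model of the arithmetic, not a call into the library: the helper names `transpose` and `gemm_offset_reference` are hypothetical, the matrices are ordinary lists of rows rather than packed or column-major arrays, C_offset is passed as a full m-by-n matrix, and the final rounding uses Python's `round`, whose treatment of exact ties may differ from the library's.

```python
def transpose(x):
    """Return X^T for a matrix stored as a list of rows."""
    return [list(col) for col in zip(*x)]


def gemm_offset_reference(transa, transb, alpha, a, oa, b, ob, beta, c, c_offset):
    """Model C := alpha*(op(A)+A_offset)*(op(B)+B_offset) + beta*C + C_offset."""
    op_a = transpose(a) if transa in ("T", "t") else a   # m-by-k
    op_b = transpose(b) if transb in ("T", "t") else b   # k-by-n
    m, k, n = len(op_a), len(op_b), len(op_b[0])
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            # Integer dot product of the offset-shifted row of op(A)
            # with the offset-shifted column of op(B).
            acc = sum((op_a[i][p] + oa) * (op_b[p][j] + ob) for p in range(k))
            # Scale in floating point, add the C offset, round to an integer.
            row.append(round(alpha * acc + beta * c[i][j] + c_offset[i][j]))
        result.append(row)
    return result
```

For instance, with A = [[1, 2], [3, 4]], oa = 1, B = [[5, 6], [7, 8]], ob = -1, alpha = 1.0, beta = 2.0, C filled with ones, and C_offset filled with tens, the model returns [[38, 43], [58, 67]].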
For best performance, use the same number of threads for packing and for computing.
If you are packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
| Input parameter | Description |
|---|---|
| transa | CHARACTER*1. Specifies the form of op(A) used in the packing. If transa = 'N' or 'n', op(A) = A. If transa = 'T' or 't', op(A) = A^T. If transa = 'P' or 'p', the matrix in array a is packed into a format internal to Intel® oneAPI Math Kernel Library and lda is ignored. |
| transb | CHARACTER*1. Specifies the form of op(B) used in the packing. If transb = 'N' or 'n', op(B) = B. If transb = 'T' or 't', op(B) = B^T. If transb = 'P' or 'p', the matrix in array b is packed into a format internal to Intel® oneAPI Math Kernel Library and ldb is ignored. |
| offsetc | CHARACTER*1. Specifies the form of C_offset used in the matrix multiplication. If offsetc = 'F' or 'f': oc has a single element and every element of C_offset is equal to this element. If offsetc = 'C' or 'c': oc has a size of m and every column of C_offset is equal to oc. If offsetc = 'R' or 'r': oc has a size of n and every row of C_offset is equal to oc. |
| m | INTEGER. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero. |
| n | INTEGER. Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero. |
| k | INTEGER. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero. |
| alpha | REAL. Specifies the scalar alpha. |
| a | INTEGER*1 for gemm_s8u8s32_compute. INTEGER*2 for gemm_s16s16s32_compute. |
| lda | INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. If transa = 'N' or 'n', lda must be at least max (1, m). If transa = 'T' or 't', lda must be at least max (1, k). |
| oa | INTEGER*1 for gemm_s8u8s32_compute. INTEGER*2 for gemm_s16s16s32_compute. Specifies the scalar offset value for the matrix A. |
| b | INTEGER*1 for gemm_s8u8s32_compute. INTEGER*2 for gemm_s16s16s32_compute. |
| ldb | INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program. If transb = 'N' or 'n', ldb must be at least max (1, k). If transb = 'T', 't', 'C', or 'c', ldb must be at least max (1, n). |
| ob | INTEGER*1 for gemm_s8u8s32_compute. INTEGER*2 for gemm_s16s16s32_compute. Specifies the scalar offset value for the matrix B. |
| beta | REAL. Specifies the scalar beta. |
| c | INTEGER*4. Array, size ldc by n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry. |
| ldc | INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program. The value of ldc must be at least max (1, m). |
| oc | INTEGER*4. Array, size len. Specifies the offset values for the matrix C. If offsetc = 'F' or 'f', len must be at least 1. If offsetc = 'C' or 'c', len must be at least max(1, m). If offsetc = 'R' or 'r', len must be at least max(1, n). |

| Output parameter | Description |
|---|---|
| c | INTEGER*4. On exit, overwritten by the matrix alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset. |
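The three offsetc modes can be pictured by expanding oc into the full m-by-n C_offset matrix. The sketch below is a plain-Python illustration, not library code: `build_c_offset` is a hypothetical helper, and it assumes that with 'C' the length-m array oc fills every column and with 'R' the length-n array oc fills every row, which is the reading consistent with the required sizes of oc.

```python
def build_c_offset(offsetc, oc, m, n):
    """Expand oc into an m-by-n C_offset matrix stored as a list of rows."""
    mode = offsetc.upper()
    if mode == "F":
        # Fixed offset: the single element of oc is used everywhere.
        return [[oc[0]] * n for _ in range(m)]
    if mode == "C":
        # Column offset: oc has m entries and every column equals oc,
        # so row i is filled with oc[i].
        return [[oc[i]] * n for i in range(m)]
    if mode == "R":
        # Row offset: oc has n entries and every row equals oc.
        return [list(oc[:n]) for _ in range(m)]
    raise ValueError("offsetc must be 'F', 'C', or 'R'")
```

With m = 2 and n = 3, `build_c_offset('C', [1, 2], 2, 3)` gives [[1, 1, 1], [2, 2, 2]], while `build_c_offset('R', [1, 2, 3], 2, 3)` gives [[1, 2, 3], [1, 2, 3]].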
See the following examples in the MKL installation directory to understand the use of these routines:
gemm_s8u8s32_compute: examples\blas\source\gemm_s8u8s32_computex.f
gemm_s16s16s32_compute: examples\blas\source\gemm_s16s16s32_computex.f
You can expand the matrix-matrix product in this manner:
(op(A) + A_offset)*(op(B) + B_offset) = op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset
These four multiplication terms are computed separately and then summed from left to right. The matrix-matrix product and the C matrix are then scaled by the floating-point values alpha and beta, respectively, using double-precision arithmetic. Before the results are stored in the output array c, the floating-point values are rounded to the nearest integers.
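The order of operations described above can be traced with a small plain-Python model. This is a sketch under stated assumptions rather than the library's implementation: the helper names (`matmul`, `add`, `expanded_product`, `scale_and_round`) are hypothetical, matrices are lists of rows, C_offset is left out for brevity, and Python's `round` stands in for the library's rounding, so results on exact ties may differ.

```python
def matmul(x, y):
    """Plain integer matrix product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def expanded_product(op_a, oa, op_b, ob):
    """Sum the four terms of (op(A)+A_offset)*(op(B)+B_offset) left to right."""
    m, k, n = len(op_a), len(op_b), len(op_b[0])
    a_off = [[oa] * k for _ in range(m)]   # m-by-k, every element oa
    b_off = [[ob] * n for _ in range(k)]   # k-by-n, every element ob
    terms = [matmul(op_a, op_b), matmul(op_a, b_off),
             matmul(a_off, op_b), matmul(a_off, b_off)]
    total = terms[0]
    for term in terms[1:]:
        total = add(total, term)
    return total


def scale_and_round(prod, alpha, beta, c):
    """Scale the product by alpha and C by beta in floating point, then round."""
    return [[round(alpha * p + beta * q) for p, q in zip(rp, rc)]
            for rp, rc in zip(prod, c)]
```

For A = [[1, 2], [3, 4]], oa = 1, B = [[5, 6], [7, 8]], and ob = -1, the four terms are [[19, 22], [43, 50]], [[-3, -3], [-7, -7]], [[12, 14], [12, 14]], and [[-2, -2], [-2, -2]], which sum to [[26, 31], [46, 55]], the same result as multiplying the offset-shifted matrices directly.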
In the event of overflow or underflow, the results depend on the architecture. The results are either unsaturated (wrapped) or saturated to maximum or minimum representable integer values for the data type of the output matrix.
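The two possible outcomes can be illustrated for the INTEGER*4 output type with a short plain-Python sketch. The function names are hypothetical, and "wrapped" is assumed here to mean two's-complement reduction modulo 2**32. Which behavior actually occurs depends on the architecture, as stated above.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def saturate_int32(value):
    """Clamp an integer to the representable INTEGER*4 range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def wrap_int32(value):
    """Reduce an integer modulo 2**32 into the two's-complement INTEGER*4 range."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN
```

A result of 2**31, one past the largest representable value, saturates to 2**31 - 1 but wraps to -2**31. Values already in range are returned unchanged by both functions.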