Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686
Date 3/31/2023
Public



gemm_*_compute

Computes a matrix-matrix product with general integer matrices (where one or both input matrices are stored in a packed data structure) and adds the result to a scalar-matrix product.

Syntax

call gemm_s8u8s32_compute (transa, transb, offsetc, m, n, k, alpha, a, lda, oa, b, ldb, ob, beta, c, ldc, oc)

call gemm_s16s16s32_compute (transa, transb, offsetc, m, n, k, alpha, a, lda, oa, b, ldb, ob, beta, c, ldc, oc)

Include Files
  • mkl.fi
Description

The gemm_*_compute routine is one of a set of related routines that enable the use of an internal packed storage format. After calling gemm_*_pack, call gemm_*_compute to compute

C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset,

where:

  • op(X) is either op(X) = X or op(X) = X^T (the transpose of X)
  • alpha and beta are scalars
  • A, B, and C are matrices:
  • op(A) is an m-by-k matrix,
  • op(B) is a k-by-n matrix,
  • C is an m-by-n matrix.
  • A_offset is an m-by-k matrix with every element equal to the value oa.
  • B_offset is a k-by-n matrix with every element equal to the value ob.
  • C_offset is an m-by-n matrix defined by the oc array as described in the description of the offsetc parameter.
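To make the roles of op, the offsets, alpha, and beta concrete, the formula can be restated element by element as a small pure-Python model. This is a sketch under stated assumptions, not the library API: the names op and gemm_compute_reference are hypothetical, matrices are nested Python lists (one list per row) rather than column-major Fortran arrays, only the fixed C offset (offsetc = 'F') is modeled, rounding ties are broken upward (the exact tie rule is an assumption), and integer overflow is not modeled.

```python
import math


def op(x, trans):
    """Return x for 'N'/'n', or its transpose for 'T'/'t' (x is a list of rows)."""
    if trans in ("N", "n"):
        return [list(row) for row in x]
    if trans in ("T", "t"):
        return [list(col) for col in zip(*x)]
    raise ValueError("this sketch only models 'N'/'n' and 'T'/'t'")


def gemm_compute_reference(transa, transb, alpha, a, oa, b, ob, beta, c, oc):
    """Hypothetical model of
    C := alpha*(op(A)+A_offset)*(op(B)+B_offset) + beta*C + C_offset
    for offsetc = 'F', where oc is a single integer added to every element."""
    opa, opb = op(a, transa), op(b, transb)
    m, k, n = len(opa), len(opb), len(opb[0])
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            # Exact integer product of the offset-adjusted matrices.
            acc = sum((opa[i][p] + oa) * (opb[p][j] + ob) for p in range(k))
            # Scale in double precision (Python float); c is not read when
            # beta is zero, since c need not be set on entry in that case.
            scaled = alpha * acc + (beta * c[i][j] if beta != 0 else 0.0)
            # Round to the nearest integer; ties go up (an assumption).
            row.append(math.floor(scaled + 0.5) + oc)
        result.append(row)
    return result
```

For example, with A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], zero offsets, alpha = 1.0, and beta = 0.0, the model returns the plain product [[19, 22], [43, 50]].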

NOTE:

For best performance, use the same number of threads for packing and for computing.

If you are packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.

Input Parameters

transa

CHARACTER*1. Specifies the form of op(A) used in the matrix multiplication:

If transa = 'N' or 'n', op(A) = A.

If transa = 'T' or 't', op(A) = A^T.

If transa = 'P' or 'p', the matrix in array a has been packed (using gemm_*_pack) into a format internal to Intel® oneAPI Math Kernel Library, and lda is ignored.

transb

CHARACTER*1. Specifies the form of op(B) used in the matrix multiplication:

If transb = 'N' or 'n', op(B) = B.

If transb = 'T' or 't', op(B) = B^T.

If transb = 'P' or 'p', the matrix in array b has been packed (using gemm_*_pack) into a format internal to Intel® oneAPI Math Kernel Library, and ldb is ignored.

offsetc

CHARACTER*1. Specifies the form of C_offset used in the matrix multiplication.

If offsetc = 'F' or 'f': oc has a single element, and every element of C_offset is equal to this element.

If offsetc = 'C' or 'c': oc has a size of m, and every column of C_offset is equal to oc.

If offsetc = 'R' or 'r': oc has a size of n, and every row of C_offset is equal to oc.
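The three offsetc modes can be illustrated by expanding oc into the full m-by-n matrix C_offset. The helper build_c_offset below is a hypothetical Python sketch, not part of the library; it uses 0-based indices and nested lists (one list per row), so C_offset(i, j) = oc(i) for the column mode and C_offset(i, j) = oc(j) for the row mode.

```python
def build_c_offset(offsetc, oc, m, n):
    """Return C_offset as an m-by-n list of rows built from the oc array."""
    if offsetc in ("F", "f"):
        # Fixed offset: the single value oc[0] is used for every element.
        return [[oc[0]] * n for _ in range(m)]
    if offsetc in ("C", "c"):
        # Column offset: oc has m entries and every column of C_offset equals oc,
        # so row i is filled with oc[i].
        return [[oc[i]] * n for i in range(m)]
    if offsetc in ("R", "r"):
        # Row offset: oc has n entries and every row of C_offset equals oc.
        return [list(oc[:n]) for _ in range(m)]
    raise ValueError("offsetc must be 'F', 'C', or 'R' (either case)")
```

For m = 2 and n = 3, build_c_offset("C", [1, 2], 2, 3) gives [[1, 1, 1], [2, 2, 2]], while build_c_offset("R", [1, 2, 3], 2, 3) gives [[1, 2, 3], [1, 2, 3]].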

m

INTEGER. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

INTEGER. Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.

k

INTEGER. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha

REAL. Specifies the scalar alpha.

a

INTEGER*1 for gemm_s8u8s32_compute

INTEGER*2 for gemm_s16s16s32_compute

If transa = 'N' or 'n': array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.

If transa = 'T' or 't': array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.

If transa = 'P' or 'p': array of the size returned by gemm_*_pack_get_size, initialized using gemm_*_pack.

lda

INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program.

If transa = 'N' or 'n', lda must be at least max (1, m).

If transa = 'T' or 't', lda must be at least max (1, k).
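Because Fortran arrays are stored in column-major order, lda is the distance between the starts of consecutive columns of the stored matrix: element (i, j) of the stored matrix sits at a(i + (j-1)*lda) with 1-based Fortran indices, or at a[i + j*lda] with 0-based indices. The hypothetical Python helper op_a_element below sketches which flat element of a supplies op(A)(i, p) in the two unpacked cases; the same reasoning applies to b and ldb with the roles of k and n. Packed storage ('P' or 'p') is internal to the library and cannot be indexed this way.

```python
def op_a_element(a, lda, transa, i, p):
    """0-based element op(A)[i][p] of the m-by-k matrix op(A), read from the flat
    column-major array a with leading dimension lda."""
    if transa in ("N", "n"):
        # a holds the m-by-k matrix A column by column: A[i][p] is at i + p*lda.
        return a[i + p * lda]
    if transa in ("T", "t"):
        # a holds the k-by-m matrix A, and op(A)[i][p] = A[p][i], at p + i*lda.
        return a[p + i * lda]
    raise ValueError("packed storage ('P'/'p') is opaque and cannot be indexed")
```

With m = 2, k = 3, and lda = 3 (one padding element per column), the matrix [[1, 2, 3], [4, 5, 6]] is stored for transa = 'N' as [1, 4, 0, 2, 5, 0, 3, 6, 0], where the zeros are padding that the routine never reads.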

oa

INTEGER*1 for gemm_s8u8s32_compute

INTEGER*2 for gemm_s16s16s32_compute

Specifies the scalar offset value for the matrix A.

b

INTEGER*1 for gemm_s8u8s32_compute

INTEGER*2 for gemm_s16s16s32_compute

If transb = 'N' or 'n': array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.

If transb = 'T' or 't': array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.

If transb = 'P' or 'p': array of the size returned by gemm_*_pack_get_size, initialized using gemm_*_pack.

ldb

INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program.

If transb = 'N' or 'n', ldb must be at least max (1, k).

If transb = 'T' or 't', ldb must be at least max (1, n).

ob

INTEGER*1 for gemm_s8u8s32_compute

INTEGER*2 for gemm_s16s16s32_compute

Specifies the scalar offset value for the matrix B.

beta

REAL

Specifies the scalar beta.

c

INTEGER*4

Array, size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc

INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program.

The value of ldc must be at least max (1, m).

oc

INTEGER*4

Array, size len. Specifies the offset values for the matrix C; how they form C_offset is determined by offsetc.

If offsetc = 'F' or 'f', len must be at least 1.

If offsetc = 'C' or 'c', len must be at least max(1, m).

If offsetc = 'R' or 'r', len must be at least max(1, n).
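The dimension, leading-dimension, and oc length requirements above can be collected into a single check. check_args is a hypothetical Python helper that only restates these documented constraints; it is not part of the library and says nothing about how the library itself reports invalid arguments.

```python
def check_args(transa, transb, offsetc, m, n, k, lda, ldb, ldc, len_oc):
    """Return a list of violated requirements (empty when all of them hold)."""
    errors = []
    if min(m, n, k) < 0:
        errors.append("m, n and k must be at least zero")
    # lda and ldb are ignored when the corresponding matrix is packed ('P'/'p').
    if transa in ("N", "n") and lda < max(1, m):
        errors.append("lda < max(1, m)")
    if transa in ("T", "t") and lda < max(1, k):
        errors.append("lda < max(1, k)")
    if transb in ("N", "n") and ldb < max(1, k):
        errors.append("ldb < max(1, k)")
    if transb in ("T", "t") and ldb < max(1, n):
        errors.append("ldb < max(1, n)")
    if ldc < max(1, m):
        errors.append("ldc < max(1, m)")
    # Minimum length of oc for each offsetc mode.
    need = {"F": 1, "C": max(1, m), "R": max(1, n)}[offsetc.upper()]
    if len_oc < need:
        errors.append(f"len(oc) < {need}")
    return errors
```

For instance, check_args("N", "T", "R", 4, 3, 2, 3, 2, 4, 2) reports that lda is too small for m = 4, ldb is too small for n = 3, and oc has fewer than 3 elements.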

Output Parameters

c

INTEGER*4

Overwritten by the matrix alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset.

Example

See the following examples in the MKL installation directory to understand the use of these routines:

gemm_s8u8s32_compute: examples\blas\source\gemm_s8u8s32_computex.f

gemm_s16s16s32_compute: examples\blas\source\gemm_s16s16s32_computex.f

Application Notes

You can expand the matrix-matrix product in this manner:

(op(A) + A_offset)*(op(B) + B_offset) = op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset
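Written element by element, with o_a = oa and o_b = ob, the four terms of this expansion are:

```latex
\bigl[(\mathrm{op}(A) + A_{\mathrm{offset}})(\mathrm{op}(B) + B_{\mathrm{offset}})\bigr]_{ij}
  = \sum_{p=1}^{k} \mathrm{op}(A)_{ip}\,\mathrm{op}(B)_{pj}
  + o_b \sum_{p=1}^{k} \mathrm{op}(A)_{ip}
  + o_a \sum_{p=1}^{k} \mathrm{op}(B)_{pj}
  + k\, o_a o_b
```

The second and third terms are row sums of op(A) and column sums of op(B) scaled by the opposite offset, and the last term is a constant.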

These four terms are computed separately and then summed from left to right. The resulting matrix-matrix product is scaled by alpha and the matrix C by beta, both as floating-point values in double-precision arithmetic. Before the results are stored in the output array c, the floating-point values are rounded to the nearest integers.
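As a sanity check of the expansion, the following Python sketch computes the offset-adjusted product from the four terms, summed left to right, and compares it with the direct product. four_term_product is a hypothetical name, and op(A) and op(B) are passed as nested lists (one list per row). Python integers never overflow, so this illustrates only the algebra and the summation order, not the library's overflow behavior.

```python
def four_term_product(opa, opb, oa, ob):
    """Compute (op(A)+A_offset)*(op(B)+B_offset) from the four-term expansion.

    opa is m-by-k and opb is k-by-n, both given as lists of rows."""
    m, k, n = len(opa), len(opb), len(opb[0])
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            t1 = sum(opa[i][p] * opb[p][j] for p in range(k))  # op(A)*op(B)
            t2 = ob * sum(opa[i][p] for p in range(k))          # op(A)*B_offset
            t3 = oa * sum(opb[p][j] for p in range(k))          # A_offset*op(B)
            t4 = k * oa * ob                                    # A_offset*B_offset
            row.append(((t1 + t2) + t3) + t4)                   # summed left to right
        result.append(row)
    return result
```

For op(A) = [[1, 2], [3, 4]], op(B) = [[5, 6], [7, 8]], oa = 1, and ob = -1, both routes give [[26, 31], [46, 55]].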

In the event of overflow or underflow, the results depend on the architecture. The results are either unsaturated (wrapped) or saturated to maximum or minimum representable integer values for the data type of the output matrix.
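The two possible outcomes for an INTEGER*4 result that does not fit can be modeled as follows. wrap_int32 and saturate_int32 are hypothetical Python helpers; which behavior you actually observe depends on the architecture, and this sketch does not predict it.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_int32(x):
    """Unsaturated result: two's-complement wraparound into the INTEGER*4 range."""
    return (x - INT32_MIN) % 2**32 + INT32_MIN


def saturate_int32(x):
    """Saturated result: clamp to the minimum or maximum INTEGER*4 value."""
    return max(INT32_MIN, min(INT32_MAX, x))
```

One past the maximum wraps to the minimum (-2147483648) under wrap_int32 but stays at 2147483647 under saturate_int32; values already in range are unchanged by either helper.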

See Also