Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 3/31/2023
Public

cblas_?gemm_compute

Computes a matrix-matrix product with general matrices where one or both input matrices are stored in a packed data structure and adds the result to a scalar-matrix product.

Syntax

void cblas_hgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_F16 *a, const MKL_INT lda, const MKL_F16 *b, const MKL_INT ldb, const MKL_F16 beta, MKL_F16 *c, const MKL_INT ldc);

void cblas_sgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float *a, const MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c, const MKL_INT ldc);

void cblas_dgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double *a, const MKL_INT lda, const double *b, const MKL_INT ldb, const double beta, double *c, const MKL_INT ldc);

Include Files
  • mkl.h
Description

The cblas_?gemm_compute routine is one of a set of related routines that enable the use of internal packed storage. After calling cblas_?gemm_pack, call cblas_?gemm_compute to compute

C := op(A)*op(B) + beta*C,

where:

  • op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,
  • beta is a scalar,
  • A, B, and C are matrices:
      • op(A) is an m-by-k matrix,
      • op(B) is a k-by-n matrix,
      • C is an m-by-n matrix.
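
The operation defined above can be illustrated with a plain C reference loop. This is a minimal sketch for checking results on small inputs, not the library's implementation: it assumes column-major storage and unpacked operands, and the names ref_op, ref_at, and ref_gemm are hypothetical. For the real data types handled by the h, s, and d variants, the conjugate transpose coincides with the transpose, so only two cases are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the transa/transb values; not the MKL enums. */
enum ref_op { REF_NOTRANS, REF_TRANS };

/* Element (i, j) of op(X) for a column-major array x with leading dimension ld. */
static double ref_at(const double *x, size_t ld, enum ref_op op, size_t i, size_t j)
{
    return op == REF_NOTRANS ? x[i + j * ld] : x[j + i * ld];
}

/* Column-major reference for C := op(A)*op(B) + beta*C. */
void ref_gemm(enum ref_op opa, enum ref_op opb, size_t m, size_t n, size_t k,
              const double *a, size_t lda, const double *b, size_t ldb,
              double beta, double *c, size_t ldc)
{
    assert(ldc >= (m > 0 ? m : 1));
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += ref_at(a, lda, opa, i, p) * ref_at(b, ldb, opb, p, j);
            /* With beta == 0 the old contents of C are never read. */
            c[i + j * ldc] = (beta == 0.0) ? acc : acc + beta * c[i + j * ldc];
        }
    }
}
```

Comparing the output of cblas_dgemm_compute against such a loop on a few small matrices is one way to confirm that the Layout, transa, and transb arguments were chosen as intended.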

NOTE:

You must use the same value of the Layout parameter for the entire sequence of related cblas_?gemm_pack and cblas_?gemm_compute calls.

For best performance, use the same number of threads for packing and for computing.

If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
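
The pack-then-compute sequence described above might look like the following for single precision with only A packed. This is a sketch under stated assumptions, not a definitive usage pattern: it assumes row-major, non-transposed inputs with m, n, and k all positive, and it relies on the companion oneMKL routines cblas_sgemm_pack_get_size, cblas_sgemm_pack, mkl_malloc, and mkl_free, which are documented outside this section. The function name packed_sgemm_sketch and the USE_MKL macro are hypothetical; when USE_MKL is not defined, the function falls back to naive loops so that the sketch still compiles without the library.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_MKL               /* hypothetical macro: define when building against oneMKL */
#include <mkl.h>
#endif

/* Computes C := A*B + beta*C for row-major, non-transposed float matrices.
 * A is m-by-k, B is k-by-n, C is m-by-n, all stored without padding.
 * Returns 0 on success, -1 if the packed buffer cannot be allocated. */
int packed_sgemm_sketch(int m, int n, int k, const float *a, const float *b,
                        float beta, float *c)
{
    assert(m > 0 && n > 0 && k > 0);
#ifdef USE_MKL
    /* Ask the library how many bytes the packed copy of A needs. */
    size_t bytes = cblas_sgemm_pack_get_size(CblasAMatrix, m, n, k);
    float *a_packed = (float *)mkl_malloc(bytes, 64);
    if (a_packed == NULL)
        return -1;

    /* Pack A once; the scaling factor alpha (1.0f here) is given at pack time. */
    cblas_sgemm_pack(CblasRowMajor, CblasAMatrix, CblasNoTrans,
                     m, n, k, 1.0f, a, k, a_packed);

    /* Same Layout as the pack call; lda is ignored for the packed operand. */
    cblas_sgemm_compute(CblasRowMajor, CblasPacked, CblasNoTrans,
                        m, n, k, a_packed, k, b, n, beta, c, n);

    mkl_free(a_packed);
#else
    /* Fallback without oneMKL: naive loops that produce the same result. */
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = (beta == 0.0f) ? acc : acc + beta * c[i * n + j];
        }
    }
#endif
    return 0;
}
```

Packing pays off when the same A is reused across several compute calls with different B matrices; in that case the pack step would be hoisted out of this function and the packed buffer kept alive between calls.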

Input Parameters
Layout

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa

Specifies the form of op(A) used in the matrix multiplication, one of the CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:

If transa = CblasNoTrans, op(A) = A.

If transa = CblasTrans, op(A) = A^T.

If transa = CblasConjTrans, op(A) = A^H.

If transa = CblasPacked, the matrix in array a is packed and lda is ignored.

transb

Specifies the form of op(B) used in the matrix multiplication, one of the CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:

If transb = CblasNoTrans, op(B) = B.

If transb = CblasTrans, op(B) = B^T.

If transb = CblasConjTrans, op(B) = B^H.

If transb = CblasPacked, the matrix in array b is packed and ldb is ignored.

m

Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.

k

Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

a

Array:

Layout = CblasColMajor
  • transa = CblasNoTrans: Size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
  • transa = CblasTrans or transa = CblasConjTrans: Size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
  • transa = CblasPacked: Stored in internal packed format.

Layout = CblasRowMajor
  • transa = CblasNoTrans: Size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
  • transa = CblasTrans or transa = CblasConjTrans: Size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
  • transa = CblasPacked: Stored in internal packed format.

lda

Specifies the leading dimension of a as declared in the calling (sub)program.

Layout = CblasColMajor
  • transa = CblasNoTrans: lda must be at least max(1, m).
  • transa = CblasTrans or transa = CblasConjTrans: lda must be at least max(1, k).
  • transa = CblasPacked: lda is ignored.

Layout = CblasRowMajor
  • transa = CblasNoTrans: lda must be at least max(1, k).
  • transa = CblasTrans or transa = CblasConjTrans: lda must be at least max(1, m).
  • transa = CblasPacked: lda is ignored.
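
The size and leading-dimension rules for a can be condensed into two small helpers, which may be handy when allocating an unpacked a. This is only a sketch: the enums sk_layout and sk_trans and the functions min_lda and a_elements are hypothetical stand-ins, not part of oneMKL, and SK_TRANS covers both the transpose and conjugate-transpose cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Layout and transa values; not the MKL enums. */
enum sk_layout { SK_COL_MAJOR, SK_ROW_MAJOR };
enum sk_trans  { SK_NOTRANS, SK_TRANS, SK_PACKED };

static size_t sk_max1(size_t x) { return x > 1 ? x : 1; }

/* Smallest valid lda for an unpacked a; 0 means lda is ignored (packed). */
size_t min_lda(enum sk_layout layout, enum sk_trans transa, size_t m, size_t k)
{
    if (transa == SK_PACKED)
        return 0;
    /* m is the leading dimension for column-major/no-transpose
     * and for row-major/transpose; otherwise it is k. */
    int m_leading = (layout == SK_COL_MAJOR) == (transa == SK_NOTRANS);
    return sk_max1(m_leading ? m : k);
}

/* Number of elements in an unpacked a, following the Size entries for a. */
size_t a_elements(enum sk_layout layout, enum sk_trans transa,
                  size_t m, size_t k, size_t lda)
{
    assert(transa != SK_PACKED && lda >= min_lda(layout, transa, m, k));
    int m_leading = (layout == SK_COL_MAJOR) == (transa == SK_NOTRANS);
    return lda * (m_leading ? k : m);
}
```

For example, a column-major, non-transposed A with m = 3, k = 5, and lda = 4 occupies a_elements(SK_COL_MAJOR, SK_NOTRANS, 3, 5, 4) = 20 elements.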

b

Array:

Layout = CblasColMajor
  • transb = CblasNoTrans: Size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
  • transb = CblasTrans or transb = CblasConjTrans: Size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
  • transb = CblasPacked: Stored in internal packed format.

Layout = CblasRowMajor
  • transb = CblasNoTrans: Size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
  • transb = CblasTrans or transb = CblasConjTrans: Size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
  • transb = CblasPacked: Stored in internal packed format.

ldb

Specifies the leading dimension of b as declared in the calling (sub)program.

Layout = CblasColMajor
  • transb = CblasNoTrans: ldb must be at least max(1, k).
  • transb = CblasTrans or transb = CblasConjTrans: ldb must be at least max(1, n).
  • transb = CblasPacked: ldb is ignored.

Layout = CblasRowMajor
  • transb = CblasNoTrans: ldb must be at least max(1, n).
  • transb = CblasTrans or transb = CblasConjTrans: ldb must be at least max(1, k).
  • transb = CblasPacked: ldb is ignored.
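
The constraints on m, n, k, lda, and ldb can be checked before a call with a routine along these lines. This is a sketch of a caller-side check and makes no claim about how the library validates its arguments: the enums ck_layout and ck_trans and the function check_dims are hypothetical, and CK_TRANS stands for both the transpose and conjugate-transpose cases.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the Layout and transa/transb values. */
enum ck_layout { CK_COL_MAJOR, CK_ROW_MAJOR };
enum ck_trans  { CK_NOTRANS, CK_TRANS, CK_PACKED };

static long ck_max1(long x) { return x > 1 ? x : 1; }

/* Returns 0 if the dimension arguments satisfy the documented constraints,
 * otherwise the 1-based position of the first offending argument in the
 * cblas_?gemm_compute parameter list (m = 4, n = 5, k = 6, lda = 8, ldb = 10). */
int check_dims(enum ck_layout layout, enum ck_trans transa, enum ck_trans transb,
               long m, long n, long k, long lda, long ldb)
{
    int col = (layout == CK_COL_MAJOR);
    if (m < 0) return 4;
    if (n < 0) return 5;
    if (k < 0) return 6;
    if (transa != CK_PACKED) {          /* lda is ignored for a packed a */
        long need = ((transa == CK_NOTRANS) == col) ? m : k;
        if (lda < ck_max1(need)) return 8;
    }
    if (transb != CK_PACKED) {          /* ldb is ignored for a packed b */
        long need = ((transb == CK_NOTRANS) == col) ? k : n;
        if (ldb < ck_max1(need)) return 10;
    }
    return 0;
}
```

Returning the argument position mirrors the way many BLAS-style interfaces report an invalid parameter, which makes the result easy to map back to the call site.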

beta

Specifies the scalar beta. When beta is equal to zero, c need not be set on input.

c

Array:

Layout = CblasColMajor

Size ldc*n.

Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

Layout = CblasRowMajor

Size ldc*m.

Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc

Specifies the leading dimension of c as declared in the calling (sub)program.

Layout = CblasColMajor

ldc must be at least max(1, m).

Layout = CblasRowMajor

ldc must be at least max(1, n).
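
The ldc bounds follow from how element C(i, j) is addressed under each layout, which the following sketch makes explicit. The enum ix_layout and the function c_offset are hypothetical names used only for illustration; indices are 0-based.

```c
#include <stddef.h>

/* Hypothetical stand-in for the Layout values; not the MKL enum. */
enum ix_layout { IX_COL_MAJOR, IX_ROW_MAJOR };

/* Offset of C(i, j) in the array c with leading dimension ldc.
 * Column-major: consecutive rows are adjacent, so ldc must cover m rows.
 * Row-major: consecutive columns are adjacent, so ldc must cover n columns. */
size_t c_offset(enum ix_layout layout, size_t i, size_t j, size_t ldc)
{
    return layout == IX_COL_MAJOR ? i + j * ldc : i * ldc + j;
}
```

A row-major m-by-n matrix has the same memory image as a column-major n-by-m matrix, which is why the row-major entries above describe the leading n-by-m part of c.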

Output Parameters

c

Overwritten by the m-by-n matrix op(A)*op(B) + beta*C.

See Also