Developer Reference for Intel® oneAPI Math Kernel Library for C
cblas_gemm_bf16bf16f32_compute
Computes a matrix-matrix product with general bfloat16 matrices (where one or both input matrices are stored in a packed data structure) and adds the result to a scalar-matrix product.
Syntax
C:
void cblas_gemm_bf16bf16f32_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const MKL_BF16 *a, const MKL_INT lda, const MKL_BF16 *b, const MKL_INT ldb, const float beta, float *c, const MKL_INT ldc);
Include Files
- mkl.h
 
Description
The cblas_gemm_bf16bf16f32_compute routine is one of a set of related routines that enable use of an internal packed storage format. After calling cblas_gemm_bf16bf16f32_pack, call cblas_gemm_bf16bf16f32_compute to compute
C := alpha*op(A)*op(B) + beta*C,
where:
- op(X) is either op(X) = X or op(X) = X^T,
- alpha and beta are scalars,
- A, B, and C are matrices:
- op(A) is an m-by-k matrix,
- op(B) is a k-by-n matrix,
- C is an m-by-n matrix.
 
You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_bf16bf16f32_pack and cblas_gemm_bf16bf16f32_compute calls.
For best performance, use the same number of threads for packing as for computing.
If packing both the A and B matrices, you must use the same number of threads for packing A as for packing B.
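The pack/compute sequence described above can be sketched as follows. This is illustrative only (it requires an oneMKL installation and link line, and the dimensions and data are placeholders); it assumes A is packed once and then reused, which is the main benefit of the packed API:

```c
/* Sketch: pack A once, then reuse the packed buffer in compute calls.
   Requires oneMKL (mkl.h) and linking against the MKL libraries. */
#include <mkl.h>

int main(void) {
    MKL_INT m = 2, n = 2, k = 2;
    /* Row-major A (m-by-k) and B (k-by-n) in bfloat16; C accumulates in float. */
    MKL_BF16 a[4], b[4];
    float c[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    /* ... fill a and b with bfloat16 data ... */

    /* Query the size of the internal packed format for A, allocate an
       aligned buffer, and pack A. */
    size_t ap_size = cblas_gemm_bf16bf16f32_pack_get_size(CblasAMatrix, m, n, k);
    MKL_BF16 *ap = (MKL_BF16 *)mkl_malloc(ap_size, 64);
    cblas_gemm_bf16bf16f32_pack(CblasRowMajor, CblasAMatrix, CblasNoTrans,
                                m, n, k, a, k, ap);

    /* Compute C := 1.0*A*B + 0.0*C. transa is CblasPacked, so the lda
       argument is ignored; B is passed unpacked with transb = CblasNoTrans. */
    cblas_gemm_bf16bf16f32_compute(CblasRowMajor, (MKL_INT)CblasPacked,
                                   (MKL_INT)CblasNoTrans, m, n, k,
                                   1.0f, ap, k, b, n, 0.0f, c, n);

    mkl_free(ap);
    return 0;
}
```

Note that the Layout argument (CblasRowMajor here) must match the one used in the preceding pack call, per the requirement above.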
Input Parameters
Layout

      CBLAS_LAYOUT. Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa

      MKL_INT. Specifies the form of op(A) used in the packing:
      If transa = CblasNoTrans, op(A) = A.
      If transa = CblasTrans, op(A) = A^T.
      If transa = CblasPacked, the matrix in array a is packed into a format internal to Intel® oneAPI Math Kernel Library (oneMKL) and lda is ignored.

transb

      MKL_INT. Specifies the form of op(B) used in the packing:
      If transb = CblasNoTrans, op(B) = B.
      If transb = CblasTrans, op(B) = B^T.
      If transb = CblasPacked, the matrix in array b is packed into a format internal to Intel® oneAPI Math Kernel Library (oneMKL) and ldb is ignored.

m

      MKL_INT. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

      MKL_INT. Specifies the number of columns of the matrix op(B) and of the matrix C. The value of n must be at least zero.

k

      MKL_INT. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha

      float. Specifies the scalar alpha.

a

      MKL_BF16*. Array containing the matrix A. If transa = CblasPacked, a is the buffer that was allocated and packed previously by calling cblas_gemm_bf16bf16f32_pack.

lda

      MKL_INT. Specifies the leading dimension of a as declared in the calling (sub)program.

b

      MKL_BF16*. Array containing the matrix B. If transb = CblasPacked, b is the buffer that was allocated and packed previously by calling cblas_gemm_bf16bf16f32_pack.

ldb

      MKL_INT. Specifies the leading dimension of b as declared in the calling (sub)program.

beta

      float. Specifies the scalar beta.

c

      float*. Array containing the matrix C.

ldc

      MKL_INT. Specifies the leading dimension of c as declared in the calling (sub)program.
Output Parameters
c

      float*. Overwritten by the matrix alpha*op(A)*op(B) + beta*C.
Example
See the following examples in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory to understand the use of these routines:
cblas_gemm_bf16bf16f32_compute: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
Application Notes
On architectures without native bfloat16 hardware instructions, matrices A and B are upconverted to single precision, and SGEMM is called to perform the matrix multiplication.
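The upconversion mentioned above is cheap because bfloat16 is, by construction, the upper 16 bits of an IEEE-754 binary32 value. A minimal self-contained sketch of the two conversions (not oneMKL code; the real MKL_BF16 type is opaque, and hardware downconversion typically rounds to nearest-even rather than truncating as this sketch does):

```c
#include <stdint.h>
#include <string.h>

/* bfloat16 stored as a raw 16-bit pattern: 1 sign bit, 8 exponent bits,
   7 mantissa bits -- exactly the top half of an IEEE-754 float32. */
typedef uint16_t bf16_t;

/* Upconvert: place the 16 bf16 bits in the high half of a 32-bit word.
   This is lossless, which is why the SGEMM fallback gives the same
   op(A) and op(B) values as a native bfloat16 kernel would consume. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Downconvert by truncation: keep the top 16 bits of the float32
   pattern (illustration only; production code usually rounds). */
static bf16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16_t)(bits >> 16);
}
```

Any float whose low 16 mantissa bits are zero (for example 1.0f, 2.0f, 3.140625f) survives the round trip exactly; other values lose precision in the downconversion.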