Developer Reference for Intel® oneAPI Math Kernel Library for C
cblas_gemm_bf16bf16f32_compute
Computes a matrix-matrix product with general bfloat16 matrices (where one or both input matrices are stored in a packed data structure) and adds the result to a scalar-matrix product.
Syntax
C:
void cblas_gemm_bf16bf16f32_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const MKL_BF16 *a, const MKL_INT lda, const MKL_BF16 *b, const MKL_INT ldb, const float beta, float *c, const MKL_INT ldc);
Include Files
- mkl.h
 
Description
The cblas_gemm_bf16bf16f32_compute routine is one of a set of related routines that enable use of an internal packed storage format. After calling cblas_gemm_bf16bf16f32_pack, call cblas_gemm_bf16bf16f32_compute to compute
C := alpha*op(A)*op(B) + beta*C,
where:
- op(X) is either op(X) = X or op(X) = X^T,
- alpha and beta are scalars,
- A, B, and C are matrices:
- op(A) is an m-by-k matrix,
- op(B) is a k-by-n matrix,
- C is an m-by-n matrix.
 
You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_bf16bf16f32_pack and cblas_gemm_bf16bf16f32_compute calls.
For best performance, use the same number of threads for packing as for computing.
If packing both the A and B matrices, you must use the same number of threads for packing A as for packing B.
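The pack/compute sequence described above can be sketched as follows. This is illustrative only (it requires an oneMKL installation and link line, and the dimensions and data are placeholders); it assumes A is packed once and then reused, which is the main benefit of the packed API:

```c
/* Sketch: pack A once, then reuse the packed buffer in compute calls.
   Requires oneMKL (mkl.h) and linking against the MKL libraries. */
#include <mkl.h>

int main(void) {
    MKL_INT m = 2, n = 2, k = 2;
    /* Row-major A (m-by-k) and B (k-by-n) in bfloat16; C accumulates in float. */
    MKL_BF16 a[4], b[4];
    float c[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    /* ... fill a and b with bfloat16 data ... */

    /* Query the size of the internal packed format for A, allocate an
       aligned buffer, and pack A. */
    size_t ap_size = cblas_gemm_bf16bf16f32_pack_get_size(CblasAMatrix, m, n, k);
    MKL_BF16 *ap = (MKL_BF16 *)mkl_malloc(ap_size, 64);
    cblas_gemm_bf16bf16f32_pack(CblasRowMajor, CblasAMatrix, CblasNoTrans,
                                m, n, k, a, k, ap);

    /* Compute C := 1.0*A*B + 0.0*C. transa is CblasPacked, so the lda
       argument is ignored; B is passed unpacked with transb = CblasNoTrans. */
    cblas_gemm_bf16bf16f32_compute(CblasRowMajor, (MKL_INT)CblasPacked,
                                   (MKL_INT)CblasNoTrans, m, n, k,
                                   1.0f, ap, k, b, n, 0.0f, c, n);

    mkl_free(ap);
    return 0;
}
```

Note that the Layout argument (CblasRowMajor here) must match the one used in the preceding pack call, per the requirement above.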
Input Parameters
Layout

      CBLAS_LAYOUT. Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa

      MKL_INT. Specifies the form of op(A) used in the packing:
      If transa = CblasNoTrans, op(A) = A.
      If transa = CblasTrans, op(A) = A^T.
      If transa = CblasPacked, the matrix in array a is packed into a format internal to Intel® oneAPI Math Kernel Library (oneMKL) and lda is ignored.

transb

      MKL_INT. Specifies the form of op(B) used in the packing:
      If transb = CblasNoTrans, op(B) = B.
      If transb = CblasTrans, op(B) = B^T.
      If transb = CblasPacked, the matrix in array b is packed into a format internal to Intel® oneAPI Math Kernel Library (oneMKL) and ldb is ignored.

m

      MKL_INT. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

      MKL_INT. Specifies the number of columns of the matrix op(B) and of the matrix C. The value of n must be at least zero.

k

      MKL_INT. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha

      float. Specifies the scalar alpha.

a

      MKL_BF16*. Array containing the matrix A. If transa = CblasPacked, a is the buffer that was allocated and packed previously by calling cblas_gemm_bf16bf16f32_pack.

lda

      MKL_INT. Specifies the leading dimension of a as declared in the calling (sub)program.

b

      MKL_BF16*. Array containing the matrix B. If transb = CblasPacked, b is the buffer that was allocated and packed previously by calling cblas_gemm_bf16bf16f32_pack.

ldb

      MKL_INT. Specifies the leading dimension of b as declared in the calling (sub)program.

beta

      float. Specifies the scalar beta.

c

      float*. Array containing the matrix C.

ldc

      MKL_INT. Specifies the leading dimension of c as declared in the calling (sub)program.
Output Parameters
c

      float*. Overwritten by the matrix alpha*op(A)*op(B) + beta*C.
Example
See the following examples in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory to understand the use of these routines:
cblas_gemm_bf16bf16f32_compute: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
Application Notes
On architectures without native bfloat16 hardware instructions, matrices A and B are upconverted to single precision, and SGEMM is called to perform the matrix multiplication.
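The upconversion mentioned above is cheap because bfloat16 is, by construction, the upper 16 bits of an IEEE-754 binary32 value. A minimal self-contained sketch of the two conversions (not oneMKL code; the real MKL_BF16 type is opaque, and hardware downconversion typically rounds to nearest-even rather than truncating as this sketch does):

```c
#include <stdint.h>
#include <string.h>

/* bfloat16 stored as a raw 16-bit pattern: 1 sign bit, 8 exponent bits,
   7 mantissa bits -- exactly the top half of an IEEE-754 float32. */
typedef uint16_t bf16_t;

/* Upconvert: place the 16 bf16 bits in the high half of a 32-bit word.
   This is lossless, which is why the SGEMM fallback gives the same
   op(A) and op(B) values as a native bfloat16 kernel would consume. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Downconvert by truncation: keep the top 16 bits of the float32
   pattern (illustration only; production code usually rounds). */
static bf16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16_t)(bits >> 16);
}
```

Any float whose low 16 mantissa bits are zero (for example 1.0f, 2.0f, 3.140625f) survives the round trip exactly; other values lose precision in the downconversion.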