Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 3/31/2023
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

_tile_dpbf16ps

Synopsis

void _tile_dpbf16ps (__tile dst, __tile a, __tile b)
Type Value
Type TileFloating Point
Header file #include <immintrin.h>
Instruction TDPBF16PS tmm, tmm, tmm
CPUID flags AMXBF16

Description

Compute dot-product of BF16 (16-bit) floating-point pairs in tiles "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "dst", and store the 32-bit result back to tile "dst".

Technology

AMX

Category

Application-Targeted

Operation

FOR m := 0 TO dst.rows - 1
	tmp := dst.row[m]
	FOR k := 0 TO (a.colsb / 4) - 1
		FOR n := 0 TO (dst.colsb / 4) - 1
			tmp.fp32[n] += FP32(a.row[m].bf16[2*k+0]) * FP32(b.row[k].bf16[2*n+0])
			tmp.fp32[n] += FP32(a.row[m].bf16[2*k+1]) * FP32(b.row[k].bf16[2*n+1])
		ENDFOR
	ENDFOR
	write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR

zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()