Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 7/13/2023
Public
Document Table of Contents

_tile_dpbuud

Synopsis

void _tile_dpbuud (__tile dst, __tile a, __tile b)
Type Value
Type Tile
Header file #include <immintrin.h>
Instruction TDPBUUD tmm, tmm, tmm
CPUID flags AMXINT8

Description

Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding unsigned 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst".

Technology

AMX

Category

Application-Targeted

Operation

DEFINE DPBD(c, x, y) {
	tmp1 := ZeroExtend32(x.byte[0]) * ZeroExtend32(y.byte[0])
	tmp2 := ZeroExtend32(x.byte[1]) * ZeroExtend32(y.byte[1])
	tmp3 := ZeroExtend32(x.byte[2]) * ZeroExtend32(y.byte[2])
	tmp4 := ZeroExtend32(x.byte[3]) * ZeroExtend32(y.byte[3])
	
	RETURN c + tmp1 + tmp2 + tmp3 + tmp4
}

FOR m := 0 TO dst.rows - 1
	tmp := dst.row[m]
	FOR k := 0 TO (a.colsb / 4) - 1
		FOR n := 0 TO (dst.colsb / 4) - 1
			tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n])
		ENDFOR
	ENDFOR
	write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR

zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()