Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 3/31/2023
Public

Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Additional Instructions

Additional Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Instructions

The additional instructions documented in this section extend the operations available in the Intel® AVX-512 Foundation instructions. A large portion of these instructions falls into two groups: Byte and Word Instructions, and Doubleword and Quadword Instructions. The byte and word (8- and 16-bit) operations, indicated by the AVX512BW and AVX512VBMI CPUID flags, enhance small-integer operations. The doubleword and quadword (32- and 64-bit) operations, indicated by the AVX512DQ and AVX512IFMA52 CPUID flags, enhance integer and floating-point operations.

An additional, orthogonal capability known as Vector Length Extensions allows most AVX-512 instructions to operate on 128 or 256 bits instead of only 512. Vector Length Extensions can currently be applied to most Foundation instructions and Conflict Detection instructions, as well as to the new byte, word, doubleword, and quadword instructions. They are indicated by the AVX512VL CPUID flag. With Vector Length Extensions, most AVX-512 operations can also operate on XMM (128-bit, SSE) and YMM (256-bit, AVX) registers, so the capabilities of EVEX encodings, including mask registers and access to registers 16 through 31, apply to XMM and YMM registers rather than only to ZMM registers.
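As an illustration of how the CPUID flags named above can be read, the following minimal sketch decodes them from the EBX and ECX values returned by CPUID leaf 7, subleaf 0. The struct and function names (avx512_flags, decode_avx512_flags) are hypothetical and not part of any Intel header; the bit positions are those commonly documented for this leaf and should be checked against the current architecture manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the AVX-512 feature flags discussed above. */
typedef struct {
    bool f;       /* AVX512F      (Foundation)                 */
    bool dq;      /* AVX512DQ     (doubleword and quadword)    */
    bool ifma52;  /* AVX512IFMA52 (52-bit integer multiply-add) */
    bool cd;      /* AVX512CD     (Conflict Detection)         */
    bool bw;      /* AVX512BW     (byte and word)              */
    bool vl;      /* AVX512VL     (Vector Length Extensions)   */
    bool vbmi;    /* AVX512VBMI   (byte manipulation)          */
} avx512_flags;

/* Decode the flags from the EBX and ECX values of CPUID leaf 7, subleaf 0.
 * This only interprets register values; it does not execute CPUID itself. */
avx512_flags decode_avx512_flags(uint32_t ebx, uint32_t ecx)
{
    avx512_flags r;
    r.f      = ((ebx >> 16) & 1u) != 0;
    r.dq     = ((ebx >> 17) & 1u) != 0;
    r.ifma52 = ((ebx >> 21) & 1u) != 0;
    r.cd     = ((ebx >> 28) & 1u) != 0;
    r.bw     = ((ebx >> 30) & 1u) != 0;
    r.vl     = ((ebx >> 31) & 1u) != 0;
    r.vbmi   = ((ecx >>  1) & 1u) != 0;
    return r;
}
```

With GCC or Clang, the register values could be obtained with __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) from <cpuid.h>. A complete runtime check would also confirm that the operating system has enabled ZMM and mask register state, which this sketch does not cover.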

Byte and Word Instructions

The byte and word instructions, indicated by the AVX512BW CPUID flag, extend write-masking and zero-masking to smaller element sizes. The original AVX-512 Foundation instructions supported such masking only for vector element sizes of 32 or 64 bits. Because a 512-bit vector register holds at most sixteen 32-bit elements, a 16-bit write mask was sufficient.

With AVX512BW instructions, a 512-bit vector can hold sixty-four 8-bit elements or thirty-two 16-bit elements, so write masks must be able to hold up to 64 bits. To support this, two new mask types, __mmask32 and __mmask64, have been introduced, along with additional maskable intrinsics that operate on vectors of 8- and 16-bit elements. For example,

__m512i _mm512_mask_abs_epi8(__m512i src, __mmask64 k, __m512i a);

computes the absolute value of each 8-bit element of a whose corresponding bit in write mask k is set. Elements whose bit in k is zero are copied from src.
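The write-masking behavior described above can be sketched as a plain C loop. This is a scalar model for illustration only, not the intrinsic itself: the function name model_mask_abs_epi8 and the array-based interface are assumptions, and the 64-bit integer k stands in for __mmask64. The handling of -128 reflects the usual wrap-around of 8-bit absolute value and should be treated as an assumption to verify against the instruction reference.

```c
#include <stdint.h>

#define LANES_EPI8_512 64  /* 512 bits / 8 bits per element */

/* Hypothetical scalar model of _mm512_mask_abs_epi8(src, k, a).
 * Bit i of k selects between abs(a[i]) and the pass-through value src[i]. */
void model_mask_abs_epi8(int8_t dst[LANES_EPI8_512],
                         const int8_t src[LANES_EPI8_512],
                         uint64_t k,
                         const int8_t a[LANES_EPI8_512])
{
    for (int i = 0; i < LANES_EPI8_512; ++i) {
        if ((k >> i) & 1u) {
            int8_t x = a[i];
            /* -128 has no positive 8-bit counterpart, so it is left as -128
             * (assumed wrap-around behavior). */
            dst[i] = (x == INT8_MIN) ? INT8_MIN : (int8_t)(x < 0 ? -x : x);
        } else {
            /* Mask bit clear: the element is taken unchanged from src. */
            dst[i] = src[i];
        }
    }
}
```

The loop makes the need for a 64-bit mask visible: one mask bit is consumed per 8-bit element, and a 512-bit vector holds 64 of them.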

Doubleword and Quadword Instructions

The doubleword and quadword instructions, indicated by the AVX512DQ CPUID flag, complement the Foundation instructions (indicated by the AVX512F CPUID flag): they operate on 512-bit vectors that hold sixteen 32-bit elements or eight 64-bit elements. Some of these instructions provide new functionality, such as the conversion of floating-point numbers to 64-bit integers. Others promote existing instructions (for example, vxorps) to use 512-bit registers.
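To make the floating-point to 64-bit integer conversion concrete, the following scalar sketch models a truncating conversion of eight doubles to eight 64-bit integers, in the manner of an intrinsic such as _mm512_cvttpd_epi64. The names model_cvtt_f64_to_i64 and model_cvttpd_epi64 are hypothetical. Returning the most negative 64-bit value for NaN and out-of-range inputs is an assumption based on the usual "integer indefinite" convention and should be confirmed in the instruction reference.

```c
#include <stdint.h>

#define LANES_PD_512 8  /* 512 bits / 64 bits per element */

/* Convert one double to int64_t with truncation toward zero.
 * 2^63 is exactly representable as a double, so the range test is exact.
 * NaN fails both comparisons and falls through to the fallback value. */
static int64_t model_cvtt_f64_to_i64(double x)
{
    if (x >= -9223372036854775808.0 && x < 9223372036854775808.0)
        return (int64_t)x;
    return INT64_MIN;  /* assumed result for unrepresentable inputs */
}

/* Hypothetical scalar model of a truncating packed conversion:
 * eight doubles in, eight 64-bit integers out. */
void model_cvttpd_epi64(int64_t dst[LANES_PD_512],
                        const double a[LANES_PD_512])
{
    for (int i = 0; i < LANES_PD_512; ++i)
        dst[i] = model_cvtt_f64_to_i64(a[i]);
}
```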

Vector Length Extensions

The vector length extensions, indicated by the AVX512VL CPUID flag, add write-masking, zero-masking, and embedded broadcast features to 128- and 256-bit vector lengths. For example,

__m256 _mm256_maskz_add_ps(__mmask8 k, __m256 a, __m256 b);

adds the corresponding 32-bit floating-point elements of a and b where the mask bit in k is set, and produces zero in the elements where the mask bit in k is clear.
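The zero-masking behavior described above differs from write-masking only in what happens to unselected elements. The following scalar sketch illustrates this; it is a model for illustration, not the intrinsic. The function name model_maskz_add_ps_256 and the array-based interface are assumptions, with an 8-bit integer standing in for __mmask8.

```c
#include <stdint.h>

#define LANES_PS_256 8  /* 256 bits / 32 bits per element */

/* Hypothetical scalar model of _mm256_maskz_add_ps(k, a, b).
 * Bit i of k selects between a[i] + b[i] and zero. */
void model_maskz_add_ps_256(float dst[LANES_PS_256],
                            uint8_t k,
                            const float a[LANES_PS_256],
                            const float b[LANES_PS_256])
{
    for (int i = 0; i < LANES_PS_256; ++i) {
        /* Zero-masking: unselected elements become 0.0f, and no separate
         * pass-through source vector is needed. */
        dst[i] = ((k >> i) & 1u) ? a[i] + b[i] : 0.0f;
    }
}
```

Because a 256-bit vector holds eight 32-bit elements, an 8-bit mask covers every element.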