Intrinsics for Intel® Advanced Vector Extensions 2 (Intel® AVX2)
Intel® Advanced Vector Extensions 2 (Intel® AVX2) extends Intel® Advanced Vector Extensions (Intel® AVX) by promoting most of the 128-bit SIMD integer instructions to operate on 256-bit vectors. The Intel® AVX2 instructions follow the same programming model as the Intel® AVX instructions.
Intel® AVX2 also provides enhanced functionality for broadcast/permute operations on data elements, vector shift instructions with variable-shift count per data element, and instructions to fetch non-contiguous data elements from memory.
Intel® AVX2 intrinsics have vector variants that use __m128, __m128i, __m256, and __m256i data types.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
The Intel® AVX2 intrinsics are supported on IA-32 and Intel® 64 architecture processors that implement Intel® AVX2, which first shipped with the Haswell microarchitecture (22 nm process technology). They map directly to the new Intel® AVX2 instructions and to other enhanced 128-bit SIMD instructions.
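As a minimal sketch of what using the header looks like, the following function adds two groups of eight 32-bit integers with a single 256-bit integer addition. The function name `add_i32x8` is hypothetical, and the `target("avx2")` attribute is a GCC/Clang extension assumed here so that the example compiles without extra flags; with other compilers an AVX2-enabling option (for example `-mavx2` or `-xCORE-AVX2`) would typically be used instead.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..7.
   The target attribute (GCC/Clang extension, an assumption of this sketch)
   enables AVX2 code generation for this function only. */
__attribute__((target("avx2")))
void add_i32x8(const int32_t *a, const int32_t *b, int32_t *out)
{
    assert(a && b && out);

    /* Unaligned 256-bit loads: eight 32-bit integers each. */
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* 256-bit packed 32-bit integer addition. */
    __m256i sum = _mm256_add_epi32(va, vb);

    /* Unaligned 256-bit store of the eight results. */
    _mm256_storeu_si256((__m256i *)out, sum);
}
```

The code must only be run on a processor that supports Intel® AVX2; a caller would normally check for that at run time before taking this path.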
Functional Overview
Intel® AVX2 promotes the vast majority of 128-bit integer SIMD instructions to operate on 256-bit wide YMM registers. Intel® AVX2 instructions are encoded using the VEX prefix and require the same operating system support as Intel® AVX. Most of the promoted 256-bit vector integer instructions operate independently within each 128-bit lane, as do the promoted 256-bit floating-point SIMD instructions in Intel® AVX.
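The 128-bit lane behavior can be illustrated with a small sketch that uses `_mm256_unpacklo_epi32`. The wrapper name `unpacklo_i32x8` is hypothetical, and the `target("avx2")` attribute again assumes GCC or Clang. The 256-bit form interleaves the low two elements of each 128-bit lane separately, rather than the low four elements of the whole register.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper showing in-lane behavior of a promoted instruction.
   For a = {a0..a7} and b = {b0..b7} the result is
   {a0, b0, a1, b1, a4, b4, a5, b5}: each 128-bit lane is processed
   on its own, so a2, a3, b2, b3 never reach the output. */
__attribute__((target("avx2")))
void unpacklo_i32x8(const int32_t *a, const int32_t *b, int32_t *out)
{
    assert(a && b && out);

    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Interleave the low 64 bits of each 128-bit lane of va and vb. */
    __m256i r = _mm256_unpacklo_epi32(va, vb);

    _mm256_storeu_si256((__m256i *)out, r);
}
```

With `a = {0, 1, ..., 7}` and `b = {10, 11, ..., 17}`, the output is `{0, 10, 1, 11, 4, 14, 5, 15}`.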
The Intel® AVX2 instructions may be broadly categorized as follows:
- Intel® AVX complementary integer instructions: Intel® AVX2 complements the Intel® AVX instructions, which are typed for floating-point operations, with a full set of equivalent instructions for operating on integer data elements.
- BROADCAST and PERMUTE instructions: These instructions provide cross-lane functionality for floating-point and integer operations. Some of the Intel® AVX2 256-bit vector integer instructions promoted from legacy SSE instruction sets also exhibit cross-lane behavior and fall into this category; for example, instructions of the VPMOVZX/VPMOVSX family.
- SHIFT instructions: Intel® AVX2 vector SHIFT instructions operate with a per-element shift count and support data element sizes of 32 and 64 bits.
- GATHER instructions: The Intel® AVX2 vector GATHER instructions fetch non-contiguous data elements from memory using vector-index memory addressing. They introduce a new memory addressing form consisting of a base register and multiple indices specified by a vector register (XMM or YMM). Data element sizes of 32 and 64 bits are supported, for both floating-point and integer element types.
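Three of the categories above can be sketched with one intrinsic each: a cross-lane permute (`_mm256_permutevar8x32_epi32`), a per-element variable shift (`_mm256_sllv_epi32`), and a gather (`_mm256_i32gather_epi32`). The wrapper names are hypothetical and the `target("avx2")` attribute assumes GCC or Clang; this is an illustration under those assumptions, not a recommended usage pattern.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* PERMUTE: reverse eight 32-bit elements. The index vector selects
   elements across both 128-bit lanes. */
__attribute__((target("avx2")))
void reverse_i32x8(const int32_t *in, int32_t *out)
{
    assert(in && out);
    __m256i v   = _mm256_loadu_si256((const __m256i *)in);
    __m256i idx = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    _mm256_storeu_si256((__m256i *)out, _mm256_permutevar8x32_epi32(v, idx));
}

/* SHIFT: out[i] = in[i] << count[i], with an independent count per
   element. Counts of 32 or more produce 0. */
__attribute__((target("avx2")))
void shift_left_var_u32x8(const uint32_t *in, const uint32_t *count,
                          uint32_t *out)
{
    assert(in && count && out);
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    __m256i c = _mm256_loadu_si256((const __m256i *)count);
    _mm256_storeu_si256((__m256i *)out, _mm256_sllv_epi32(v, c));
}

/* GATHER: out[i] = base[index[i]]. The scale of 4 is the element size in
   bytes, so each index counts 32-bit elements from base. The caller is
   responsible for keeping every index within the table. */
__attribute__((target("avx2")))
void gather_i32x8(const int32_t *base, const int32_t *index, int32_t *out)
{
    assert(base && index && out);
    __m256i idx = _mm256_loadu_si256((const __m256i *)index);
    __m256i g   = _mm256_i32gather_epi32((const int *)base, idx, 4);
    _mm256_storeu_si256((__m256i *)out, g);
}
```

For example, calling `gather_i32x8(table, idx, out)` with `idx = {15, 0, 7, 3, 3, 9, 1, 12}` fills `out` with the corresponding entries of `table` in that order, including the repeated index.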
- Intrinsics for Arithmetic Operations
- Intrinsics for Arithmetic Shift Operations
- Intrinsics for Blend Operations
- Intrinsics for Bitwise Operations
- Intrinsics for Broadcast Operations
- Intrinsics for Compare Operations
- Intrinsics for Fused Multiply Add Operations
- Intrinsics for GATHER Operations
- Intrinsics for Logical Shift Operations
- Intrinsics for Insert/Extract Operations
- Intrinsics for Masked Load/Store Operations
- Intrinsics for Miscellaneous Operations
- Intrinsics for Operations to Manipulate Integer Data at Bit-Granularity
- Intrinsics for Pack/Unpack Operations
- Intrinsics for Packed Move with Extend Operations
- Intrinsics for Permute Operations
- Intrinsics for Shuffle Operations
- Intrinsics for Intel® Transactional Synchronization Extensions (Intel® TSX)