Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 12/16/2022
Public

Intrinsics for Integer Gather and Scatter Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


| Intrinsic Name | Operation | Corresponding Intel® AVX-512 Instruction |
| --- | --- | --- |
| _mm512_i32gather_epi32, _mm512_mask_i32gather_epi32 | Gathers 32-bit integers from memory using 32-bit indices. | VPGATHERDD |
| _mm512_i32gather_epi64, _mm512_mask_i32gather_epi64 | Gathers 64-bit integers from memory using 32-bit indices. | VPGATHERDQ |
| _mm512_i64gather_epi32, _mm512_mask_i64gather_epi32 | Gathers 32-bit integers from memory using 64-bit indices. | VPGATHERQD |
| _mm512_i64gather_epi64, _mm512_mask_i64gather_epi64 | Gathers 64-bit integers from memory using 64-bit indices. | VPGATHERQQ |
| _mm512_i32scatter_epi32, _mm512_mask_i32scatter_epi32 | Scatters 32-bit integers into memory using 32-bit indices. | VPSCATTERDD |
| _mm512_i32scatter_epi64, _mm512_mask_i32scatter_epi64 | Scatters 64-bit integers into memory using 32-bit indices. | VPSCATTERDQ |
| _mm512_i64scatter_epi32, _mm512_mask_i64scatter_epi32 | Scatters 32-bit integers into memory using 64-bit indices. | VPSCATTERQD |
| _mm512_i64scatter_epi64, _mm512_mask_i64scatter_epi64 | Scatters 64-bit integers into memory using 64-bit indices. | VPSCATTERQQ |


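| Variable | Definition |
| --- | --- |
| vindex | a vector of indices |
| base_addr | a pointer to the base address in memory |
| scale | a compile-time literal constant by which each vector index is multiplied. Possible values are 1, 2, 4, or 8. |
| k | mask used as a selector; one bit per element |
| a | source vector whose elements are stored to memory (scatter intrinsics) |
| src | source vector whose elements are copied to the result when the corresponding mask bit is not set (masked gather intrinsics) |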
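
The roles of base_addr, vindex, scale, k, and src can be illustrated with a scalar model. The following is a minimal sketch, not the compiler's implementation; the function name gather_epi32_model is hypothetical, and the model assumes 16 lanes of 32-bit data with 32-bit indices.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked 32-bit gather with 32-bit indices.
   Lane i is read from the byte address base_addr + vindex[i] * scale;
   lanes whose mask bit is clear take their value from src instead. */
static void gather_epi32_model(int32_t dst[16], const int32_t src[16],
                               uint16_t k, const int32_t vindex[16],
                               const void *base_addr, int scale)
{
    const unsigned char *base = (const unsigned char *)base_addr;
    for (int i = 0; i < 16; ++i) {
        if ((k >> i) & 1) {
            /* The signed index is widened, then multiplied by scale (1, 2, 4, or 8). */
            ptrdiff_t offset = (ptrdiff_t)vindex[i] * scale;
            memcpy(&dst[i], base + offset, sizeof dst[i]);
        } else {
            dst[i] = src[i];
        }
    }
}
```

With scale set to 4, each index counts 32-bit elements, so vindex[i] behaves like an ordinary array subscript into an int32_t array at base_addr.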


_mm512_i32gather_epi32

__m512i _mm512_i32gather_epi32(__m512i vindex, void const* base_addr, int scale)

Gathers 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
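
As an illustration, the sketch below performs a 16-lane table lookup, out[i] = table[idx[i]]. The helper name gather16_i32 is hypothetical. The target attribute is a GCC/Clang-style assumption that enables AVX-512F for this one function; compiling the whole file with an AVX-512 option should work equally well. The code must run on a processor that supports AVX-512F.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = table[idx[i]] for 16 lanes. */
__attribute__((target("avx512f")))
void gather16_i32(int32_t out[16], const int32_t *table, const int32_t idx[16])
{
    /* Load the sixteen 32-bit indices into a vector. */
    __m512i vindex = _mm512_loadu_si512(idx);
    /* scale = 4 because each index counts 4-byte elements. */
    __m512i v = _mm512_i32gather_epi32(vindex, table, 4);
    _mm512_storeu_si512(out, v);
}
```

Every index must refer to a valid element of table, because all 16 lanes are read.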


_mm512_mask_i32gather_epi32

__m512i _mm512_mask_i32gather_epi32(__m512i src, __mmask16 k, __m512i vindex, void const* base_addr, int scale)

Gathers 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.
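
A possible use of the masked form is a lookup in which some lanes receive a default value instead of a loaded one. The sketch below is illustrative only; gather16_i32_masked and its fill parameter are hypothetical names, and the target attribute is a GCC/Clang-style assumption.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: lanes with a set bit in k are loaded from
   table[idx[i]]; the remaining lanes receive the value fill. */
__attribute__((target("avx512f")))
void gather16_i32_masked(int32_t out[16], const int32_t *table,
                         const int32_t idx[16], uint16_t k, int32_t fill)
{
    __m512i vindex = _mm512_loadu_si512(idx);
    /* src supplies the result for lanes whose mask bit is not set. */
    __m512i src = _mm512_set1_epi32(fill);
    __m512i v = _mm512_mask_i32gather_epi32(src, (__mmask16)k, vindex, table, 4);
    _mm512_storeu_si512(out, v);
}
```

For example, calling it with k = 0x5555 loads the even-numbered lanes and leaves fill in the odd-numbered lanes.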


_mm512_i32gather_epi64

__m512i _mm512_i32gather_epi64 (__m256i vindex, void const* base_addr, int scale)

Gathers 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
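
Because the result holds eight 64-bit elements, only eight 32-bit indices are needed, which is why vindex is a 256-bit vector. A minimal sketch follows; the helper name gather8_i64_by_i32 is hypothetical and the target attribute is a GCC/Clang-style assumption.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = table[idx[i]] for eight 64-bit lanes. */
__attribute__((target("avx512f")))
void gather8_i64_by_i32(int64_t out[8], const int64_t *table, const int32_t idx[8])
{
    /* Eight 32-bit indices fit in a 256-bit vector. */
    __m256i vindex = _mm256_loadu_si256((const __m256i *)idx);
    /* scale = 8 because each index counts 8-byte elements. */
    __m512i v = _mm512_i32gather_epi64(vindex, table, 8);
    _mm512_storeu_si512(out, v);
}
```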


_mm512_mask_i32gather_epi64

__m512i _mm512_mask_i32gather_epi64 (__m512i src, __mmask8 k, __m256i vindex, void const* base_addr, int scale)

Gathers 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.


_mm512_i64gather_epi32

__m256i _mm512_i64gather_epi32 (__m512i vindex, void const* base_addr, int scale)

Gathers 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
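
Here the index vector holds eight 64-bit elements, so only eight 32-bit values are gathered and the result is a 256-bit vector. The sketch below shows one way this might look; gather8_i32_by_i64 is a hypothetical name and the target attribute is a GCC/Clang-style assumption.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = table[idx[i]] with 64-bit indices. */
__attribute__((target("avx512f")))
void gather8_i32_by_i64(int32_t out[8], const int32_t *table, const int64_t idx[8])
{
    __m512i vindex = _mm512_loadu_si512(idx);
    /* Eight 32-bit results come back in a 256-bit vector. */
    __m256i v = _mm512_i64gather_epi32(vindex, table, 4);
    _mm256_storeu_si256((__m256i *)out, v);
}
```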


_mm512_mask_i64gather_epi32

__m256i _mm512_mask_i64gather_epi32 (__m256i src, __mmask8 k, __m512i vindex, void const* base_addr, int scale)

Gathers 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.


_mm512_i64gather_epi64

__m512i _mm512_i64gather_epi64 (__m512i vindex, void const* base_addr, int scale)

Gathers 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i64gather_epi64

__m512i _mm512_mask_i64gather_epi64 (__m512i src, __mmask8 k, __m512i vindex, void const* base_addr, int scale)

Gathers 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.


_mm512_i32scatter_epi32

void _mm512_i32scatter_epi32(void* base_addr, __m512i vindex, __m512i a, int scale)

Scatters 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
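
The scatter is the inverse of the table lookup: table[idx[i]] = values[i]. The following is a minimal sketch that assumes the 16 indices are distinct; scatter16_i32 is a hypothetical name and the target attribute is a GCC/Clang-style assumption.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: table[idx[i]] = values[i] for 16 lanes.
   The example assumes that no two lanes use the same index. */
__attribute__((target("avx512f")))
void scatter16_i32(int32_t *table, const int32_t idx[16], const int32_t values[16])
{
    __m512i vindex = _mm512_loadu_si512(idx);
    __m512i a = _mm512_loadu_si512(values);
    /* scale = 4 because each index counts 4-byte elements. */
    _mm512_i32scatter_epi32(table, vindex, a, 4);
}
```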


_mm512_mask_i32scatter_epi32

void _mm512_mask_i32scatter_epi32(void* base_addr, __mmask16 k, __m512i vindex, __m512i a, int scale)

Scatters 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. When the corresponding mask bit is not set, elements are not stored.
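
A masked scatter can express a conditional store. The sketch below stores only the positive elements of values and leaves the other destinations untouched. The helper name scatter16_i32_positive is hypothetical, the mask is produced with _mm512_cmpgt_epi32_mask, and the target attribute is a GCC/Clang-style assumption.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: table[idx[i]] = values[i] only where values[i] > 0. */
__attribute__((target("avx512f")))
void scatter16_i32_positive(int32_t *table, const int32_t idx[16],
                            const int32_t values[16])
{
    __m512i vindex = _mm512_loadu_si512(idx);
    __m512i a = _mm512_loadu_si512(values);
    /* Bit i of k is set when values[i] is greater than zero. */
    __mmask16 k = _mm512_cmpgt_epi32_mask(a, _mm512_setzero_si512());
    /* Lanes whose mask bit is clear are not stored. */
    _mm512_mask_i32scatter_epi32(table, k, vindex, a, 4);
}
```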


_mm512_i32scatter_epi64

void _mm512_i32scatter_epi64 (void* base_addr, __m256i vindex, __m512i a, int scale)

Scatters 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i32scatter_epi64

void _mm512_mask_i32scatter_epi64 (void* base_addr, __mmask8 k, __m256i vindex, __m512i a, int scale)

Scatters 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. When the corresponding mask bit is not set, elements are not stored.


_mm512_i64scatter_epi32

void _mm512_i64scatter_epi32 (void* base_addr, __m512i vindex, __m256i a, int scale)

Scatters 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i64scatter_epi32

void _mm512_mask_i64scatter_epi32 (void* base_addr, __mmask8 k, __m512i vindex, __m256i a, int scale)

Scatters 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. When the corresponding mask bit is not set, elements are not stored.


_mm512_i64scatter_epi64

void _mm512_i64scatter_epi64 (void* base_addr, __m512i vindex, __m512i a, int scale)

Scatters 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
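
A scale of 1 lets the indices act as byte offsets, which may be convenient when the destinations are one field of an array of structures. The sketch below is an illustration under that assumption; record_t and set_values8 are hypothetical names, and the target attribute is a GCC/Clang-style assumption.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type used only for this illustration. */
typedef struct {
    int64_t key;
    int64_t value;
} record_t;

/* Hypothetical helper: recs[i].value = values[i] for eight records. */
__attribute__((target("avx512f")))
void set_values8(record_t recs[8], const int64_t values[8])
{
    int64_t byte_offsets[8];
    for (int i = 0; i < 8; ++i)
        byte_offsets[i] = (int64_t)(i * sizeof(record_t));

    __m512i vindex = _mm512_loadu_si512(byte_offsets);
    __m512i a = _mm512_loadu_si512(values);
    /* scale = 1: the indices are already byte offsets from &recs[0].value. */
    _mm512_i64scatter_epi64(&recs[0].value, vindex, a, 1);
}
```

The key fields are not written, because each store covers only the 8 bytes at the computed address.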


_mm512_mask_i64scatter_epi64

void _mm512_mask_i64scatter_epi64 (void* base_addr, __mmask8 k, __m512i vindex, __m512i a, int scale)

Scatters 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. When the corresponding mask bit is not set, elements are not stored.