Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 7/13/2023

Intrinsics for Integer Load and Store Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>
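These intrinsics compile to AVX-512 instructions, so a program that may run on processors without AVX-512F usually needs a runtime check before calling them. The following is a minimal dispatch sketch that assumes a GCC- or Clang-compatible compiler. `__attribute__((target("avx512f")))` and `__builtin_cpu_supports` are GCC/Clang extensions that this guide does not describe, and all function names are hypothetical. If you compile with a switch such as -mavx512f you can drop the attribute, but the compiler may then emit AVX-512 instructions anywhere in that translation unit.

```c
#include <immintrin.h>
#include <stdint.h>

/* AVX-512 path. The GCC/Clang target attribute enables AVX-512F code
   generation for this function only. */
__attribute__((target("avx512f")))
static void add16_avx512(int32_t *dst, const int32_t *a, const int32_t *b)
{
    __m512i va = _mm512_loadu_si512(a);   /* no alignment requirement */
    __m512i vb = _mm512_loadu_si512(b);
    _mm512_storeu_si512(dst, _mm512_add_epi32(va, vb));
}

/* Portable fallback for processors without AVX-512F. */
static void add16_scalar(int32_t *dst, const int32_t *a, const int32_t *b)
{
    for (int i = 0; i < 16; i++)
        dst[i] = a[i] + b[i];
}

/* Hypothetical dispatcher: uses the AVX-512 path only if the CPU reports
   AVX-512F support at run time. */
void add16(int32_t *dst, const int32_t *a, const int32_t *b)
{
    if (__builtin_cpu_supports("avx512f"))
        add16_avx512(dst, a, b);
    else
        add16_scalar(dst, a, b);
}
```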


Intrinsic Name | Operation | Corresponding Intel® AVX-512 Instruction
--- | --- | ---
_mm512_load_epi32, _mm512_mask_load_epi32, _mm512_maskz_load_epi32 | Load packed int32 elements from memory | VMOVDQA32
_mm512_load_epi64, _mm512_mask_load_epi64, _mm512_maskz_load_epi64 | Load packed int64 elements from memory | VMOVDQA64
_mm512_loadu_si512 | Unaligned load of 512 bits of integer data | VMOVDQU32
_mm512_mask_loadu_epi32, _mm512_maskz_loadu_epi32 | Unaligned load of packed int32 elements | VMOVDQU32
_mm512_mask_loadu_epi64, _mm512_maskz_loadu_epi64 | Unaligned load of packed int64 elements | VMOVDQU64
_mm512_stream_load_si512 | Load 512 bits of integer data using a non-temporal hint (aligned) | VMOVNTDQA
_mm512_mask_storeu_epi64 | Store unaligned packed int64 elements | VMOVDQU64
_mm512_stream_si512 | Store 512 bits of integer data using a non-temporal hint (aligned) | VMOVNTDQ


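Variable | Definition
--- | ---
k | writemask (mask_ forms) or zeromask (maskz_ forms) that selects which elements are loaded or stored
a | source vector whose elements are stored to memory
mem_addr | pointer to the base address in memory
src | source vector whose elements are copied to the destination wherever the corresponding bit of k is not set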


_mm512_load_si512

extern __m512i __cdecl _mm512_load_si512(void const* mem_addr);

Load 512 bits of integer data from memory into the destination.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
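A minimal sketch of an aligned load, assuming a GCC- or Clang-compatible compiler (the `target` attribute is a GCC/Clang extension) and a processor already checked for AVX-512F support. The helper name `sum16_aligned` is hypothetical. `_mm512_reduce_add_epi32` is a separate compiler-provided sequence intrinsic, used here only to collapse the loaded vector into a single value.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: sums sixteen int32 values from a 64-byte aligned
   buffer. The target attribute (GCC/Clang) enables AVX-512F for this
   function without a global compiler switch. */
__attribute__((target("avx512f")))
int32_t sum16_aligned(const int32_t *p)
{
    /* p must be 64-byte aligned; otherwise the load raises #GP. */
    __m512i v = _mm512_load_si512(p);
    /* Horizontal add of all 16 lanes. */
    return _mm512_reduce_add_epi32(v);
}
```

A pointer such as `buf + 1` into an aligned array would violate the alignment requirement. Use _mm512_loadu_si512 for such addresses.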



_mm512_loadu_si512

extern __m512i __cdecl _mm512_loadu_si512(void const* mem_addr);

Load 512 bits of integer data from memory into the destination.

mem_addr does not need to be aligned on any particular boundary.
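The sketch below pairs the unaligned load with its store counterpart to copy 64 bytes between arbitrarily aligned buffers. It assumes a GCC- or Clang-compatible compiler and a processor already checked for AVX-512F support. The helper name `copy64_unaligned` is hypothetical.

```c
#include <immintrin.h>

/* Hypothetical helper: copies 64 bytes; neither pointer needs any
   particular alignment. */
__attribute__((target("avx512f")))
void copy64_unaligned(void *dst, const void *src)
{
    __m512i v = _mm512_loadu_si512(src);  /* any address is valid */
    _mm512_storeu_si512(dst, v);          /* unaligned store counterpart */
}
```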



_mm512_load_epi32

extern __m512i __cdecl _mm512_load_epi32(void const* mem_addr);

Load 512 bits (sixteen packed 32-bit integers) from memory into the destination.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_load_epi32

extern __m512i __cdecl _mm512_mask_load_epi32(__m512i src, __mmask16 k, void const* mem_addr);

Load packed int32 elements from memory into the destination using writemask k (elements are copied from src when the corresponding mask bit is not set).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
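The following sketch shows merge masking: lanes whose bit in k is 1 come from memory, and the remaining lanes keep the value from src (here, -1). Bit i of k controls element i. It assumes a GCC- or Clang-compatible compiler and a processor already checked for AVX-512F support. The helper name is hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: loads the lanes selected by k from the 64-byte
   aligned buffer p. All other lanes keep the src value -1. */
__attribute__((target("avx512f")))
void merge_load_demo(const int32_t *p, __mmask16 k, int32_t out[16])
{
    __m512i src = _mm512_set1_epi32(-1);
    __m512i v = _mm512_mask_load_epi32(src, k, p);
    _mm512_storeu_si512(out, v);  /* spill to an ordinary array for inspection */
}
```

With k = 0x00FF, elements 0 to 7 come from p and elements 8 to 15 remain -1.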



_mm512_maskz_load_epi32

extern __m512i __cdecl _mm512_maskz_load_epi32(__mmask16 k, void const* mem_addr);

Load packed int32 elements from memory into the destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
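Zero masking works the same way as merge masking, except that unselected lanes become 0 and no src operand is needed. The following is a minimal sketch, assuming a GCC- or Clang-compatible compiler and a processor already checked for AVX-512F support. The helper name is hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: loads the lanes selected by k from the 64-byte
   aligned buffer p. Unselected lanes are set to 0. */
__attribute__((target("avx512f")))
void zero_masked_load_demo(const int32_t *p, __mmask16 k, int32_t out[16])
{
    __m512i v = _mm512_maskz_load_epi32(k, p);
    _mm512_storeu_si512(out, v);
}
```

With k = 0x5555 (every even bit set), the even-numbered elements come from p and the odd-numbered elements are 0.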



_mm512_load_epi64

extern __m512i __cdecl _mm512_load_epi64(void const* mem_addr);

Load 512 bits (eight packed int64 elements) from memory into the destination.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_load_epi64

extern __m512i __cdecl _mm512_mask_load_epi64(__m512i src, __mmask8 k, void const* mem_addr);

Load packed int64 elements from memory into the destination using writemask k (elements are copied from src when the corresponding mask bit is not set).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
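The 64-bit forms take an 8-bit mask (`__mmask8`), one bit per int64 lane. The sketch below loads a base vector and then patches the lanes selected by k from a second aligned buffer. It assumes a GCC- or Clang-compatible compiler and a processor already checked for AVX-512F support. The helper name `patch_lanes_epi64` is hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: starts from the vector at base and replaces the
   lanes selected by k with the matching lanes at patch. Both pointers
   must be 64-byte aligned. */
__attribute__((target("avx512f")))
void patch_lanes_epi64(const int64_t *base, const int64_t *patch,
                       __mmask8 k, int64_t out[8])
{
    __m512i v = _mm512_load_epi64(base);
    v = _mm512_mask_load_epi64(v, k, patch);  /* merge the selected lanes */
    _mm512_storeu_si512(out, v);
}
```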



_mm512_maskz_load_epi64

extern __m512i __cdecl _mm512_maskz_load_epi64(__mmask8 k, void const* mem_addr);

Load packed int64 elements from memory into the destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_loadu_epi32

extern __m512i __cdecl _mm512_mask_loadu_epi32(__m512i src, __mmask16 k, void const* mem_addr);

Load packed int32 elements from memory into the destination using writemask k (elements are copied from src when the corresponding mask bit is not set).

mem_addr does not need to be aligned on any particular boundary.



_mm512_maskz_loadu_epi32

extern __m512i __cdecl _mm512_maskz_loadu_epi32(__mmask16 k, void const* mem_addr);

Load packed int32 elements from memory into the destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

mem_addr does not need to be aligned on any particular boundary.
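A common use of the unaligned zero-masked load is handling the last partial block of an array whose length is not a multiple of 16. In AVX-512, masked-out lanes do not cause memory faults, so the tail load never touches memory past the end of the array. The sketch below assumes a GCC- or Clang-compatible compiler and a processor already checked for AVX-512F support. The helper name `sum_i32` is hypothetical, and it performs no overflow checking.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n int32 values starting at p (any alignment). */
__attribute__((target("avx512f")))
int32_t sum_i32(const int32_t *p, size_t n)
{
    __m512i acc = _mm512_setzero_si512();
    size_t i = 0;
    /* Full 16-element blocks. */
    for (; i + 16 <= n; i += 16)
        acc = _mm512_add_epi32(acc, _mm512_loadu_si512(p + i));
    if (i < n) {
        /* Low (n - i) bits set; masked-off lanes load as 0 and are not read. */
        __mmask16 k = (__mmask16)((1u << (n - i)) - 1u);
        acc = _mm512_add_epi32(acc, _mm512_maskz_loadu_epi32(k, p + i));
    }
    return _mm512_reduce_add_epi32(acc);
}
```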



_mm512_mask_loadu_epi64

extern __m512i __cdecl _mm512_mask_loadu_epi64(__m512i src, __mmask8 k, void const* mem_addr);

Load packed int64 elements from memory into the destination using writemask k (elements are copied from src when the corresponding mask bit is not set).

mem_addr does not need to be aligned on any particular boundary.



_mm512_stream_load_si512

extern __m512i __cdecl _mm512_stream_load_si512(void * mem_addr);

Load 512 bits of integer data from memory into the destination using a non-temporal memory hint.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
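The following is a minimal sketch of a copy loop built on non-temporal loads. It assumes a GCC- or Clang-compatible compiler and a processor already checked for AVX-512F support. The helper name `stream_copy` is hypothetical. The hint is aimed mainly at write-combining (WC) memory; on ordinary write-back memory, the processor may handle the load much like a regular one, so any benefit should be measured.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: copies nblocks 64-byte blocks using non-temporal
   loads. Both dst and src must be 64-byte aligned. */
__attribute__((target("avx512f")))
void stream_copy(void *dst, void *src, size_t nblocks)
{
    __m512i *d = (__m512i *)dst;
    __m512i *s = (__m512i *)src;
    for (size_t i = 0; i < nblocks; i++) {
        __m512i v = _mm512_stream_load_si512(s + i);
        _mm512_store_si512(d + i, v);
    }
}
```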



_mm512_store_epi32

extern void __cdecl _mm512_store_epi32(void* mem_addr, __m512i a);

Store 512 bits (sixteen packed 32-bit integers) from a into memory.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_store_epi32

extern void __cdecl _mm512_mask_store_epi32(void* mem_addr, __mmask16 k, __m512i a);

Store packed int32 elements from a into memory using writemask k (elements whose mask bit is not set are not written to memory).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
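A masked store writes only the selected elements and leaves the rest of the memory block unchanged, which makes it a natural fit for conditional updates. The sketch below builds the mask with a compare intrinsic (`_mm512_cmplt_epi32_mask`, an AVX-512F intrinsic not covered in this section) and overwrites only the negative elements with 0. It assumes a GCC- or Clang-compatible compiler and a processor already checked for AVX-512F support. The helper name is hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: replaces negative values in a 64-byte aligned block
   of sixteen int32 with 0. Non-negative elements are not written. */
__attribute__((target("avx512f")))
void clamp_negatives16(int32_t *p)
{
    __m512i v = _mm512_load_epi32(p);
    __m512i zero = _mm512_setzero_si512();
    __mmask16 neg = _mm512_cmplt_epi32_mask(v, zero);  /* bit i set if p[i] < 0 */
    _mm512_mask_store_epi32(p, neg, zero);
}
```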



_mm512_store_si512

extern void __cdecl _mm512_store_si512(void* mem_addr, __m512i a);

Store 512 bits of integer data from a into memory.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_store_epi64

extern void __cdecl _mm512_store_epi64(void* mem_addr, __m512i a);

Store 512 bits (eight packed int64 elements) from a into memory.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_store_epi64

extern void __cdecl _mm512_mask_store_epi64(void* mem_addr, __mmask8 k, __m512i a);

Store packed int64 elements from a into memory using writemask k (elements whose mask bit is not set are not written to memory).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_storeu_epi32

extern void __cdecl _mm512_mask_storeu_epi32(void* mem_addr, __mmask16 k, __m512i a);

Store packed int32 elements from a into memory using writemask k (elements whose mask bit is not set are not written to memory).

mem_addr does not need to be aligned on any particular boundary.
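An unaligned masked load paired with an unaligned masked store lets an in-place loop handle a partial final block without reading or writing past the end of the array. The sketch below assumes a GCC- or Clang-compatible compiler and a processor already checked for AVX-512F support. The helper name `add_scalar_i32` is hypothetical.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds delta to each of the n int32 values at p
   (any alignment). */
__attribute__((target("avx512f")))
void add_scalar_i32(int32_t *p, size_t n, int32_t delta)
{
    const __m512i d = _mm512_set1_epi32(delta);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512i v = _mm512_loadu_si512(p + i);
        _mm512_storeu_si512(p + i, _mm512_add_epi32(v, d));
    }
    if (i < n) {
        /* Same mask for the load and the store: only the n - i valid
           elements are read and written. */
        __mmask16 k = (__mmask16)((1u << (n - i)) - 1u);
        __m512i v = _mm512_maskz_loadu_epi32(k, p + i);
        _mm512_mask_storeu_epi32(p + i, k, _mm512_add_epi32(v, d));
    }
}
```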



_mm512_mask_storeu_epi64

extern void __cdecl _mm512_mask_storeu_epi64(void* mem_addr, __mmask8 k, __m512i a);

Store packed int64 elements from a into memory using writemask k (elements whose mask bit is not set are not written to memory).

mem_addr does not need to be aligned on any particular boundary.



_mm512_storeu_si512

extern void __cdecl _mm512_storeu_si512(void* mem_addr, __m512i a);

Store 512 bits of integer data from a into memory.

mem_addr does not need to be aligned on any particular boundary.



_mm512_stream_si512

extern void __cdecl _mm512_stream_si512(void* mem_addr, __m512i a);

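Store 512 bits of integer data from a into memory using a non-temporal memory hint.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.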
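Non-temporal stores are weakly ordered with respect to other stores. Code that publishes a streamed buffer (for example, by setting a flag that another thread reads) therefore typically issues an SFENCE (`_mm_sfence`) after the streaming loop. The sketch below assumes a GCC- or Clang-compatible compiler and a processor already checked for AVX-512F support. The helper name `stream_fill_i32` is hypothetical. The cast to `__m512i *` is used because GCC declares this intrinsic's pointer parameter as `__m512i *` rather than `void *`.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills nblocks 64-byte blocks at dst (64-byte
   aligned) with value using non-temporal stores. */
__attribute__((target("avx512f")))
void stream_fill_i32(int32_t *dst, size_t nblocks, int32_t value)
{
    const __m512i v = _mm512_set1_epi32(value);
    __m512i *d = (__m512i *)dst;
    for (size_t i = 0; i < nblocks; i++)
        _mm512_stream_si512(d + i, v);
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
}
```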