Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 12/16/2022
Public


Intrinsics for FP Expand and Load Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


| Intrinsic Name | Operation | Corresponding Intel® AVX-512 Instruction |
|---|---|---|
| _mm512_expand_pd, _mm512_mask_expand_pd, _mm512_maskz_expand_pd | Load packed float64 values from dense memory. | VEXPANDPD |
| _mm512_mask_expandloadu_pd, _mm512_maskz_expandloadu_pd | Load packed float64 values from dense memory. | VEXPANDPD |
| _mm512_expand_ps, _mm512_mask_expand_ps, _mm512_maskz_expand_ps | Load packed float32 values from dense memory. | VEXPANDPS |
| _mm512_mask_expandloadu_ps, _mm512_maskz_expandloadu_ps | Load packed float32 values from dense memory. | VEXPANDPS |


| Variable | Definition |
|---|---|
| k | writemask used as a selector |
| a | first source vector element |
| src | source element to use based on writemask result |
| mem_addr | pointer to memory address |


_mm512_expand_pd

extern __m512d __cdecl _mm512_expand_pd(__m512d a);

Loads contiguous active float64 elements from a and stores the result. This prototype takes no mask argument k; the masked variants that follow select the active elements with k.



_mm512_mask_expand_pd

extern __m512d __cdecl _mm512_mask_expand_pd(__m512d src, __mmask8 k, __m512d a);

Loads float64 elements contiguously from the low end of a and stores them, in order, in the result elements whose bit is set in mask k (writemask). Result elements whose mask bit is not set are copied from src.
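The lane-by-lane behavior of the writemask form can be sketched with a scalar model. This is only an illustration of the semantics described above, not Intel's implementation; the function name expand_pd_model and the use of plain arrays in place of __m512d are assumptions made for the example.

```c
#include <stdint.h>

/* Scalar reference model (illustrative only) of _mm512_mask_expand_pd.
   Arrays of 8 doubles stand in for the __m512d operands. */
void expand_pd_model(const double src[8], uint8_t k,
                     const double a[8], double dst[8])
{
    int next = 0;                    /* index of the next unread element of a */
    for (int i = 0; i < 8; ++i) {
        if ((k >> i) & 1u)
            dst[i] = a[next++];      /* mask bit set: take the next low element of a */
        else
            dst[i] = src[i];         /* mask bit clear: keep the element from src */
    }
}
```

For example, with k = 0xA4 (bits 2, 5, and 7 set), a = {1, 2, 3, 4, 5, 6, 7, 8}, and every element of src equal to -1, the model yields {-1, -1, 1, -1, -1, 2, -1, 3}: only the three lowest elements of a are consumed.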



_mm512_maskz_expand_pd

extern __m512d __cdecl _mm512_maskz_expand_pd(__mmask8 k, __m512d a);

Loads float64 elements contiguously from the low end of a and stores them, in order, in the result elements whose bit is set in mask k (zeromask). Result elements whose mask bit is not set are zeroed out.
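A minimal sketch of calling the zeromask form follows. The wrapper name maskz_expand_pd_demo, the __AVX512F__ guard, and the scalar fallback are assumptions added so the example also builds when the compiler is not targeting Intel® AVX-512; they are not part of the intrinsic's definition.

```c
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Zero-masked expand of 8 doubles (hypothetical wrapper for illustration).
   Uses the intrinsic when the compiler targets AVX-512F; otherwise runs
   an equivalent scalar loop. */
void maskz_expand_pd_demo(uint8_t k, const double a[8], double dst[8])
{
#if defined(__AVX512F__)
    __m512d va = _mm512_loadu_pd(a);                       /* unaligned load of a */
    __m512d r  = _mm512_maskz_expand_pd((__mmask8)k, va);  /* expand, zeroing inactive lanes */
    _mm512_storeu_pd(dst, r);                              /* unaligned store of the result */
#else
    int next = 0;
    for (int i = 0; i < 8; ++i)
        dst[i] = ((k >> i) & 1u) ? a[next++] : 0.0;
#endif
}
```

With k = 0x55 and a = {1, 2, 3, 4, 5, 6, 7, 8}, both paths produce {1, 0, 2, 0, 3, 0, 4, 0}. The intrinsic path is taken only when the build enables AVX-512F (for example, with an option such as -mavx512f on GCC-compatible drivers) and must then run on a processor that supports it.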



_mm512_expand_ps

extern __m512 __cdecl _mm512_expand_ps(__m512 a);

Loads contiguous active float32 elements from a and stores the result. This prototype takes no mask argument k; the masked variants that follow select the active elements with k.



_mm512_mask_expand_ps

extern __m512 __cdecl _mm512_mask_expand_ps(__m512 src, __mmask16 k, __m512 a);

Loads float32 elements contiguously from the low end of a and stores them, in order, in the result elements whose bit is set in mask k (writemask). Result elements whose mask bit is not set are copied from src.



_mm512_maskz_expand_ps

extern __m512 __cdecl _mm512_maskz_expand_ps(__mmask16 k, __m512 a);

Loads float32 elements contiguously from the low end of a and stores them, in order, in the result elements whose bit is set in mask k (zeromask). Result elements whose mask bit is not set are zeroed out.



_mm512_mask_expandloadu_pd

extern __m512d __cdecl _mm512_mask_expandloadu_pd(__m512d src, __mmask8 k, void * mem_addr);

Loads contiguous float64 elements from unaligned memory at mem_addr and stores them, in order, in the result elements whose bit is set in mask k (writemask). Result elements whose mask bit is not set are copied from src.
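The memory form can be modeled the same way. The sketch below is a scalar illustration, not the hardware behavior in every detail: the name expandloadu_pd_model and its return value (the number of float64 elements read) are assumptions for the example, and memcpy stands in for the unaligned access.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference model (illustrative only) of _mm512_mask_expandloadu_pd.
   Returns how many doubles were read from mem_addr, which in this model
   equals the number of bits set in k. */
size_t expandloadu_pd_model(const double src[8], uint8_t k,
                            const void *mem_addr, double dst[8])
{
    const unsigned char *p = (const unsigned char *)mem_addr;
    size_t used = 0;                 /* elements consumed from memory so far */
    for (int i = 0; i < 8; ++i) {
        if ((k >> i) & 1u) {
            /* memcpy tolerates any alignment of mem_addr */
            memcpy(&dst[i], p + used * sizeof(double), sizeof(double));
            ++used;
        } else {
            dst[i] = src[i];         /* mask bit clear: keep the element from src */
        }
    }
    return used;
}
```

In this model only the first popcount(k) elements at mem_addr are touched, so a caller that walks a densely packed buffer can advance its pointer by the returned count.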



_mm512_maskz_expandloadu_pd

extern __m512d __cdecl _mm512_maskz_expandloadu_pd(__mmask8 k, void * mem_addr);

Loads contiguous float64 elements from unaligned memory at mem_addr and stores them, in order, in the result elements whose bit is set in mask k (zeromask). Result elements whose mask bit is not set are zeroed out.



_mm512_mask_expandloadu_ps

extern __m512 __cdecl _mm512_mask_expandloadu_ps(__m512 src, __mmask16 k, void * mem_addr);

Loads contiguous float32 elements from unaligned memory at mem_addr and stores them, in order, in the result elements whose bit is set in mask k (writemask). Result elements whose mask bit is not set are copied from src.



_mm512_maskz_expandloadu_ps

extern __m512 __cdecl _mm512_maskz_expandloadu_ps(__mmask16 k, void * mem_addr);

Loads contiguous float32 elements from unaligned memory at mem_addr and stores them, in order, in the result elements whose bit is set in mask k (zeromask). Result elements whose mask bit is not set are zeroed out.
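One plausible use of the zeromask memory form is rebuilding a sparse float32 array from its densely packed nonzero values plus one 16-bit mask per block of 16 elements. The sketch below assumes that storage layout; the names unpack_sparse_ps and popcount16, the __AVX512F__ guard, and the scalar fallback are all hypothetical choices for the example.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Count the bits set in a 16-bit mask (portable helper). */
static unsigned popcount16(uint16_t k)
{
    unsigned n = 0;
    for (; k; k &= (uint16_t)(k - 1))   /* clear the lowest set bit each pass */
        ++n;
    return n;
}

/* Rebuild nblocks * 16 floats in out from packed values and per-block masks.
   Returns the number of packed values consumed. */
size_t unpack_sparse_ps(const uint16_t *masks, size_t nblocks,
                        const float *packed, float *out)
{
    size_t used = 0;
    for (size_t b = 0; b < nblocks; ++b) {
        uint16_t k = masks[b];
#if defined(__AVX512F__)
        __m512 v = _mm512_maskz_expandloadu_ps((__mmask16)k, packed + used);
        _mm512_storeu_ps(out + 16 * b, v);
#else
        size_t next = used;
        for (int i = 0; i < 16; ++i)
            out[16 * b + i] = ((k >> i) & 1u) ? packed[next++] : 0.0f;
#endif
        used += popcount16(k);          /* advance past the values just read */
    }
    return used;
}
```

With masks = {0x0003, 0x8000} and packed = {1.5, 2.5, 3.5}, the first block receives 1.5 and 2.5 in elements 0 and 1, the second block receives 3.5 in element 15, every other output element is zero, and the function returns 3.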