Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 3/31/2023
Public


Intrinsics for FP Pack and Unpack Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


| Intrinsic Name | Operation | Corresponding Intel® AVX-512 Instruction |
| --- | --- | --- |
| _mm512_unpackhi_pd, _mm512_mask_unpackhi_pd, _mm512_maskz_unpackhi_pd | Unpacks and interleaves high packed float64 values. | VUNPCKHPD |
| _mm512_unpackhi_ps, _mm512_mask_unpackhi_ps, _mm512_maskz_unpackhi_ps | Unpacks and interleaves high packed float32 values. | VUNPCKHPS |
| _mm512_unpacklo_pd, _mm512_mask_unpacklo_pd, _mm512_maskz_unpacklo_pd | Unpacks and interleaves low packed float64 values. | VUNPCKLPD |
| _mm512_unpacklo_ps, _mm512_mask_unpacklo_ps, _mm512_maskz_unpacklo_ps | Unpacks and interleaves low packed float32 values. | VUNPCKLPS |


| Variable | Definition |
| --- | --- |
| k | writemask used as a selector |
| a | first source vector element |
| b | second source vector element |
| src | source element to use based on writemask result |


_mm512_unpackhi_pd

extern __m512d __cdecl _mm512_unpackhi_pd(__m512d a, __m512d b);

Unpacks and interleaves float64 elements from the high half of each 128-bit lane in a and b, and stores the result.
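The per-lane behavior described above can be sketched with a small array-based wrapper. The helper name unpackhi_pd_8 is hypothetical, and the scalar fallback (compiled when the compiler does not define __AVX512F__) is an assumption added so the sketch also builds on targets without Intel® AVX-512; it models the documented semantics and is not part of the intrinsic API.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: applies the unpackhi_pd operation to two arrays of
   8 doubles. The arrays do not need to be 64-byte aligned. */
static void unpackhi_pd_8(const double a[8], const double b[8], double out[8])
{
    assert(a && b && out);
#if defined(__AVX512F__)
    __m512d va = _mm512_loadu_pd(a);
    __m512d vb = _mm512_loadu_pd(b);
    _mm512_storeu_pd(out, _mm512_unpackhi_pd(va, vb));
#else
    /* Scalar model: each 128-bit lane holds two doubles. The result lane is
       the upper double of a followed by the upper double of b. */
    for (size_t lane = 0; lane < 4; ++lane) {
        out[2 * lane]     = a[2 * lane + 1];
        out[2 * lane + 1] = b[2 * lane + 1];
    }
#endif
}
```

With a = {0, 1, ..., 7} and b = {10, 11, ..., 17}, the helper fills out with {1, 11, 3, 13, 5, 15, 7, 17}: the odd-indexed elements of a and b, paired lane by lane.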



_mm512_mask_unpackhi_pd

extern __m512d __cdecl _mm512_mask_unpackhi_pd(__m512d src, __mmask8 k, __m512d a, __m512d b);

Unpacks and interleaves float64 elements from the high half of each 128-bit lane in a and b, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).
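To illustrate how the writemask merges results with src, here is a minimal sketch. The wrapper name mask_unpackhi_pd_8 and the use of unsigned char for the mask argument are choices made for this example; the scalar branch is an assumed fallback for builds where __AVX512F__ is not defined.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: masked unpackhi_pd on arrays of 8 doubles.
   Bit i of k controls element i of the result. */
static void mask_unpackhi_pd_8(const double src[8], unsigned char k,
                               const double a[8], const double b[8],
                               double out[8])
{
    assert(src && a && b && out);
#if defined(__AVX512F__)
    __m512d r = _mm512_mask_unpackhi_pd(_mm512_loadu_pd(src), (__mmask8)k,
                                        _mm512_loadu_pd(a),
                                        _mm512_loadu_pd(b));
    _mm512_storeu_pd(out, r);
#else
    for (size_t i = 0; i < 8; ++i) {
        /* Unmasked result: even positions take the upper double of the lane
           from a, odd positions take the upper double of the lane from b. */
        double r = (i % 2 == 0) ? a[i + 1] : b[i];
        /* Mask bit set: keep the result. Mask bit clear: copy from src. */
        out[i] = ((k >> i) & 1u) ? r : src[i];
    }
#endif
}
```

With a = {0, ..., 7}, b = {10, ..., 17}, every element of src set to -1, and k = 0x0F, the output is {1, 11, 3, 13, -1, -1, -1, -1}: only the four elements whose mask bits are set receive unpacked values.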



_mm512_maskz_unpackhi_pd

extern __m512d __cdecl _mm512_maskz_unpackhi_pd(__mmask8 k, __m512d a, __m512d b);

Unpacks and interleaves float64 elements from the high half of each 128-bit lane in a and b, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
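One possible use of the zeromask form is to keep only the elements that originate from a and clear the rest. The sketch below assumes a hypothetical wrapper, maskz_unpackhi_pd_8, and an assumed scalar fallback for builds where __AVX512F__ is not defined.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: zero-masked unpackhi_pd on arrays of 8 doubles. */
static void maskz_unpackhi_pd_8(unsigned char k, const double a[8],
                                const double b[8], double out[8])
{
    assert(a && b && out);
#if defined(__AVX512F__)
    __m512d r = _mm512_maskz_unpackhi_pd((__mmask8)k, _mm512_loadu_pd(a),
                                         _mm512_loadu_pd(b));
    _mm512_storeu_pd(out, r);
#else
    for (size_t i = 0; i < 8; ++i) {
        /* Same unmasked result as unpackhi_pd for element i. */
        double r = (i % 2 == 0) ? a[i + 1] : b[i];
        /* Mask bit clear: the element is zeroed instead of merged. */
        out[i] = ((k >> i) & 1u) ? r : 0.0;
    }
#endif
}
```

With a = {0, ..., 7}, b = {10, ..., 17}, and k = 0x55 (even positions only), the output is {1, 0, 3, 0, 5, 0, 7, 0}. With k = 0xFF the result matches the unmasked intrinsic.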



_mm512_unpackhi_ps

extern __m512 __cdecl _mm512_unpackhi_ps(__m512 a, __m512 b);

Unpacks and interleaves float32 elements from the high half of each 128-bit lane in a and b, and stores the result.
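For float32 data each 128-bit lane holds four elements, so the high half of a lane is its upper two floats. A minimal sketch of that layout follows; unpackhi_ps_16 is a hypothetical wrapper name, and the scalar branch is an assumed fallback for builds where __AVX512F__ is not defined.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: applies the unpackhi_ps operation to two arrays of
   16 floats. */
static void unpackhi_ps_16(const float a[16], const float b[16], float out[16])
{
    assert(a && b && out);
#if defined(__AVX512F__)
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    _mm512_storeu_ps(out, _mm512_unpackhi_ps(va, vb));
#else
    /* Scalar model: within each lane of four floats, interleave elements
       2 and 3 of a with elements 2 and 3 of b. */
    for (size_t lane = 0; lane < 4; ++lane) {
        size_t base = 4 * lane;
        out[base + 0] = a[base + 2];
        out[base + 1] = b[base + 2];
        out[base + 2] = a[base + 3];
        out[base + 3] = b[base + 3];
    }
#endif
}
```

With a[i] = i and b[i] = 100 + i, the first lane of the output is {2, 102, 3, 103} and the last lane is {14, 114, 15, 115}.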



_mm512_mask_unpackhi_ps

extern __m512 __cdecl _mm512_mask_unpackhi_ps(__m512 src, __mmask16 k, __m512 a, __m512 b);

Unpacks and interleaves float32 elements from the high half of each 128-bit lane in a and b, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_unpackhi_ps

extern __m512 __cdecl _mm512_maskz_unpackhi_ps(__mmask16 k, __m512 a, __m512 b);

Unpacks and interleaves float32 elements from the high half of each 128-bit lane in a and b, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_unpacklo_pd

extern __m512d __cdecl _mm512_unpacklo_pd(__m512d a, __m512d b);

Unpacks and interleaves float64 elements from the low half of each 128-bit lane in a and b, and stores the result.
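The low and high forms are often used together to turn two separate arrays into (x, y) pairs. The sketch below shows one way to do that; pair_up_pd_8 is a hypothetical helper, and the scalar branch is an assumed fallback for builds where __AVX512F__ is not defined.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: lo receives the pairs (x[i], y[i]) for even i,
   hi receives the pairs (x[i], y[i]) for odd i. */
static void pair_up_pd_8(const double x[8], const double y[8],
                         double lo[8], double hi[8])
{
    assert(x && y && lo && hi);
#if defined(__AVX512F__)
    __m512d vx = _mm512_loadu_pd(x);
    __m512d vy = _mm512_loadu_pd(y);
    _mm512_storeu_pd(lo, _mm512_unpacklo_pd(vx, vy));
    _mm512_storeu_pd(hi, _mm512_unpackhi_pd(vx, vy));
#else
    for (size_t lane = 0; lane < 4; ++lane) {
        /* Low double of each lane from x, then from y. */
        lo[2 * lane]     = x[2 * lane];
        lo[2 * lane + 1] = y[2 * lane];
        /* High double of each lane from x, then from y. */
        hi[2 * lane]     = x[2 * lane + 1];
        hi[2 * lane + 1] = y[2 * lane + 1];
    }
#endif
}
```

With x = {0, ..., 7} and y = {10, ..., 17}, lo becomes {0, 10, 2, 12, 4, 14, 6, 16} and hi becomes {1, 11, 3, 13, 5, 15, 7, 17}. Because the operation works within each 128-bit lane, producing a fully sequential interleave (pair 0, pair 1, pair 2, ...) would need an additional cross-lane permute, which this sketch leaves out.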



_mm512_mask_unpacklo_pd

extern __m512d __cdecl _mm512_mask_unpacklo_pd(__m512d src, __mmask8 k, __m512d a, __m512d b);

Unpacks and interleaves float64 elements from the low half of each 128-bit lane in a and b, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_unpacklo_pd

extern __m512d __cdecl _mm512_maskz_unpacklo_pd(__mmask8 k, __m512d a, __m512d b);

Unpacks and interleaves float64 elements from the low half of each 128-bit lane in a and b, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_unpacklo_ps

extern __m512 __cdecl _mm512_unpacklo_ps(__m512 a, __m512 b);

Unpacks and interleaves float32 elements from the low half of each 128-bit lane in a and b, and stores the result.
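The low float32 form mirrors the high form but reads the lower two floats of each lane. A minimal sketch, using a hypothetical wrapper named unpacklo_ps_16 and an assumed scalar fallback for builds where __AVX512F__ is not defined:

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: applies the unpacklo_ps operation to two arrays of
   16 floats. */
static void unpacklo_ps_16(const float a[16], const float b[16], float out[16])
{
    assert(a && b && out);
#if defined(__AVX512F__)
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    _mm512_storeu_ps(out, _mm512_unpacklo_ps(va, vb));
#else
    /* Scalar model: within each lane of four floats, interleave elements
       0 and 1 of a with elements 0 and 1 of b. */
    for (size_t lane = 0; lane < 4; ++lane) {
        size_t base = 4 * lane;
        out[base + 0] = a[base + 0];
        out[base + 1] = b[base + 0];
        out[base + 2] = a[base + 1];
        out[base + 3] = b[base + 1];
    }
#endif
}
```

With a[i] = i and b[i] = 100 + i, the first lane of the output is {0, 100, 1, 101} and the last lane is {12, 112, 13, 113}.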



_mm512_mask_unpacklo_ps

extern __m512 __cdecl _mm512_mask_unpacklo_ps(__m512 src, __mmask16 k, __m512 a, __m512 b);

Unpacks and interleaves float32 elements from the low half of each 128-bit lane in a and b, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_unpacklo_ps

extern __m512 __cdecl _mm512_maskz_unpacklo_ps(__mmask16 k, __m512 a, __m512 b);

Unpacks and interleaves float32 elements from the low half of each 128-bit lane in a and b, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
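The float32 masked forms take a 16-bit mask, one bit per element. The sketch below shows the zeromask variant; maskz_unpacklo_ps_16 is a hypothetical wrapper, the unsigned short mask parameter is a choice made for this example, and the scalar branch is an assumed fallback for builds where __AVX512F__ is not defined.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: zero-masked unpacklo_ps on arrays of 16 floats.
   Bit i of k controls element i of the result. */
static void maskz_unpacklo_ps_16(unsigned short k, const float a[16],
                                 const float b[16], float out[16])
{
    assert(a && b && out);
#if defined(__AVX512F__)
    __m512 r = _mm512_maskz_unpacklo_ps((__mmask16)k, _mm512_loadu_ps(a),
                                        _mm512_loadu_ps(b));
    _mm512_storeu_ps(out, r);
#else
    for (size_t i = 0; i < 16; ++i) {
        size_t base = i & ~(size_t)3;  /* first element of this lane */
        size_t pos  = i & 3;           /* position inside the lane   */
        size_t from = base + pos / 2;  /* low-half source index      */
        /* Even positions come from a, odd positions from b. */
        float r = (pos % 2 == 0) ? a[from] : b[from];
        out[i] = ((k >> i) & 1u) ? r : 0.0f;
    }
#endif
}
```

With a[i] = 1 + i, b[i] = 101 + i, and k = 0x00FF, the first eight outputs are {1, 101, 2, 102, 5, 105, 6, 106} and the remaining eight are zero.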