Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 7/13/2023
Public

Intrinsics for Miscellaneous Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


Variable definitions:

src

source element to use based on writemask result

k

writemask used as a selector

a

first source vector element

b

second source vector element

c

third source vector element

rounding

Rounding control value, which can be one of the following (optionally combined with the SAE, or suppress all exceptions, flag):

  • _MM_FROUND_TO_NEAREST_INT - rounds to nearest even
  • _MM_FROUND_TO_NEG_INF - rounds to negative infinity
  • _MM_FROUND_TO_POS_INF - rounds to positive infinity
  • _MM_FROUND_TO_ZERO - rounds to zero
  • _MM_FROUND_CUR_DIRECTION - rounds using default from MXCSR register

interv

Normalization interval, of type _MM_MANTISSA_NORM_ENUM, which can be one of the following:

  • _MM_MANT_NORM_1_2 - interval [1, 2)
  • _MM_MANT_NORM_p5_2 - interval [0.5, 2)
  • _MM_MANT_NORM_p5_1 - interval [0.5, 1)
  • _MM_MANT_NORM_p75_1p5 - interval [0.75, 1.5)

sc

Sign control, of type _MM_MANTISSA_SIGN_ENUM, which can be one of the following:

  • _MM_MANT_SIGN_src - sign = sign(SRC)
  • _MM_MANT_SIGN_zero - sign = 0
  • _MM_MANT_SIGN_nan - DEST = NaN if sign(SRC) = 1


_mm_broadcast_i32x2

__m128i _mm_broadcast_i32x2(__m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value.



_mm_mask_broadcast_i32x2

__m128i _mm_mask_broadcast_i32x2(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcast_i32x2

__m128i _mm_maskz_broadcast_i32x2(__mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_broadcast_i32x2

__m256i _mm256_broadcast_i32x2(__m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value.



_mm256_mask_broadcast_i32x2

__m256i _mm256_mask_broadcast_i32x2(__m256i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_i32x2

__m256i _mm256_maskz_broadcast_i32x2(__mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_i32x2

__m512i _mm512_broadcast_i32x2(__m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value.



_mm512_mask_broadcast_i32x2

__m512i _mm512_mask_broadcast_i32x2(__m512i src, __mmask16 k, __m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_i32x2

__m512i _mm512_maskz_broadcast_i32x2(__mmask16 k, __m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
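The writemask and zeromask forms of the i32x2 broadcast differ only in what happens to elements whose mask bit is clear. The following is a minimal scalar sketch of that behavior in plain C; it does not call the intrinsics, and the helper name model_broadcast_i32x2 and its parameters are hypothetical, introduced here only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not an Intel API.
   n   : number of 32-bit elements in the result (4, 8, or 16)
   src : fallback elements for the writemask form, or NULL for the zeromask form
   k   : mask, bit i controls dst[i]
   a   : source vector; only a[0] and a[1] are read */
void model_broadcast_i32x2(int32_t *dst, const int32_t *src, uint16_t k,
                           const int32_t *a, size_t n)
{
    assert(n == 4 || n == 8 || n == 16);
    for (size_t i = 0; i < n; i++) {
        if ((k >> i) & 1u) {
            dst[i] = a[i % 2];          /* repeat the lower pair a[0], a[1] */
        } else {
            dst[i] = src ? src[i] : 0;  /* writemask keeps src, zeromask clears */
        }
    }
}
```

Passing a mask with all n low bits set models the unmasked form. With n = 8, a = {10, 20, 30, 40}, and k = 0x0F, the low four results are 10, 20, 10, 20 and the high four come from src.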



_mm256_broadcast_i32x4

__m256i _mm256_broadcast_i32x4(__m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcasti32x4

Broadcast the 4 packed 32-bit integers from a to all elements of the return value.



_mm256_mask_broadcast_i32x4

__m256i _mm256_mask_broadcast_i32x4(__m256i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcasti32x4

Broadcast the 4 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_i32x4

__m256i _mm256_maskz_broadcast_i32x4(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcasti32x4

Broadcast the 4 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_i32x8

__m512i _mm512_broadcast_i32x8(__m256i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x8

Broadcast the 8 packed 32-bit integers from a to all elements of the return value.



_mm512_mask_broadcast_i32x8

__m512i _mm512_mask_broadcast_i32x8(__m512i src, __mmask16 k, __m256i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x8

Broadcast the 8 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_i32x8

__m512i _mm512_maskz_broadcast_i32x8(__mmask16 k, __m256i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x8

Broadcast the 8 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_broadcast_i64x2

__m256i _mm256_broadcast_i64x2(__m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value.



_mm256_mask_broadcast_i64x2

__m256i _mm256_mask_broadcast_i64x2(__m256i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_i64x2

__m256i _mm256_maskz_broadcast_i64x2(__mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_i64x2

__m512i _mm512_broadcast_i64x2(__m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value.



_mm512_mask_broadcast_i64x2

__m512i _mm512_mask_broadcast_i64x2(__m512i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_i64x2

__m512i _mm512_maskz_broadcast_i64x2(__mmask8 k, __m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
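The i32x4, i32x8, and i64x2 broadcasts all replicate a whole 128-bit or 256-bit block; what differs between the 32-bit and 64-bit forms is the granularity of the mask. The sketch below models this at byte level in plain C. It is an illustration only: model_broadcast_block and its parameters are hypothetical names, not part of any Intel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not an Intel API.
   block_bytes : size of the broadcast block in a (16 or 32)
   elem_bytes  : mask granularity (4 for i32xN forms, 8 for i64x2)
   total_bytes : size of the result (32 or 64)
   src         : fallback bytes (writemask form) or NULL (zeromask form) */
void model_broadcast_block(uint8_t *dst, const uint8_t *src, uint16_t k,
                           const uint8_t *a, size_t block_bytes,
                           size_t elem_bytes, size_t total_bytes)
{
    assert(elem_bytes == 4 || elem_bytes == 8);
    assert(block_bytes % elem_bytes == 0 && total_bytes % block_bytes == 0);
    assert(total_bytes / elem_bytes <= 16);
    for (size_t e = 0; e < total_bytes / elem_bytes; e++) {
        int active = (k >> e) & 1;   /* one mask bit per element */
        for (size_t j = 0; j < elem_bytes; j++) {
            size_t pos = e * elem_bytes + j;
            if (active) {
                dst[pos] = a[pos % block_bytes];   /* block repeats */
            } else {
                dst[pos] = src ? src[pos] : 0;
            }
        }
    }
}
```

With the same mask value 0x01, the 32-bit form writes only the first 4 bytes of the result, while the 64-bit form writes the first 8.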



_mm256_inserti32x4

__m256i _mm256_inserti32x4(__m256i a, __m128i b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinserti32x4

Copy a to the return value, then insert 128 bits (composed of 4 packed 32-bit integers) from b into the return value at the location specified by imm.



_mm256_mask_inserti32x4

__m256i _mm256_mask_inserti32x4(__m256i src, __mmask8 k, __m256i a, __m128i b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinserti32x4

Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_inserti32x4

__m256i _mm256_maskz_inserti32x4(__mmask8 k, __m256i a, __m128i b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinserti32x4

Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_inserti32x8

__m512i _mm512_inserti32x8(__m512i a, __m256i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti32x8

Copy a to the return value, then insert 256 bits (composed of 8 packed 32-bit integers) from b into the return value at the location specified by imm.



_mm512_mask_inserti32x8

__m512i _mm512_mask_inserti32x8(__m512i src, __mmask16 k, __m512i a, __m256i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti32x8

Copy a to tmp, then insert 256 bits (composed of 8 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_inserti32x8

__m512i _mm512_maskz_inserti32x8(__mmask16 k, __m512i a, __m256i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti32x8

Copy a to tmp, then insert 256 bits (composed of 8 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_inserti64x2

__m256i _mm256_inserti64x2(__m256i a, __m128i b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinserti64x2

Copy a to the return value, then insert 128 bits (composed of 2 packed 64-bit integers) from b into the return value at the location specified by imm.



_mm256_mask_inserti64x2

__m256i _mm256_mask_inserti64x2(__m256i src, __mmask8 k, __m256i a, __m128i b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinserti64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_inserti64x2

__m256i _mm256_maskz_inserti64x2(__mmask8 k, __m256i a, __m128i b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinserti64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_inserti64x2

__m512i _mm512_inserti64x2(__m512i a, __m128i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti64x2

Copy a to the return value, then insert 128 bits (composed of 2 packed 64-bit integers) from b into the return value at the location specified by imm.



_mm512_mask_inserti64x2

__m512i _mm512_mask_inserti64x2(__m512i src, __mmask8 k, __m512i a, __m128i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_inserti64x2

__m512i _mm512_maskz_inserti64x2(__mmask8 k, __m512i a, __m128i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
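The insert descriptions above follow a copy, overwrite, then mask sequence. The following plain C sketch models that sequence for 32-bit elements; it is a scalar illustration rather than the intrinsic itself, and model_mask_insert_i32 and its parameters are hypothetical. The model assumes imm is a valid slot index and asserts on anything else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not an Intel API.
   n   : elements in a and in the result (8 or 16)
   m   : elements in b, the inserted block (4 or 8)
   imm : which m-element slot of the result receives b
   src : fallback elements (writemask form) or NULL (zeromask form) */
void model_mask_insert_i32(int32_t *dst, const int32_t *src, uint16_t k,
                           const int32_t *a, size_t n,
                           const int32_t *b, size_t m, int imm)
{
    int32_t tmp[16];
    assert(m > 0 && n <= 16 && n % m == 0);
    assert(imm >= 0 && (size_t)imm < n / m);

    for (size_t i = 0; i < n; i++) tmp[i] = a[i];                   /* copy a to tmp */
    for (size_t j = 0; j < m; j++) tmp[(size_t)imm * m + j] = b[j]; /* insert b */

    for (size_t i = 0; i < n; i++) {                                /* apply the mask */
        dst[i] = ((k >> i) & 1u) ? tmp[i] : (src ? src[i] : 0);
    }
}
```

Because the mask is applied after the insertion, a clear mask bit hides the corresponding element whether it came from a or from b.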



_mm256_mask_shuffle_i32x4

__m256i _mm256_mask_shuffle_i32x4(__m256i src, __mmask8 k, __m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi32x4

Shuffle 128 bits (composed of 4 32-bit integers) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_i32x4

__m256i _mm256_maskz_shuffle_i32x4(__mmask8 k, __m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi32x4

Shuffle 128 bits (composed of 4 32-bit integers) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_shuffle_i32x4

__m256i _mm256_shuffle_i32x4(__m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi32x4

Shuffle 128 bits (composed of 4 32-bit integers) selected by imm from a and b, and return the results.



_mm256_mask_shuffle_i64x2

__m256i _mm256_mask_shuffle_i64x2(__m256i src, __mmask8 k, __m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi64x2

Shuffle 128 bits (composed of 2 64-bit integers) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_i64x2

__m256i _mm256_maskz_shuffle_i64x2(__mmask8 k, __m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi64x2

Shuffle 128 bits (composed of 2 64-bit integers) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_shuffle_i64x2

__m256i _mm256_shuffle_i64x2(__m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi64x2

Shuffle 128 bits (composed of 2 64-bit integers) selected by imm from a and b, and return the results.
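The phrase "selected by imm from a and b" can be made concrete with a scalar sketch. The plain C model below assumes the lane-selection rule documented for the 256-bit form of vshufi32x4: bit 0 of imm picks which 128-bit lane of a becomes the low half of the result, and bit 1 picks which lane of b becomes the high half. The helper name model_shuffle_i32x4_256 is hypothetical, and the unmasked form is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the unmasked 256-bit lane shuffle,
   not an Intel API. Vectors are arrays of 8 32-bit elements. */
void model_shuffle_i32x4_256(int32_t dst[8], const int32_t a[8],
                             const int32_t b[8], int imm)
{
    assert(imm >= 0 && imm <= 3);
    const int32_t *lo = a + 4 * (imm & 1);         /* imm bit 0: lane of a */
    const int32_t *hi = b + 4 * ((imm >> 1) & 1);  /* imm bit 1: lane of b */
    for (int j = 0; j < 4; j++) {
        dst[j]     = lo[j];   /* low 128 bits always come from a  */
        dst[4 + j] = hi[j];   /* high 128 bits always come from b */
    }
}
```

The 64-bit-element form moves the same 128-bit lanes; in the masked variants it differs only in mask granularity (one bit per 64-bit element instead of one per 32-bit element).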



_mm_mask_blend_pd

__m128d _mm_mask_blend_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vblendmpd

Blend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and return the results.



_mm256_mask_blend_pd

__m256d _mm256_mask_blend_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vblendmpd

Blend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and return the results.



_mm_mask_blend_ps

__m128 _mm_mask_blend_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vblendmps

Blend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and return the results.



_mm256_mask_blend_ps

__m256 _mm256_mask_blend_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vblendmps

Blend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and return the results.
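Unlike the writemask forms, the blend intrinsics take no src operand: each mask bit chooses between the two inputs. A minimal scalar sketch in plain C, assuming the usual convention that a set bit selects the element from b and a clear bit selects the element from a; model_mask_blend_pd is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not an Intel API.
   n : number of double-precision elements (2 for __m128d, 4 for __m256d) */
void model_mask_blend_pd(double *dst, uint8_t k, const double *a,
                         const double *b, size_t n)
{
    assert(n == 2 || n == 4);
    for (size_t i = 0; i < n; i++) {
        dst[i] = ((k >> i) & 1u) ? b[i] : a[i];  /* set bit takes b, clear bit takes a */
    }
}
```

For example, with a = {1, 2, 3, 4}, b = {10, 20, 30, 40}, and k = 0x5, the result is {10, 2, 30, 4}.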



_mm256_broadcast_f32x2

__m256 _mm256_broadcast_f32x2(__m128 a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.



_mm256_mask_broadcast_f32x2

__m256 _mm256_mask_broadcast_f32x2(__m256 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_f32x2

__m256 _mm256_maskz_broadcast_f32x2(__mmask8 k, __m128 a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_f32x2

__m512 _mm512_broadcast_f32x2(__m128 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.



_mm512_mask_broadcast_f32x2

__m512 _mm512_mask_broadcast_f32x2(__m512 src, __mmask16 k, __m128 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_f32x2

__m512 _mm512_maskz_broadcast_f32x2(__mmask16 k, __m128 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_broadcast_f32x4

__m256 _mm256_broadcast_f32x4(__m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastf32x4

Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.



_mm256_mask_broadcast_f32x4

__m256 _mm256_mask_broadcast_f32x4(__m256 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastf32x4

Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_f32x4

__m256 _mm256_maskz_broadcast_f32x4(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastf32x4

Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_f32x8

__m512 _mm512_broadcast_f32x8(__m256 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x8

Broadcast the 8 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.



_mm512_mask_broadcast_f32x8

__m512 _mm512_mask_broadcast_f32x8(__m512 src, __mmask16 k, __m256 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x8

Broadcast the 8 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_f32x8

__m512 _mm512_maskz_broadcast_f32x8(__mmask16 k, __m256 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x8

Broadcast the 8 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_broadcast_f64x2

__m256d _mm256_broadcast_f64x2(__m128d a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value.



_mm256_mask_broadcast_f64x2

__m256d _mm256_mask_broadcast_f64x2(__m256d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_f64x2

__m256d _mm256_maskz_broadcast_f64x2(__mmask8 k, __m128d a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_f64x2

__m512d _mm512_broadcast_f64x2(__m128d a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value.



_mm512_mask_broadcast_f64x2

__m512d _mm512_mask_broadcast_f64x2(__m512d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_f64x2

__m512d _mm512_maskz_broadcast_f64x2(__mmask8 k, __m128d a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastsd_pd

__m256d _mm256_mask_broadcastsd_pd(__m256d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastsd

Broadcast the low double-precision (64-bit) floating-point element from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastsd_pd

__m256d _mm256_maskz_broadcastsd_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastsd

Broadcast the low double-precision (64-bit) floating-point element from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_broadcastss_ps

__m128 _mm_mask_broadcastss_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastss

Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcastss_ps

__m128 _mm_maskz_broadcastss_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastss

Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastss_ps

__m256 _mm256_mask_broadcastss_ps(__m256 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastss

Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastss_ps

__m256 _mm256_maskz_broadcastss_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastss

Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_compress_pd

__m128d _mm_mask_compress_pd(__m128d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm_maskz_compress_pd

__m128d _mm_maskz_compress_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm256_mask_compress_pd

__m256d _mm256_mask_compress_pd(__m256d src, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm256_maskz_compress_pd

__m256d _mm256_maskz_compress_pd(__mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm_mask_compress_ps

__m128 _mm_mask_compress_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompressps

Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm_maskz_compress_ps

__m128 _mm_maskz_compress_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompressps

Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm256_mask_compress_ps

__m256 _mm256_mask_compress_ps(__m256 src, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompressps

Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm256_maskz_compress_ps

__m256 _mm256_maskz_compress_ps(__mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompressps

Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
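Compress packs the selected elements toward the low end of the result, so the mask describes positions in the input, not the output. The plain C sketch below models this for single-precision elements; it is a scalar illustration, and model_compress_ps and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not an Intel API.
   n   : number of elements (4 for __m128, 8 for __m256)
   src : pass-through elements (mask form) or NULL (maskz form) */
void model_compress_ps(float *dst, const float *src, uint8_t k,
                       const float *a, size_t n)
{
    size_t m = 0;
    assert(n == 4 || n == 8);
    for (size_t i = 0; i < n; i++) {
        if ((k >> i) & 1u) {
            dst[m++] = a[i];   /* active elements are stored contiguously */
        }
    }
    for (size_t i = m; i < n; i++) {
        dst[i] = src ? src[i] : 0.0f;  /* remaining upper positions */
    }
}
```

With a = {1, ..., 8} and k = 0xA2 (bits 1, 5, and 7 set), the first three results are 2, 6, 8; positions 3 through 7 come from the same positions of src, or are zero in the maskz form.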



_mm_mask_expand_pd

__m128d _mm_mask_expand_pd(__m128d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandpd

Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_expand_pd

__m128d _mm_maskz_expand_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandpd

Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_expand_pd

__m256d _mm256_mask_expand_pd(__m256d src, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandpd

Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_expand_pd

__m256d _mm256_maskz_expand_pd(__mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandpd

Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_expand_ps

__m128 _mm_mask_expand_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandps

Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_expand_ps

__m128 _mm_maskz_expand_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandps

Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_expand_ps

__m256 _mm256_mask_expand_ps(__m256 src, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandps

Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_expand_ps

__m256 _mm256_maskz_expand_ps(__mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandps

Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
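Expand is the inverse of compress: consecutive elements are read from the low end of a and scattered to the result positions whose mask bit is set. A scalar sketch in plain C, for illustration only; model_expand_ps and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not an Intel API.
   n   : number of elements (4 for __m128, 8 for __m256)
   src : fallback elements (mask form) or NULL (maskz form) */
void model_expand_ps(float *dst, const float *src, uint8_t k,
                     const float *a, size_t n)
{
    size_t m = 0;   /* index of the next contiguous element of a */
    assert(n == 4 || n == 8);
    for (size_t i = 0; i < n; i++) {
        if ((k >> i) & 1u) {
            dst[i] = a[m++];               /* next low element goes to position i */
        } else {
            dst[i] = src ? src[i] : 0.0f;  /* inactive position */
        }
    }
}
```

With a = {1, ..., 8} and k = 0xA2, positions 1, 5, and 7 of the result receive 1, 2, and 3; the upper elements of a are never read.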



_mm256_extractf32x4_ps

__m128 _mm256_extractf32x4_ps(__m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextractf32x4

Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and store the result in the return value.



_mm256_mask_extractf32x4_ps

__m128 _mm256_mask_extractf32x4_ps(__m128 src, __mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextractf32x4

Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_extractf32x4_ps

__m128 _mm256_maskz_extractf32x4_ps(__mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextractf32x4

Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_extractf32x8_ps

__m256 _mm512_extractf32x8_ps(__m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf32x8

Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and store the result in the return value.



_mm512_mask_extractf32x8_ps

__m256 _mm512_mask_extractf32x8_ps(__m256 src, __mmask8 k, __m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf32x8

Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_extractf32x8_ps

__m256 _mm512_maskz_extractf32x8_ps(__mmask8 k, __m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf32x8

Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_extractf64x2_pd

__m128d _mm256_extractf64x2_pd(__m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and store the result in the return value.



_mm256_mask_extractf64x2_pd

__m128d _mm256_mask_extractf64x2_pd(__m128d src, __mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_extractf64x2_pd

__m128d _mm256_maskz_extractf64x2_pd(__mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_extractf64x2_pd

__m128d _mm512_extractf64x2_pd(__m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and store the result in the return value.



_mm512_mask_extractf64x2_pd

__m128d _mm512_mask_extractf64x2_pd(__m128d src, __mmask8 k, __m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_extractf64x2_pd

__m128d _mm512_maskz_extractf64x2_pd(__mmask8 k, __m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
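
For the extractf64x2 family, imm selects one 128-bit lane (two doubles) from the source. The sketch below models that selection in portable C, assuming the low bit of imm is used for a 256-bit source and the low two bits for a 512-bit source. The names extractf64x2_mask_model and top_lane_or_zero are hypothetical.

```c
/* lanes is 2 for a 256-bit source (4 doubles) or 4 for a 512-bit source (8 doubles). */
void extractf64x2_mask_model(double dst[2], const double src[2], unsigned k,
                             const double *a, int lanes, int imm)
{
    const double *lane = a + 2 * (imm & (lanes - 1)); /* low bits of imm pick the 128-bit lane */
    for (int i = 0; i < 2; ++i)
        dst[i] = ((k >> i) & 1u) ? lane[i] : src[i];  /* pass zeros as src to model the zeromask form */
}

#if defined(__AVX512DQ__)
#include <immintrin.h>
/* Hypothetical wrapper: returns the highest 128-bit lane of a, zeroing unselected elements. */
__m128d top_lane_or_zero(__mmask8 k, __m512d a)
{
    return _mm512_maskz_extractf64x2_pd(k, a, 3);
}
#endif
```

Only the two low bits of k matter here, because the result holds two elements.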



_mm_fixupimm_pd

__m128d _mm_fixupimm_pd(__m128d a, __m128d b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results. imm specifies which exception flags are reported.



_mm_mask_fixupimm_pd

__m128d _mm_mask_fixupimm_pd(__m128d a, __mmask8 k, __m128d b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm specifies which exception flags are reported.



_mm_maskz_fixupimm_pd

__m128d _mm_maskz_fixupimm_pd(__mmask8 k, __m128d a, __m128d b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm specifies which exception flags are reported.



_mm256_fixupimm_pd

__m256d _mm256_fixupimm_pd(__m256d a, __m256d b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results. imm specifies which exception flags are reported.



_mm256_mask_fixupimm_pd

__m256d _mm256_mask_fixupimm_pd(__m256d a, __mmask8 k, __m256d b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm specifies which exception flags are reported.



_mm256_maskz_fixupimm_pd

__m256d _mm256_maskz_fixupimm_pd(__mmask8 k, __m256d a, __m256d b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm specifies which exception flags are reported.
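
The descriptions above do not say how the integers in c drive the fix-up. The sketch below is a scalar model of one element, based on my reading of the vfixupimmpd instruction description rather than on anything stated in this section: the element of b is classified into one of eight classes, the class index selects a 4-bit token from the low 32 bits of the matching element of c, and the token selects the result. Treat the class order and token meanings as assumptions to verify against the instruction reference. Exception reporting through imm and denormals-are-zero handling are not modeled. All function names are hypothetical.

```c
#include <float.h>   /* DBL_MAX */
#include <math.h>    /* INFINITY macro only; no libm calls */
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double from_bits(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Assumed class order: 0 QNaN, 1 SNaN, 2 zero, 3 +1.0, 4 -inf, 5 +inf, 6 negative, 7 positive. */
int fixup_class(double b)
{
    uint64_t bits = bits_of(b);
    uint64_t frac = bits & 0x000FFFFFFFFFFFFFULL;
    int exp_all_ones = ((bits >> 52) & 0x7FF) == 0x7FF;
    int negative = (int)(bits >> 63);

    if (exp_all_ones && frac != 0)
        return (frac >> 51) ? 0 : 1;   /* top fraction bit set means quiet NaN */
    if (b == 0.0) return 2;
    if (b == 1.0) return 3;
    if (exp_all_ones) return negative ? 4 : 5;
    return negative ? 6 : 7;
}

/* One element of the fix-up: table is the low 32 bits of the matching element of c. */
double fixup_model(double a, double b, uint32_t table)
{
    unsigned token = (table >> (4 * fixup_class(b))) & 0xFu;
    int negative = (int)(bits_of(b) >> 63);

    switch (token) {
    case 0:  return a;                                     /* keep a */
    case 1:  return b;                                     /* take b */
    case 2:  return from_bits(bits_of(b) | (1ULL << 51));  /* quieted copy of b */
    case 3:  return from_bits(0xFFF8000000000000ULL);      /* default QNaN */
    case 4:  return -INFINITY;
    case 5:  return INFINITY;
    case 6:  return negative ? -INFINITY : INFINITY;
    case 7:  return -0.0;
    case 8:  return 0.0;
    case 9:  return -1.0;
    case 10: return 1.0;
    case 11: return 0.5;
    case 12: return 90.0;
    case 13: return 1.5707963267948966;                    /* pi/2 */
    case 14: return DBL_MAX;
    default: return -DBL_MAX;                              /* token 15 */
    }
}

#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
/* Hypothetical wrapper: replace NaN elements of v with +0.0 and keep everything else. */
__m128d nan_to_zero(__m128d v)
{
    return _mm_fixupimm_pd(v, v, _mm_set1_epi64x(0x88), 0);
}
#endif
```

Under these assumptions, the table value 0x88 maps both NaN classes to token 8 (+0.0) and leaves every other class at token 0, which keeps the element of a.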



_mm_fixupimm_ps

__m128 _mm_fixupimm_ps(__m128 a, __m128 b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results. imm specifies which exception flags are reported.



_mm_mask_fixupimm_ps

__m128 _mm_mask_fixupimm_ps(__m128 a, __mmask8 k, __m128 b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm specifies which exception flags are reported.



_mm_maskz_fixupimm_ps

__m128 _mm_maskz_fixupimm_ps(__mmask8 k, __m128 a, __m128 b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm specifies which exception flags are reported.



_mm256_fixupimm_ps

__m256 _mm256_fixupimm_ps(__m256 a, __m256 b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results. imm specifies which exception flags are reported.



_mm256_mask_fixupimm_ps

__m256 _mm256_mask_fixupimm_ps(__m256 a, __mmask8 k, __m256 b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm specifies which exception flags are reported.



_mm256_maskz_fixupimm_ps

__m256 _mm256_maskz_fixupimm_ps(__mmask8 k, __m256 a, __m256 b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm specifies which exception flags are reported.



_mm_getexp_pd

__m128d _mm_getexp_pd(__m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results. This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm_mask_getexp_pd

__m128d _mm_mask_getexp_pd(__m128d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm_maskz_getexp_pd

__m128d _mm_maskz_getexp_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm256_getexp_pd

__m256d _mm256_getexp_pd(__m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results. This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm256_mask_getexp_pd

__m256d _mm256_mask_getexp_pd(__m256d src, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm256_maskz_getexp_pd

__m256d _mm256_maskz_getexp_pd(__mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic effectively computes floor(log2(|x|)) for each element x.
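
As a reading aid for the getexp descriptions, the sketch below computes the same per-element value in portable C by decoding the IEEE 754 binary64 exponent field directly. The handling of zero (negative infinity), infinities (positive infinity), NaN (propagated), and denormals (normalized first) reflects my understanding of the instruction and should be treated as an assumption. The names getexp_model and getexp_pd_example are hypothetical.

```c
#include <math.h>    /* INFINITY macro only; no libm calls */
#include <stdint.h>
#include <string.h>

/* Scalar model of one getexp element: the unbiased exponent of |x| as a double. */
double getexp_model(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint64_t frac = bits & 0x000FFFFFFFFFFFFFULL;
    int biased = (int)((bits >> 52) & 0x7FF);

    if (biased == 0x7FF)
        return frac ? x : INFINITY;        /* NaN propagates; +/-inf gives +inf (assumed) */
    if (biased == 0) {
        if (frac == 0)
            return -INFINITY;              /* zero input (assumed result) */
        int e = -1022;                     /* denormal: shift until the implicit bit appears */
        while (!(frac & (1ULL << 52))) {
            frac <<= 1;
            --e;
        }
        return (double)e;
    }
    return (double)(biased - 1023);        /* normal number: remove the exponent bias */
}

#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
/* Hypothetical wrapper: applies the intrinsic to two doubles in memory. */
void getexp_pd_example(const double in[2], double out[2])
{
    _mm_storeu_pd(out, _mm_getexp_pd(_mm_loadu_pd(in)));
}
#endif
```

For example, 10.0 is 1.25 * 2^3, so its result is 3.0, and the sign of the input does not affect the result.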



_mm_getexp_ps

__m128 _mm_getexp_ps(__m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results. This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm_mask_getexp_ps

__m128 _mm_mask_getexp_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm_maskz_getexp_ps

__m128 _mm_maskz_getexp_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm256_getexp_ps

__m256 _mm256_getexp_ps(__m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results. This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm256_mask_getexp_ps

__m256 _mm256_mask_getexp_ps(__m256 src, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm256_maskz_getexp_ps

__m256 _mm256_maskz_getexp_ps(__mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic effectively computes floor(log2(|x|)) for each element x.



_mm_getmant_pd

__m128d _mm_getmant_pd(__m128d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm_mask_getmant_pd

__m128d _mm_mask_getmant_pd(__m128d src, __mmask8 k, __m128d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm_maskz_getmant_pd

__m128d _mm_maskz_getmant_pd(__mmask8 k, __m128d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_getmant_pd

__m256d _mm256_getmant_pd(__m256d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_mask_getmant_pd

__m256d _mm256_mask_getmant_pd(__m256d src, __mmask8 k, __m256d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_maskz_getmant_pd

__m256d _mm256_maskz_getmant_pd(__mmask8 k, __m256d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
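
To make the ±(2^k)*|x.significand| formula concrete, the sketch below models one common parameter choice in portable C: interv = _MM_MANT_NORM_1_2 and sc = _MM_MANT_SIGN_src, for normal finite inputs only. In that case the result is the input with its exponent replaced so that the magnitude falls in [1, 2). Special values and the other interv and sc settings are not modeled. The names getmant_1_2_src_model and getmant_pd_example are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model for interv = _MM_MANT_NORM_1_2, sc = _MM_MANT_SIGN_src; normal finite x only. */
double getmant_1_2_src_model(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    /* Keep the sign and fraction bits, force the biased exponent to 1023 (that is, 2^0). */
    bits = (bits & 0x800FFFFFFFFFFFFFULL) | (1023ULL << 52);
    memcpy(&x, &bits, sizeof x);
    return x;
}

#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
/* Hypothetical wrapper: interv and sc must be compile-time constants. */
void getmant_pd_example(const double in[2], double out[2])
{
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_getmant_pd(v, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src));
}
#endif
```

For example, 10.0 is 1.25 * 2^3, so the model returns 1.25; together with the getexp result of 3.0 this reconstructs the input.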



_mm_getmant_ps

__m128 _mm_getmant_ps(__m128 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm_mask_getmant_ps

__m128 _mm_mask_getmant_ps(__m128 src, __mmask8 k, __m128 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm_maskz_getmant_ps

__m128 _mm_maskz_getmant_ps(__mmask8 k, __m128 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_getmant_ps

__m256 _mm256_getmant_ps(__m256 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_mask_getmant_ps

__m256 _mm256_mask_getmant_ps(__m256 src, __mmask8 k, __m256 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_maskz_getmant_ps

__m256 _mm256_maskz_getmant_ps(__mmask8 k, __m256 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_insertf32x4

__m256 _mm256_insertf32x4(__m256 a, __m128 b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinsertf32x4

Copy a to the return value, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into the return value at the location specified by imm.



_mm256_mask_insertf32x4

__m256 _mm256_mask_insertf32x4(__m256 src, __mmask8 k, __m256 a, __m128 b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinsertf32x4

Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_insertf32x4

__m256 _mm256_maskz_insertf32x4(__mmask8 k, __m256 a, __m128 b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinsertf32x4

Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
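
The "copy to tmp, insert, then mask" sequence described above can be sketched in portable C as follows, assuming bit 0 of imm selects the lower (0) or upper (1) 128 bits of the 256-bit value. The names insertf32x4_mask_model and insert_upper_masked are hypothetical.

```c
/* Scalar model of _mm256_mask_insertf32x4: a, src, dst hold 8 floats; b holds 4. */
void insertf32x4_mask_model(float dst[8], const float src[8], unsigned k,
                            const float a[8], const float b[4], int imm)
{
    float tmp[8];
    for (int i = 0; i < 8; ++i)
        tmp[i] = a[i];                       /* copy a to tmp */
    for (int i = 0; i < 4; ++i)
        tmp[4 * (imm & 1) + i] = b[i];       /* overwrite the selected 128-bit half with b */
    for (int i = 0; i < 8; ++i)
        dst[i] = ((k >> i) & 1u) ? tmp[i] : src[i]; /* pass zeros as src for the zeromask form */
}

#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
/* Hypothetical wrapper: imm must be a compile-time constant. */
__m256 insert_upper_masked(__m256 src, __mmask8 k, __m256 a, __m128 b)
{
    return _mm256_mask_insertf32x4(src, k, a, b, 1);
}
#endif
```

Note that the mask is applied to the whole 256-bit result, so it can also suppress elements that came from a, not only the inserted ones.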



_mm512_insertf32x8

__m512 _mm512_insertf32x8(__m512 a, __m256 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf32x8

Copy a to the return value, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into the return value at the location specified by imm.



_mm512_mask_insertf32x8

__m512 _mm512_mask_insertf32x8(__m512 src, __mmask16 k, __m512 a, __m256 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf32x8

Copy a to tmp, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_insertf32x8

__m512 _mm512_maskz_insertf32x8(__mmask16 k, __m512 a, __m256 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf32x8

Copy a to tmp, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_insertf64x2

__m256d _mm256_insertf64x2(__m256d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinsertf64x2

Copy a to the return value, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into the return value at the location specified by imm.



_mm256_mask_insertf64x2

__m256d _mm256_mask_insertf64x2(__m256d src, __mmask8 k, __m256d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinsertf64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_insertf64x2

__m256d _mm256_maskz_insertf64x2(__mmask8 k, __m256d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinsertf64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_insertf64x2

__m512d _mm512_insertf64x2(__m512d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf64x2

Copy a to the return value, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into the return value at the location specified by imm.



_mm512_mask_insertf64x2

__m512d _mm512_mask_insertf64x2(__m512d src, __mmask8 k, __m512d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_insertf64x2

__m512d _mm512_maskz_insertf64x2(__mmask8 k, __m512d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask2_permutex2var_pd

__m128d _mm_mask2_permutex2var_pd(__m128d a, __m128i idx, __mmask8 k, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_pd

__m256d _mm256_mask2_permutex2var_pd(__m256d a, __m256i idx, __mmask8 k, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm_maskz_permutex2var_pd

__m128d _mm_maskz_permutex2var_pd(__mmask8 k, __m128d a, __m128i idx, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd, vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutex2var_pd

__m128d _mm_permutex2var_pd(__m128d a, __m128i idx, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd, vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_maskz_permutex2var_pd

__m256d _mm256_maskz_permutex2var_pd(__mmask8 k, __m256d a, __m256i idx, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd, vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex2var_pd

__m256d _mm256_permutex2var_pd(__m256d a, __m256i idx, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd, vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.
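
The phrase "selector and index in idx" can be illustrated with a scalar model of the 256-bit double-precision form. The sketch assumes that, in each 64-bit element of idx, bits 1:0 give the element index and bit 2 selects the source (0 for a, 1 for b); the remaining bits are ignored. The names permutex2var_pd256_maskz_model and interleave_low are hypothetical.

```c
#include <stdint.h>

/* Scalar model of _mm256_maskz_permutex2var_pd; use k = 0xF for the unmasked form. */
void permutex2var_pd256_maskz_model(double dst[4], unsigned k, const double a[4],
                                    const int64_t idx[4], const double b[4])
{
    for (int i = 0; i < 4; ++i) {
        uint64_t sel = (uint64_t)idx[i];
        const double *source = ((sel >> 2) & 1u) ? b : a; /* bit 2: selector between a and b */
        double value = source[sel & 3u];                  /* bits 1:0: element index */
        dst[i] = ((k >> i) & 1u) ? value : 0.0;           /* zeromask */
    }
}

#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
/* Hypothetical wrapper: returns {a0, b0, a1, b1}. */
__m256d interleave_low(__m256d a, __m256d b)
{
    return _mm256_permutex2var_pd(a, _mm256_setr_epi64x(0, 4, 1, 5), b);
}
#endif
```

Because idx is a vector rather than an immediate, the shuffle pattern can be computed at run time.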



_mm_mask2_permutex2var_ps

__m128 _mm_mask2_permutex2var_ps(__m128 a, __m128i idx, __mmask8 k, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_ps

__m256 _mm256_mask2_permutex2var_ps(__m256 a, __m256i idx, __mmask8 k, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm_maskz_permutex2var_ps

__m128 _mm_maskz_permutex2var_ps(__mmask8 k, __m128 a, __m128i idx, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps, vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutex2var_ps

__m128 _mm_permutex2var_ps(__m128 a, __m128i idx, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps, vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_maskz_permutex2var_ps

__m256 _mm256_maskz_permutex2var_ps(__mmask8 k, __m256 a, __m256i idx, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps, vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex2var_ps

__m256 _mm256_permutex2var_ps(__m256 a, __m256i idx, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps, vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.
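
The mask2 forms differ from the other masked forms in that unselected elements come from idx rather than from a separate src argument. The sketch below models the 128-bit single-precision form in portable C, assuming bits 1:0 of each 32-bit idx element give the element index and bit 2 selects between a and b. The names mask2_permutex2var_ps128_model and mask2_example are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of _mm_mask2_permutex2var_ps. */
void mask2_permutex2var_ps128_model(float dst[4], const float a[4], const int32_t idx[4],
                                    unsigned k, const float b[4])
{
    for (int i = 0; i < 4; ++i) {
        if ((k >> i) & 1u) {
            uint32_t sel = (uint32_t)idx[i];
            const float *source = ((sel >> 2) & 1u) ? b : a; /* bit 2: selector */
            dst[i] = source[sel & 3u];                       /* bits 1:0: element index */
        } else {
            memcpy(&dst[i], &idx[i], sizeof dst[i]);         /* raw bits of idx, not a conversion */
        }
    }
}

#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
/* Hypothetical wrapper around the intrinsic with the same argument order. */
__m128 mask2_example(__m128 a, __m128i idx, __mmask8 k, __m128 b)
{
    return _mm_mask2_permutex2var_ps(a, idx, k, b);
}
#endif
```

The copied idx elements keep their integer bit pattern, so they are generally not meaningful as floating-point values.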



_mm_mask_permute_pd

__m128d _mm_mask_permute_pd(__m128d src, __mmask8 k, __m128d a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_mask_permutevar_pd

__m128d _mm_mask_permutevar_pd(__m128d src, __mmask8 k, __m128d a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_permute_pd

__m128d _mm_maskz_permute_pd(__mmask8 k, __m128d a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_maskz_permutevar_pd

__m128d _mm_maskz_permutevar_pd(__mmask8 k, __m128d a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_permute_pd

__m256d _mm256_mask_permute_pd(__m256d src, __mmask8 k, __m256d a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_mask_permutevar_pd

__m256d _mm256_mask_permutevar_pd(__m256d src, __mmask8 k, __m256d a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permute_pd

__m256d _mm256_maskz_permute_pd(__mmask8 k, __m256d a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_maskz_permutevar_pd

__m256d _mm256_maskz_permutevar_pd(__mmask8 k, __m256d a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
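
The immediate and vector controls for vpermilpd are encoded differently, which is easy to get wrong. The sketch below models the unmasked 256-bit shuffles in portable C, assuming bit i of imm controls element i in the immediate form and bit 1 (not bit 0) of each 64-bit element of b controls it in the vector form. A separate helper applies the writemask. All function names are hypothetical.

```c
#include <stdint.h>

/* Immediate form: bit i of imm picks the low (0) or high (1) double of element i's 128-bit lane. */
void permute_pd256_model(double dst[4], const double a[4], int imm)
{
    for (int i = 0; i < 4; ++i)
        dst[i] = a[(i & ~1) + ((imm >> i) & 1)];
}

/* Vector form: the control is bit 1 of each 64-bit element of b (assumed from the instruction). */
void permutevar_pd256_model(double dst[4], const double a[4], const int64_t b[4])
{
    for (int i = 0; i < 4; ++i)
        dst[i] = a[(i & ~1) + (int)(((uint64_t)b[i] >> 1) & 1u)];
}

/* Writemask step shared by the mask forms; pass zeros as src for the maskz forms. */
void writemask_pd4(double dst[4], const double src[4], unsigned k, const double tmp[4])
{
    for (int i = 0; i < 4; ++i)
        dst[i] = ((k >> i) & 1u) ? tmp[i] : src[i];
}

#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
/* Hypothetical wrapper: swap the two doubles in each 128-bit lane, zeroing unselected elements. */
__m256d swap_pairs_or_zero(__mmask8 k, __m256d a)
{
    return _mm256_maskz_permute_pd(k, a, 0x5);
}
#endif
```

With imm = 0x5, or with b = {2, 0, 2, 0}, each pair of doubles is swapped; elements never move between 128-bit lanes.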



_mm_mask_permute_ps

__m128 _mm_mask_permute_ps(__m128 src, __mmask8 k, __m128 a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_mask_permutevar_ps

__m128 _mm_mask_permutevar_ps(__m128 src, __mmask8 k, __m128 a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_permute_ps

__m128 _mm_maskz_permute_ps(__mmask8 k, __m128 a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_maskz_permutevar_ps

__m128 _mm_maskz_permutevar_ps(__mmask8 k, __m128 a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_permute_ps

__m256 _mm256_mask_permute_ps(__m256 src, __mmask8 k, __m256 a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_mask_permutevar_ps

__m256 _mm256_mask_permutevar_ps(__m256 src, __mmask8 k, __m256 a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permute_ps

__m256 _mm256_maskz_permute_ps(__mmask8 k, __m256 a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_maskz_permutevar_ps

__m256 _mm256_maskz_permutevar_ps(__mmask8 k, __m256 a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
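
The phrase "within 128-bit lanes" means that each result element can only come from the four elements of its own lane. The scalar sketch below models that restriction for the 256-bit form, assuming the low two bits of each control element select within the lane. The function name is hypothetical and the arrays stand in for the vector types.

```c
#include <assert.h>

/* Hypothetical scalar model: element i of
 * _mm256_maskz_permutevar_ps(k, a, b) for 8 float elements. */
float model_maskz_permutevar_ps256_elem(unsigned k, const float a[8],
                                        const unsigned b[8], int i)
{
    assert(i >= 0 && i < 8);
    if (!((k >> i) & 1u))
        return 0.0f;                        /* zeromask: bit clear gives 0 */
    int lane_base = i & ~3;                 /* 0 for elements 0..3, 4 for 4..7 */
    return a[lane_base + (int)(b[i] & 3u)]; /* selector stays inside the lane */
}
```

A control value of 3 in position 0 selects element 3, while the same control value in position 4 selects element 7; no control value moves data from one lane to the other.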



_mm256_mask_permutex_pd

__m256d _mm256_mask_permutex_pd(__m256d src, __mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_mask_permutexvar_pd

__m256d _mm256_mask_permutexvar_pd(__m256d src, __mmask8 k, __m256i idx, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutex_pd

__m256d _mm256_maskz_permutex_pd(__mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_pd

__m256d _mm256_maskz_permutexvar_pd(__mmask8 k, __m256i idx, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex_pd

__m256d _mm256_permutex_pd(__m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm, and return the results.
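
In contrast to the lane-local permute forms, "across lanes" means any of the four source elements can reach any result position. A scalar sketch, assuming the vpermpd immediate layout in which bits [2i+1:2i] of imm select the source element for position i (the function name is hypothetical):

```c
#include <assert.h>

/* Hypothetical scalar model: element i of _mm256_permutex_pd(a, imm). */
double model_permutex_pd_elem(const double a[4], int imm, int i)
{
    assert(i >= 0 && i < 4);
    return a[(imm >> (2 * i)) & 3];   /* any of the 4 elements may be chosen */
}
```

For example, imm = 0x1B reverses the four elements, and imm = 0x00 broadcasts element 0 to every position.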



_mm256_permutexvar_pd

__m256d _mm256_permutexvar_pd(__m256i idx, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results.



_mm256_mask_permutexvar_ps

__m256 _mm256_mask_permutexvar_ps(__m256 src, __mmask8 k, __m256i idx, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermps

Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_ps

__m256 _mm256_maskz_permutexvar_ps(__mmask8 k, __m256i idx, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermps

Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutexvar_ps

__m256 _mm256_permutexvar_ps(__m256i idx, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermps

Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results.
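
The index-driven permutexvar_ps forms can be pictured with the scalar sketch below. It assumes that the low three bits of each 32-bit element of idx select one of the eight source elements; the function name is hypothetical, and passing k = 0xFF models the unmasked intrinsic.

```c
#include <assert.h>

/* Hypothetical scalar model: element i of
 * _mm256_mask_permutexvar_ps(src, k, idx, a) for 8 float elements. */
float model_mask_permutexvar_ps_elem(const float src[8], unsigned k,
                                     const unsigned idx[8],
                                     const float a[8], int i)
{
    assert(i >= 0 && i < 8);
    if (!((k >> i) & 1u))
        return src[i];            /* writemask: bit clear copies src */
    return a[idx[i] & 7u];        /* low 3 bits index the whole vector */
}
```

Note the argument order: idx comes before a in these intrinsics, which is easy to swap by mistake because both are 256-bit values.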



_mm_mask_permutex2var_pd

__m128d _mm_mask_permutex2var_pd(__m128d a, __mmask8 k, __m128i idx, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask_permutex2var_pd

__m256d _mm256_mask_permutex2var_pd(__m256d a, __mmask8 k, __m256i idx, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask_permutex2var_ps

__m128 _mm_mask_permutex2var_ps(__m128 a, __mmask8 k, __m128i idx, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask_permutex2var_ps

__m256 _mm256_mask_permutex2var_ps(__m256 a, __mmask8 k, __m256i idx, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
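
The "selector and index" wording of the permutex2var forms can be made concrete with a scalar sketch. It assumes that, for a vector of n elements, the low log2(n) bits of each idx element are the index and the next bit up is the selector (0 picks from a, 1 picks from b). The function name and the n parameter are hypothetical conveniences covering the 128-bit (n = 4) and 256-bit (n = 8) single-precision forms.

```c
#include <assert.h>

/* Hypothetical scalar model: element i of
 * _mm_mask_permutex2var_ps (n = 4) or _mm256_mask_permutex2var_ps (n = 8). */
float model_mask_permutex2var_ps_elem(const float *a, unsigned k,
                                      const unsigned *idx, const float *b,
                                      int n, int i)
{
    assert(n == 4 || n == 8);
    assert(i >= 0 && i < n);
    if (!((k >> i) & 1u))
        return a[i];                          /* mask bit clear: copied from a */
    unsigned off = idx[i] & (unsigned)(n - 1);   /* index within a or b */
    return (idx[i] & (unsigned)n) ? b[off]       /* selector bit set: b */
                                  : a[off];      /* selector bit clear: a */
}
```

In effect, a and b behave like one table of 2n elements indexed by idx, with unselected positions falling back to a rather than to a separate src argument.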



_mm_mask_range_pd

__m128d _mm_mask_range_pd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_range_pd

__m128d _mm_maskz_range_pd(__mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_range_pd

__m128d _mm_range_pd(__m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results.
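
The description does not spell out how imm selects the operation. The scalar sketch below models one element under the encoding used by the vrangepd instruction as I understand it: imm[1:0] picks the operation (0 = min, 1 = max, 2 = absolute min, 3 = absolute max) and imm[3:2] picks the sign of the result (0 = sign of a, 1 = sign of the selected value, 2 = sign cleared, 3 = sign set). Treat that encoding as an assumption to check against the instruction reference. The model covers only finite, nonzero, non-NaN inputs, and the function names are hypothetical.

```c
#include <assert.h>

/* |x| without <math.h>; the sign of zero is ignored in this sketch. */
double model_abs(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical scalar model of one element of _mm_range_pd(a, b, imm). */
double model_range(double a, double b, int imm)
{
    double t;
    switch (imm & 3) {                      /* imm[1:0]: operation */
    case 0:  t = (a <= b) ? a : b; break;   /* min */
    case 1:  t = (a <= b) ? b : a; break;   /* max */
    case 2:  t = (model_abs(a) <= model_abs(b)) ? a : b; break; /* abs min */
    default: t = (model_abs(a) <= model_abs(b)) ? b : a; break; /* abs max */
    }
    switch ((imm >> 2) & 3) {               /* imm[3:2]: sign control */
    case 0:  return (a < 0.0) ? -model_abs(t) : model_abs(t); /* sign of a */
    case 1:  return t;                      /* sign of the selected value */
    case 2:  return model_abs(t);           /* sign cleared */
    default: return -model_abs(t);          /* sign set */
    }
}
```

Under this encoding, imm = 0x4 gives an ordinary min and imm = 0x5 an ordinary max. imm = 0x0 is not a plain min, because the result takes its sign from a: with a = 2.0 and b = -3.0 the model returns 3.0.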



_mm256_mask_range_pd

__m256d _mm256_mask_range_pd(__m256d src, __mmask8 k, __m256d a, __m256d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_range_pd

__m256d _mm256_maskz_range_pd(__mmask8 k, __m256d a, __m256d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_range_pd

__m256d _mm256_range_pd(__m256d a, __m256d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results.



_mm512_mask_range_pd

__m512d _mm512_mask_range_pd(__m512d src, __mmask8 k, __m512d a, __m512d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_mask_range_round_pd

__m512d _mm512_mask_range_round_pd(__m512d src, __mmask8 k, __m512d a, __m512d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_maskz_range_pd

__m512d _mm512_maskz_range_pd(__mmask8 k, __m512d a, __m512d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_maskz_range_round_pd

__m512d _mm512_maskz_range_round_pd(__mmask8 k, __m512d a, __m512d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_range_pd

__m512d _mm512_range_pd(__m512d a, __m512d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results.



_mm512_range_round_pd

__m512d _mm512_range_round_pd(__m512d a, __m512d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_range_ps

__m128 _mm_mask_range_ps(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_range_ps

__m128 _mm_maskz_range_ps(__mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_range_ps

__m128 _mm_range_ps(__m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results.



_mm256_mask_range_ps

__m256 _mm256_mask_range_ps(__m256 src, __mmask8 k, __m256 a, __m256 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_range_ps

__m256 _mm256_maskz_range_ps(__mmask8 k, __m256 a, __m256 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_range_ps

__m256 _mm256_range_ps(__m256 a, __m256 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results.



_mm512_mask_range_ps

__m512 _mm512_mask_range_ps(__m512 src, __mmask16 k, __m512 a, __m512 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_mask_range_round_ps

__m512 _mm512_mask_range_round_ps(__m512 src, __mmask16 k, __m512 a, __m512 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_maskz_range_ps

__m512 _mm512_maskz_range_ps(__mmask16 k, __m512 a, __m512 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
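
The 512-bit single-precision forms take a __mmask16, one bit per element. The difference between the writemask and zeromask variants can be sketched per element as follows. This is a scalar illustration with hypothetical function names; computed stands for whatever value the operation would produce for that element.

```c
#include <assert.h>

/* Writemask: bit i of k set gives the computed value, clear gives src. */
float model_writemask_elem(unsigned k, int i, float computed, float src)
{
    assert(i >= 0 && i < 16);     /* __mmask16 covers 16 float elements */
    return ((k >> i) & 1u) ? computed : src;
}

/* Zeromask: bit i of k set gives the computed value, clear gives 0. */
float model_zeromask_elem(unsigned k, int i, float computed)
{
    return model_writemask_elem(k, i, computed, 0.0f);
}
```

With k = 0x8001, only elements 0 and 15 receive computed values; the other fourteen elements come from src (writemask) or are zero (zeromask).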



_mm512_maskz_range_round_ps

__m512 _mm512_maskz_range_round_ps(__mmask16 k, __m512 a, __m512 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_range_ps

__m512 _mm512_range_ps(__m512 a, __m512 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results.



_mm512_range_round_ps

__m512 _mm512_range_round_ps(__m512 a, __m512 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_range_round_sd

__m128d _mm_mask_range_round_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangesd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_range_sd

__m128d _mm_mask_range_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangesd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.
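
For the scalar forms, only mask bit 0 matters and the upper element bypasses the operation entirely. The sketch below models this for one fixed operation, a plain max, which is assumed here to correspond to imm = 0x5. The function name is hypothetical and two-element arrays stand in for __m128d.

```c
#include <assert.h>

/* Hypothetical scalar model: element i (0 = lower, 1 = upper) of
 * _mm_mask_range_sd(src, k, a, b, imm) with imm assumed to select max. */
double model_mask_max_sd_elem(const double src[2], unsigned k,
                              const double a[2], const double b[2], int i)
{
    assert(i == 0 || i == 1);
    if (i == 1)
        return a[1];                      /* upper element always comes from a */
    if (!(k & 1u))
        return src[0];                    /* mask bit 0 clear: lower from src */
    return (a[0] <= b[0]) ? b[0] : a[0];  /* max of the two lower elements */
}
```

The upper element of src and the upper element of b never reach the result in this model, and bits 1 through 7 of k are ignored.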



_mm_maskz_range_round_sd

__m128d _mm_maskz_range_round_sd(__mmask8 k, __m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangesd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_maskz_range_sd

__m128d _mm_maskz_range_sd(__mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangesd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.



_mm_range_round_sd

__m128d _mm_range_round_sd(__m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangesd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value, and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_range_round_ss

__m128 _mm_mask_range_round_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangess

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_range_ss

__m128 _mm_mask_range_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangess

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.



_mm_maskz_range_round_ss

__m128 _mm_maskz_range_round_ss(__mmask8 k, __m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangess

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_maskz_range_ss

__m128 _mm_maskz_range_ss(__mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangess

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.



_mm_range_round_ss

__m128 _mm_range_round_ss(__m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangess

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value, and copy the upper 3 packed elements from a to the upper elements of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_reduce_pd

__m128d _mm_mask_reduce_pd(__m128d src, __mmask8 k, __m128d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_reduce_pd

__m128d _mm_maskz_reduce_pd(__mmask8 k, __m128d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_reduce_pd

__m128d _mm_reduce_pd(__m128d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results.
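
"Reduced argument" can be read as the remainder left after rounding the input to a fixed number of fraction bits. The scalar sketch below models one element as x - round(x * 2^M) / 2^M, assuming the vreducepd immediate layout as I understand it: imm[7:4] holds M, and imm[1:0] selects the rounding (0 = nearest even, 1 = toward negative infinity, 2 = toward positive infinity, 3 = toward zero). It ignores imm[2] and imm[3], handles only inputs whose scaled magnitude fits in a long long, and uses hypothetical function names.

```c
#include <assert.h>

/* Round v to an integer per mode[1:0]; valid only while |v| < 2^62. */
double model_round(double v, int mode)
{
    double t  = (double)(long long)v;       /* truncate toward zero */
    double lo = (t > v) ? t - 1.0 : t;      /* floor */
    double hi = (lo < v) ? lo + 1.0 : lo;   /* ceiling */
    switch (mode & 3) {
    case 1:  return lo;                     /* toward negative infinity */
    case 2:  return hi;                     /* toward positive infinity */
    case 3:  return t;                      /* toward zero */
    default:                                /* nearest, ties to even */
        if (v - lo < hi - v) return lo;
        if (v - lo > hi - v) return hi;
        return ((long long)lo % 2 == 0) ? lo : hi;
    }
}

/* Hypothetical scalar model of one element of _mm_reduce_pd(a, imm). */
double model_reduce(double x, int imm)
{
    int m = (imm >> 4) & 15;                /* imm[7:4]: fraction bits kept */
    double scale = (double)(1 << m);        /* 2^m */
    return x - model_round(x * scale, imm) / scale;
}
```

For x = 2.75, rounding down with M = 0 (imm = 0x01) leaves 0.75, and rounding down with M = 1 (imm = 0x11) leaves 0.25, because 2.5 is the largest multiple of 0.5 that does not exceed 2.75.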



_mm256_mask_reduce_pd

__m256d _mm256_mask_reduce_pd(__m256d src, __mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_reduce_pd

__m256d _mm256_maskz_reduce_pd(__mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_reduce_pd

__m256d _mm256_reduce_pd(__m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm512_mask_reduce_pd

__m512d _mm512_mask_reduce_pd(__m512d src, __mmask8 k, __m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_mask_reduce_round_pd

__m512d _mm512_mask_reduce_round_pd(__m512d src, __mmask8 k, __m512d a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_maskz_reduce_pd

__m512d _mm512_maskz_reduce_pd(__mmask8 k, __m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_maskz_reduce_round_pd

__m512d _mm512_maskz_reduce_round_pd(__mmask8 k, __m512d a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_reduce_pd

__m512d _mm512_reduce_pd(__m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm512_reduce_round_pd

__m512d _mm512_reduce_round_pd(__m512d a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_reduce_ps

__m128 _mm_mask_reduce_ps(__m128 src, __mmask8 k, __m128 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_reduce_ps

__m128 _mm_maskz_reduce_ps(__mmask8 k, __m128 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_reduce_ps

__m128 _mm_reduce_ps(__m128 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm256_mask_reduce_ps

__m256 _mm256_mask_reduce_ps(__m256 src, __mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_reduce_ps

__m256 _mm256_maskz_reduce_ps(__mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_reduce_ps

__m256 _mm256_reduce_ps(__m256 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm512_mask_reduce_ps

__m512 _mm512_mask_reduce_ps(__m512 src, __mmask16 k, __m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_mask_reduce_round_ps

__m512 _mm512_mask_reduce_round_ps(__m512 src, __mmask16 k, __m512 a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_maskz_reduce_ps

__m512 _mm512_maskz_reduce_ps(__mmask16 k, __m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_maskz_reduce_round_ps

__m512 _mm512_maskz_reduce_round_ps(__mmask16 k, __m512 a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_reduce_ps

__m512 _mm512_reduce_ps(__m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm512_reduce_round_ps

__m512 _mm512_reduce_round_ps(__m512 a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_reduce_round_sd

__m128d _mm_mask_reduce_round_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_reduce_sd

__m128d _mm_mask_reduce_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.



_mm_maskz_reduce_round_sd

__m128d _mm_maskz_reduce_round_sd(__mmask8 k, __m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_maskz_reduce_sd

__m128d _mm_maskz_reduce_sd(__mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.



_mm_reduce_round_sd

__m128d _mm_reduce_round_sd(__m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_reduce_sd

__m128d _mm_reduce_sd(__m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper element from a to the upper element of the return value.
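
The "reduced argument" named in the reduce entries above can be sketched in portable C. This is a minimal scalar model, not the intrinsic itself. The helper names round_rc and reduced_argument are hypothetical, and the sketch assumes the imm layout documented for the vreducesd instruction: imm[7:4] is the number of fraction bits to keep and imm[1:0] is the rounding control when imm[2] is 0. NaN, infinities, and exception behavior are not modeled.

```c
#include <assert.h>

/* Hypothetical helper: round v to an integer using rounding control rc
   (0 = nearest even, 1 = toward -inf, 2 = toward +inf, 3 = toward zero).
   Only valid while v fits in a long long. */
static double round_rc(double v, int rc)
{
    double t  = (double)(long long)v;       /* truncate toward zero */
    double lo = (t > v) ? t - 1.0 : t;      /* floor(v) */
    double hi = (lo < v) ? lo + 1.0 : lo;   /* ceil(v) */
    if (rc == 1) return lo;
    if (rc == 2) return hi;
    if (rc == 3) return t;
    double d = v - lo;                      /* nearest, ties to even */
    if (d < 0.5) return lo;
    if (d > 0.5) return hi;
    return ((long long)lo % 2 == 0) ? lo : hi;
}

/* Hypothetical scalar model: x - round(x * 2^m) / 2^m. */
static double reduced_argument(double x, int imm)
{
    int m  = (imm >> 4) & 0xF;   /* imm[7:4]: fraction bits to keep */
    int rc = imm & 0x3;          /* imm[1:0]: rounding control */
    assert((imm & 0x4) == 0);    /* imm[2] = 1 (use MXCSR) is not modeled */
    double scale = (double)(1 << m);
    return x - round_rc(x * scale, rc) / scale;
}
```

With imm = 0x13 (one fraction bit, truncate), reduced_argument(2.75, 0x13) yields 0.25, because 2.75 truncated to a multiple of 0.5 is 2.5.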



_mm_mask_reduce_round_ss

__m128 _mm_mask_reduce_round_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.



_mm_mask_reduce_ss

__m128 _mm_mask_reduce_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.



_mm_maskz_reduce_round_ss

__m128 _mm_maskz_reduce_round_ss(__mmask8 k, __m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.



_mm_maskz_reduce_ss

__m128 _mm_maskz_reduce_ss(__mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.



_mm_reduce_round_ss

__m128 _mm_reduce_round_ss(__m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper 3 packed elements from a to the upper elements of the return value.



_mm_reduce_ss

__m128 _mm_reduce_ss(__m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper 3 packed elements from a to the upper elements of the return value.



_mm_mask_roundscale_pd

__m128d _mm_mask_roundscale_pd(__m128d src, __mmask8 k, __m128d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_roundscale_pd

__m128d _mm_maskz_roundscale_pd(__mmask8 k, __m128d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_roundscale_pd

__m128d _mm_roundscale_pd(__m128d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.
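
Rounding "to the number of fraction bits specified by imm" can be illustrated with a portable scalar model of one element. This is a sketch, not the intrinsic. The names round_rc and roundscale_model are hypothetical, and the sketch assumes the imm layout documented for the vrndscalepd instruction: imm[7:4] is the number of fraction bits and imm[1:0] is the rounding control when imm[2] is 0. NaN, infinities, and exception behavior are not modeled.

```c
#include <assert.h>

/* Hypothetical helper: round v to an integer using rounding control rc
   (0 = nearest even, 1 = toward -inf, 2 = toward +inf, 3 = toward zero).
   Only valid while v fits in a long long. */
static double round_rc(double v, int rc)
{
    double t  = (double)(long long)v;       /* truncate toward zero */
    double lo = (t > v) ? t - 1.0 : t;      /* floor(v) */
    double hi = (lo < v) ? lo + 1.0 : lo;   /* ceil(v) */
    if (rc == 1) return lo;
    if (rc == 2) return hi;
    if (rc == 3) return t;
    double d = v - lo;                      /* nearest, ties to even */
    if (d < 0.5) return lo;
    if (d > 0.5) return hi;
    return ((long long)lo % 2 == 0) ? lo : hi;
}

/* Hypothetical scalar model of one element: round(x * 2^m) / 2^m. */
static double roundscale_model(double x, int imm)
{
    int m  = (imm >> 4) & 0xF;   /* imm[7:4]: fraction bits to keep */
    int rc = imm & 0x3;          /* imm[1:0]: rounding control */
    assert((imm & 0x4) == 0);    /* imm[2] = 1 (use MXCSR) is not modeled */
    double scale = (double)(1 << m);
    return round_rc(x * scale, rc) / scale;
}
```

For example, with imm = 0x10 (one fraction bit, round to nearest even) the model maps 2.71875 to 2.5, and with imm = 0x20 (two fraction bits) it maps the same input to 2.75.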



_mm256_mask_roundscale_pd

__m256d _mm256_mask_roundscale_pd(__m256d src, __mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_roundscale_pd

__m256d _mm256_maskz_roundscale_pd(__mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
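
The writemask and zeromask behavior described in the mask and maskz entries can be sketched as a final per-element selection step. The helper names merge_writemask and merge_zeromask below are hypothetical; result stands for the values the unmasked operation would have produced, and bit i of k controls element i.

```c
#include <assert.h>

/* Hypothetical helper: writemask merge. Element i comes from result when
   bit i of k is set, otherwise from src. n is the element count. */
static void merge_writemask(double *dst, const double *src, unsigned k,
                            const double *result, int n)
{
    assert(n >= 0 && n <= 8);   /* an __mmask8 covers at most 8 elements */
    for (int i = 0; i < n; i++)
        dst[i] = ((k >> i) & 1u) ? result[i] : src[i];
}

/* Hypothetical helper: zeromask merge. Element i comes from result when
   bit i of k is set, otherwise it is zeroed. */
static void merge_zeromask(double *dst, unsigned k,
                           const double *result, int n)
{
    assert(n >= 0 && n <= 8);
    for (int i = 0; i < n; i++)
        dst[i] = ((k >> i) & 1u) ? result[i] : 0.0;
}
```

With k = 0x5 and four elements, elements 0 and 2 take the computed values; elements 1 and 3 keep src in the writemask form and become 0.0 in the zeromask form.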



_mm256_roundscale_pd

__m256d _mm256_roundscale_pd(__m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.



_mm_mask_roundscale_ps

__m128 _mm_mask_roundscale_ps(__m128 src, __mmask8 k, __m128 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_roundscale_ps

__m128 _mm_maskz_roundscale_ps(__mmask8 k, __m128 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_roundscale_ps

__m128 _mm_roundscale_ps(__m128 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.



_mm256_mask_roundscale_ps

__m256 _mm256_mask_roundscale_ps(__m256 src, __mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_roundscale_ps

__m256 _mm256_maskz_roundscale_ps(__mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_roundscale_ps

__m256 _mm256_roundscale_ps(__m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.



_mm_mask_scalef_pd

__m128d _mm_mask_scalef_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_scalef_pd

__m128d _mm_maskz_scalef_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_scalef_pd

__m128d _mm_scalef_pd(__m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results.
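
"Scale ... using values from b" can be read as multiplying each element of a by 2 raised to the floor of the matching element of b, which is how the vscalefpd instruction is documented. The following portable sketch models that; the names scalef_model and scalef_pd_model are hypothetical, and denormals, NaN, and infinities are not modeled.

```c
#include <assert.h>

/* Hypothetical scalar model of one element: a * 2^floor(b).
   Assumes b has a small magnitude so the loops stay short. */
static double scalef_model(double a, double b)
{
    assert(b > -1000.0 && b < 1000.0);
    long long e = (long long)b;      /* truncate toward zero */
    if ((double)e > b) e -= 1;       /* adjust to floor(b) for negative b */
    double r = a;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r *= 0.5;
    return r;
}

/* Hypothetical model of the 128-bit packed form (2 elements). */
static void scalef_pd_model(double dst[2], const double a[2],
                            const double b[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = scalef_model(a[i], b[i]);
}
```

For instance, scalef_model(3.0, 2.7) gives 12.0 (3 times 2 squared), and scalef_model(3.0, -1.5) gives 0.75 (3 times 2 to the power -2).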



_mm256_mask_scalef_pd

__m256d _mm256_mask_scalef_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_scalef_pd

__m256d _mm256_maskz_scalef_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_scalef_pd

__m256d _mm256_scalef_pd(__m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results.



_mm_mask_scalef_ps

__m128 _mm_mask_scalef_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_scalef_ps

__m128 _mm_maskz_scalef_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_scalef_ps

__m128 _mm_scalef_ps(__m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results.



_mm256_mask_scalef_ps

__m256 _mm256_mask_scalef_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_scalef_ps

__m256 _mm256_maskz_scalef_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_scalef_ps

__m256 _mm256_scalef_ps(__m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results.



_mm256_mask_shuffle_f32x4

__m256 _mm256_mask_shuffle_f32x4(__m256 src, __mmask8 k, __m256 a, __m256 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff32x4

Shuffle 128-bit blocks (each composed of 4 single-precision (32-bit) floating-point elements) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_f32x4

__m256 _mm256_maskz_shuffle_f32x4(__mmask8 k, __m256 a, __m256 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff32x4

Shuffle 128-bit blocks (each composed of 4 single-precision (32-bit) floating-point elements) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_shuffle_f32x4

__m256 _mm256_shuffle_f32x4(__m256 a, __m256 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff32x4

Shuffle 128-bit blocks (each composed of 4 single-precision (32-bit) floating-point elements) selected by imm from a and b, and return the results.
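
How imm selects the 128-bit blocks in the 256-bit form can be sketched with plain arrays. This is a portable model, not the intrinsic; the name shuffle_f32x4_256_model is hypothetical. It assumes the vshuff32x4 behavior in which imm[0] picks the block of a that becomes the lower half of the result and imm[1] picks the block of b that becomes the upper half.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the 256-bit form: two 4-float blocks per vector. */
static void shuffle_f32x4_256_model(float dst[8], const float a[8],
                                    const float b[8], int imm)
{
    assert(dst != a && dst != b);   /* memcpy requires non-overlapping buffers */
    memcpy(dst,     a + 4 * (imm & 1),        4 * sizeof(float));
    memcpy(dst + 4, b + 4 * ((imm >> 1) & 1), 4 * sizeof(float));
}
```

With imm = 0x1 the result holds the upper block of a followed by the lower block of b.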



_mm256_mask_shuffle_f64x2

__m256d _mm256_mask_shuffle_f64x2(__m256d src, __mmask8 k, __m256d a, __m256d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff64x2

Shuffle 128-bit blocks (each composed of 2 double-precision (64-bit) floating-point elements) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_f64x2

__m256d _mm256_maskz_shuffle_f64x2(__mmask8 k, __m256d a, __m256d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff64x2

Shuffle 128-bit blocks (each composed of 2 double-precision (64-bit) floating-point elements) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_shuffle_f64x2

__m256d _mm256_shuffle_f64x2(__m256d a, __m256d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff64x2

Shuffle 128-bit blocks (each composed of 2 double-precision (64-bit) floating-point elements) selected by imm from a and b, and return the results.



_mm_mask_shuffle_pd

__m128d _mm_mask_shuffle_pd(__m128d src, __mmask8 k, __m128d a, __m128d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufpd

Shuffle double-precision (64-bit) floating-point elements using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_shuffle_pd

__m128d _mm_maskz_shuffle_pd(__mmask8 k, __m128d a, __m128d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufpd

Shuffle double-precision (64-bit) floating-point elements using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_shuffle_pd

__m256d _mm256_mask_shuffle_pd(__m256d src, __mmask8 k, __m256d a, __m256d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufpd

Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_pd

__m256d _mm256_maskz_shuffle_pd(__mmask8 k, __m256d a, __m256d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufpd

Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
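
The selection step of the 256-bit shuffle_pd forms, before any masking, can be sketched as follows. This is a portable model with a hypothetical name, shuffle_pd_256_model. It assumes the vshufpd behavior in which, within each 128-bit lane, the even result element is chosen from a and the odd result element from b, each by one bit of imm.

```c
#include <assert.h>

/* Hypothetical model: bit j of imm selects the low (0) or high (1)
   element of the lane for result element j. */
static void shuffle_pd_256_model(double dst[4], const double a[4],
                                 const double b[4], int imm)
{
    assert(dst != a && dst != b);   /* inputs are read after dst is written */
    for (int lane = 0; lane < 2; lane++) {
        int base = 2 * lane;
        dst[base]     = a[base + ((imm >> base) & 1)];
        dst[base + 1] = b[base + ((imm >> (base + 1)) & 1)];
    }
}
```

The mask and maskz forms would then apply k to this result element by element.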



_mm_mask_shuffle_ps

__m128 _mm_mask_shuffle_ps(__m128 src, __mmask8 k, __m128 a, __m128 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufps

Shuffle single-precision (32-bit) floating-point elements from a and b using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_shuffle_ps

__m128 _mm_maskz_shuffle_ps(__mmask8 k, __m128 a, __m128 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufps

Shuffle single-precision (32-bit) floating-point elements from a and b using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_shuffle_ps

__m256 _mm256_mask_shuffle_ps(__m256 src, __mmask8 k, __m256 a, __m256 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufps

Shuffle single-precision (32-bit) floating-point elements from a and b within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_ps

__m256 _mm256_maskz_shuffle_ps(__mmask8 k, __m256 a, __m256 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufps

Shuffle single-precision (32-bit) floating-point elements from a and b within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
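
The per-lane selection performed by the 256-bit shuffle_ps forms, before masking, can be sketched in portable C. The name shuffle_ps_256_model is hypothetical. The sketch assumes the vshufps behavior in which each 2-bit field of imm is an index within the lane, the two lower result elements of each lane are taken from a, and the two upper ones from b.

```c
#include <assert.h>

/* Hypothetical model: the same imm is applied to both 128-bit lanes. */
static void shuffle_ps_256_model(float dst[8], const float a[8],
                                 const float b[8], int imm)
{
    assert(dst != a && dst != b);   /* inputs are read after dst is written */
    for (int lane = 0; lane < 8; lane += 4) {
        dst[lane + 0] = a[lane + ((imm >> 0) & 3)];
        dst[lane + 1] = a[lane + ((imm >> 2) & 3)];
        dst[lane + 2] = b[lane + ((imm >> 4) & 3)];
        dst[lane + 3] = b[lane + ((imm >> 6) & 3)];
    }
}
```

With imm = 0x44 each lane of the result holds the two lowest elements of a followed by the two lowest elements of b from that lane.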



_mm_mask_unpackhi_pd

__m128d _mm_mask_unpackhi_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhpd

Unpack and interleave double-precision (64-bit) floating-point elements from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpackhi_pd

__m128d _mm_maskz_unpackhi_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhpd

Unpack and interleave double-precision (64-bit) floating-point elements from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpackhi_pd

__m256d _mm256_mask_unpackhi_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhpd

Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpackhi_pd

__m256d _mm256_maskz_unpackhi_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhpd

Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_unpackhi_ps

__m128 _mm_mask_unpackhi_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhps

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpackhi_ps

__m128 _mm_maskz_unpackhi_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhps

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpackhi_ps

__m256 _mm256_mask_unpackhi_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhps

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpackhi_ps

__m256 _mm256_maskz_unpackhi_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhps

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
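
"Unpack and interleave ... from the high half of each 128-bit lane" can be pictured with a portable model of the unmasked 256-bit operation. The name unpackhi_ps_256_model is hypothetical; the mask and maskz forms would apply k to this result afterwards.

```c
#include <assert.h>

/* Hypothetical model: for each 4-float lane, interleave the two upper
   elements of a with the two upper elements of b. */
static void unpackhi_ps_256_model(float dst[8], const float a[8],
                                  const float b[8])
{
    assert(dst != a && dst != b);   /* inputs are read after dst is written */
    for (int lane = 0; lane < 8; lane += 4) {
        dst[lane + 0] = a[lane + 2];
        dst[lane + 1] = b[lane + 2];
        dst[lane + 2] = a[lane + 3];
        dst[lane + 3] = b[lane + 3];
    }
}
```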



_mm_mask_unpacklo_pd

__m128d _mm_mask_unpacklo_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklpd

Unpack and interleave double-precision (64-bit) floating-point elements from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpacklo_pd

__m128d _mm_maskz_unpacklo_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklpd

Unpack and interleave double-precision (64-bit) floating-point elements from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpacklo_pd

__m256d _mm256_mask_unpacklo_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklpd

Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpacklo_pd

__m256d _mm256_maskz_unpacklo_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklpd

Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
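
The zeromask form of the low-half unpack can be sketched as an interleave followed by a per-element mask test. This is a portable model, not the intrinsic, and the name maskz_unpacklo_pd_256_model is hypothetical.

```c
#include <assert.h>

/* Hypothetical model: for each 2-double lane, the result is
   { a[low], b[low] }, and bit i of k decides whether result element i
   is kept or zeroed. */
static void maskz_unpacklo_pd_256_model(double dst[4], unsigned k,
                                        const double a[4],
                                        const double b[4])
{
    assert(dst != a && dst != b);   /* inputs are read after dst is written */
    for (int lane = 0; lane < 4; lane += 2) {
        dst[lane]     = ((k >> lane) & 1u)       ? a[lane] : 0.0;
        dst[lane + 1] = ((k >> (lane + 1)) & 1u) ? b[lane] : 0.0;
    }
}
```

With k = 0xF every element is kept; with k = 0x6 only elements 1 and 2 survive and the others are zeroed.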



_mm_mask_unpacklo_ps

__m128 _mm_mask_unpacklo_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklps

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpacklo_ps

__m128 _mm_maskz_unpacklo_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklps

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpacklo_ps

__m256 _mm256_mask_unpacklo_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklps

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpacklo_ps

__m256 _mm256_maskz_unpacklo_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklps

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_alignr_epi32

__m128i _mm_alignr_epi32(__m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 32-bit elements, and store the low 16 bytes (4 elements) in the return value.
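
The concatenate-and-shift step can be sketched with plain arrays. This is a portable model with a hypothetical name, alignr_epi32_128_model. It assumes the valignd behavior in which a supplies the upper half of the concatenation and b the lower half, and in which only the low 2 bits of count are used for the 4-element form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 128-bit form (4 elements per vector). */
static void alignr_epi32_128_model(uint32_t dst[4], const uint32_t a[4],
                                   const uint32_t b[4], int count)
{
    uint32_t cat[8];
    assert(count >= 0 && count <= 255);   /* count is an 8-bit immediate */
    for (int i = 0; i < 4; i++) {
        cat[i]     = b[i];   /* b is the low half of the concatenation */
        cat[i + 4] = a[i];   /* a is the high half */
    }
    count &= 3;              /* assumption: only count[1:0] is used */
    for (int i = 0; i < 4; i++)
        dst[i] = cat[i + count];
}
```

With a = {10, 11, 12, 13}, b = {0, 1, 2, 3}, and count = 1, the model returns {1, 2, 3, 10}: the lowest element of b is shifted out and the lowest element of a is shifted in.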



_mm_mask_alignr_epi32

__m128i _mm_mask_alignr_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 32-bit elements, and store the low 16 bytes (4 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_alignr_epi32

__m128i _mm_maskz_alignr_epi32(__mmask8 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 32-bit elements, and store the low 16 bytes (4 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_alignr_epi32

__m256i _mm256_alignr_epi32(__m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 32-bit elements, and store the low 32 bytes (8 elements) in the return value.



_mm256_mask_alignr_epi32

__m256i _mm256_mask_alignr_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 32-bit elements, and store the low 32 bytes (8 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_alignr_epi32

__m256i _mm256_maskz_alignr_epi32(__mmask8 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 32-bit elements, and store the low 32 bytes (8 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_alignr_epi64

__m128i _mm_alignr_epi64(__m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 64-bit elements, and store the low 16 bytes (2 elements) in the return value.



_mm_mask_alignr_epi64

__m128i _mm_mask_alignr_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 64-bit elements, and store the low 16 bytes (2 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_alignr_epi64

__m128i _mm_maskz_alignr_epi64(__mmask8 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 64-bit elements, and store the low 16 bytes (2 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_alignr_epi64

__m256i _mm256_alignr_epi64(__m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 64-bit elements, and store the low 32 bytes (4 elements) in the return value.



_mm256_mask_alignr_epi64

__m256i _mm256_mask_alignr_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 64-bit elements, and store the low 32 bytes (4 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_alignr_epi64

__m256i _mm256_maskz_alignr_epi64(__mmask8 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 64-bit elements, and store the low 32 bytes (4 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_dbsad_epu8

__m128i _mm_dbsad_epu8(__m128i a, __m128i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, with the quadruplets from b selected according to the control in imm, and store the 16-bit results in the return value.
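
A single sum of absolute differences over one quadruplet can be sketched in portable C. The name sad_quadruplet is hypothetical, and the sketch covers only the arithmetic on one pair of quadruplets; which quadruplets of a and b are paired, and how imm influences that pairing, is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: SAD of four unsigned 8-bit values.
   The largest possible sum is 4 * 255 = 1020, so it fits in 16 bits. */
static uint16_t sad_quadruplet(const uint8_t a[4], const uint8_t b[4])
{
    uint16_t sum = 0;
    assert(a != 0 && b != 0);
    for (int i = 0; i < 4; i++)
        sum += (uint16_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return sum;
}
```

For a = {1, 200, 3, 0} and b = {4, 100, 3, 255} the result is 3 + 100 + 0 + 255 = 358.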



_mm_mask_dbsad_epu8

__m128i _mm_mask_dbsad_epu8(__m128i src, __mmask8 k, __m128i a, __m128i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, with the quadruplets from b selected according to the control in imm, and store the 16-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_dbsad_epu8

__m128i _mm_maskz_dbsad_epu8(__mmask8 k, __m128i a, __m128i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, with the quadruplets from b selected according to the control in imm, and store the 16-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_dbsad_epu8

__m256i _mm256_dbsad_epu8(__m256i a, __m256i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, with the quadruplets from b selected according to the control in imm, and store the 16-bit results in the return value.



_mm256_mask_dbsad_epu8

__m256i _mm256_mask_dbsad_epu8(__m256i src, __mmask16 k, __m256i a, __m256i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, with the quadruplets from b selected according to the control in imm, and store the 16-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_dbsad_epu8

__m256i _mm256_maskz_dbsad_epu8(__mmask16 k, __m256i a, __m256i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, with the quadruplets from b selected according to the control in imm, and store the 16-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_dbsad_epu8

__m512i _mm512_dbsad_epu8(__m512i a, __m512i b, int imm)

CPUID Flags: AVX512BW

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, with the quadruplets from b selected according to the control in imm, and store the 16-bit results in the return value.



_mm512_mask_dbsad_epu8

__m512i _mm512_mask_dbsad_epu8(__m512i src, __mmask32 k, __m512i a, __m512i b, int imm)

CPUID Flags: AVX512BW

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, with the quadruplets from b selected according to the control in imm, and store the 16-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_dbsad_epu8

__m512i _mm512_maskz_dbsad_epu8(__mmask32 k, __m512i a, __m512i b, int imm)

CPUID Flags: AVX512BW

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, with the quadruplets from b selected according to the control in imm, and store the 16-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_extracti32x4_epi32

__m128i _mm256_extracti32x4_epi32(__m256i a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextracti32x4

Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with imm, and store the result in the return value.
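
The selection performed by the 256-bit extract can be sketched with plain arrays. The name extracti32x4_256_model is hypothetical. The sketch assumes that only bit 0 of imm is used for a 256-bit source, with 0 selecting the lower 128 bits and 1 the upper 128 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: copy one 4-element block out of an 8-element vector. */
static void extracti32x4_256_model(int32_t dst[4], const int32_t a[8],
                                   int imm)
{
    assert(imm >= 0 && imm <= 255);   /* imm is an 8-bit immediate */
    memcpy(dst, a + 4 * (imm & 1), 4 * sizeof(int32_t));
}
```

With a = {0, 1, 2, 3, 4, 5, 6, 7}, imm = 1 returns {4, 5, 6, 7} and imm = 0 returns {0, 1, 2, 3}.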



_mm256_mask_extracti32x4_epi32

__m128i _mm256_mask_extracti32x4_epi32(__m128i src, __mmask8 k, __m256i a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextracti32x4

Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_extracti32x4_epi32

__m128i _mm256_maskz_extracti32x4_epi32(__mmask8 k, __m256i a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextracti32x4

Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_extracti32x8_epi32

__m256i _mm512_extracti32x8_epi32(__m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti32x8

Extract 256 bits (composed of 8 packed 32-bit integers) from a, selected with imm, and store the result in the return value.



_mm512_mask_extracti32x8_epi32

__m256i _mm512_mask_extracti32x8_epi32(__m256i src, __mmask8 k, __m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti32x8

Extract 256 bits (composed of 8 packed 32-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_extracti32x8_epi32

__m256i _mm512_maskz_extracti32x8_epi32(__mmask8 k, __m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti32x8

Extract 256 bits (composed of 8 packed 32-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_extracti64x2_epi64

__m128i _mm256_extracti64x2_epi64(__m256i a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and store the result in the return value.



_mm256_mask_extracti64x2_epi64

__m128i _mm256_mask_extracti64x2_epi64(__m128i src, __mmask8 k, __m256i a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_extracti64x2_epi64

__m128i _mm256_maskz_extracti64x2_epi64(__mmask8 k, __m256i a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_extracti64x2_epi64

__m128i _mm512_extracti64x2_epi64(__m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and store the result in the return value.



_mm512_mask_extracti64x2_epi64

__m128i _mm512_mask_extracti64x2_epi64(__m128i src, __mmask8 k, __m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_extracti64x2_epi64

__m128i _mm512_maskz_extracti64x2_epi64(__mmask8 k, __m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
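For the 512-bit extracti64x2 forms, imm has four possible 128-bit blocks to choose from. The following scalar sketch illustrates that selection together with the writemask; it assumes the low two bits of imm select the block, uses arrays in place of registers, and the name model512_mask_extracti64x2 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_mask_extracti64x2_epi64.
   a has 8 elements of 64 bits; dst and src have 2. */
void model512_mask_extracti64x2(uint64_t dst[2], const uint64_t src[2],
                                uint8_t k, const uint64_t a[8], int imm)
{
    /* The low two bits of imm pick one of the four 128-bit blocks of a. */
    const uint64_t *block = a + 2 * (imm & 3);

    for (int j = 0; j < 2; ++j) {
        /* Only mask bits 0 and 1 matter, one per 64-bit result element. */
        dst[j] = ((k >> j) & 1) ? block[j] : src[j];
    }
}
```

Passing a src of zeros gives the behavior of the zeromask form.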



_mm_mask_alignr_epi8

__m128i _mm_mask_alignr_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpalignr

Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_alignr_epi8

__m128i _mm_maskz_alignr_epi8(__mmask16 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpalignr

Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_alignr_epi8

__m256i _mm256_mask_alignr_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpalignr

Within each 128-bit lane, concatenate the 16-byte blocks of a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the corresponding lane of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_alignr_epi8

__m256i _mm256_maskz_alignr_epi8(__mmask32 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpalignr

Within each 128-bit lane, concatenate the 16-byte blocks of a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the corresponding lane of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_alignr_epi8

__m512i _mm512_alignr_epi8(__m512i a, __m512i b, const int count)

CPUID Flags: AVX512BW

Instruction(s): vpalignr

Within each 128-bit lane, concatenate the 16-byte blocks of a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the corresponding lane of the return value.



_mm512_mask_alignr_epi8

__m512i _mm512_mask_alignr_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b, const int count)

CPUID Flags: AVX512BW

Instruction(s): vpalignr

Within each 128-bit lane, concatenate the 16-byte blocks of a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the corresponding lane of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_alignr_epi8

__m512i _mm512_maskz_alignr_epi8(__mmask64 k, __m512i a, __m512i b, const int count)

CPUID Flags: AVX512BW

Instruction(s): vpalignr

Within each 128-bit lane, concatenate the 16-byte blocks of a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the corresponding lane of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
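The alignr_epi8 family works independently on each 128-bit lane, which is easy to miss for the 256-bit and 512-bit forms. The scalar sketch below illustrates one reading of the operation: in each lane the block from a forms the upper 16 bytes of the temporary and the block from b the lower 16 bytes, and bytes shifted in from beyond the temporary are zero. Byte arrays stand in for registers, and the name model_mask_alignr_epi8 and the lanes parameter (1, 2, or 4 for 128-, 256-, or 512-bit vectors) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the masked alignr_epi8 forms.
   lanes is 1, 2, or 4; every array holds 16 * lanes bytes. */
void model_mask_alignr_epi8(uint8_t *dst, const uint8_t *src, uint64_t k,
                            const uint8_t *a, const uint8_t *b,
                            unsigned count, size_t lanes)
{
    for (size_t lane = 0; lane < lanes; ++lane) {
        for (size_t i = 0; i < 16; ++i) {
            size_t e = 16 * lane + i;   /* byte index in the full vector */
            size_t pos = i + count;     /* index into the 32-byte temporary */
            uint8_t v = 0;              /* positions past the temporary read as 0 */

            if (pos < 16) {
                v = b[16 * lane + pos];        /* lower half comes from b */
            } else if (pos < 32) {
                v = a[16 * lane + pos - 16];   /* upper half comes from a */
            }

            /* One mask bit per result byte. */
            dst[e] = ((k >> e) & 1) ? v : src[e];
        }
    }
}
```

With count equal to 4 and two lanes, result byte 12 comes from a[0] and result byte 16 comes from b[20], showing that the shift never crosses a lane boundary.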



_mm_mask_blend_epi8

__m128i _mm_mask_blend_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpblendmb

Blend packed 8-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm256_mask_blend_epi8

__m256i _mm256_mask_blend_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpblendmb

Blend packed 8-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm512_mask_blend_epi8

__m512i _mm512_mask_blend_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpblendmb

Blend packed 8-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm_mask_blend_epi32

__m128i _mm_mask_blend_epi32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpblendmd

Blend packed 32-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm256_mask_blend_epi32

__m256i _mm256_mask_blend_epi32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpblendmd

Blend packed 32-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm_mask_blend_epi64

__m128i _mm_mask_blend_epi64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpblendmq

Blend packed 64-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm256_mask_blend_epi64

__m256i _mm256_mask_blend_epi64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpblendmq

Blend packed 64-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm_mask_blend_epi16

__m128i _mm_mask_blend_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpblendmw

Blend packed 16-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm256_mask_blend_epi16

__m256i _mm256_mask_blend_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpblendmw

Blend packed 16-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm512_mask_blend_epi16

__m512i _mm512_mask_blend_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpblendmw

Blend packed 16-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.
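The mask_blend family has no src operand: the mask simply chooses between the two inputs. A minimal scalar sketch of the 512-bit, 16-bit-element form follows, assuming a set mask bit selects the element from b and a clear bit selects the element from a. Arrays stand in for registers and the name model512_mask_blend_epi16 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_mask_blend_epi16.
   Each vector holds 32 elements of 16 bits, so k has 32 bits. */
void model512_mask_blend_epi16(uint16_t dst[32], uint32_t k,
                               const uint16_t a[32], const uint16_t b[32])
{
    for (int j = 0; j < 32; ++j) {
        /* Mask bit set: element from b; clear: element from a. */
        dst[j] = ((k >> j) & 1u) ? b[j] : a[j];
    }
}
```

The other element widths differ only in the element type and in how many mask bits are consumed.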



_mm_mask_broadcastb_epi8

__m128i _mm_mask_broadcastb_epi8(__m128i src, __mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcastb_epi8

__m128i _mm_maskz_broadcastb_epi8(__mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastb_epi8

__m256i _mm256_mask_broadcastb_epi8(__m256i src, __mmask32 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastb_epi8

__m256i _mm256_maskz_broadcastb_epi8(__mmask32 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcastb_epi8

__m512i _mm512_broadcastb_epi8(__m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value.



_mm512_mask_broadcastb_epi8

__m512i _mm512_mask_broadcastb_epi8(__m512i src, __mmask64 k, __m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcastb_epi8

__m512i _mm512_maskz_broadcastb_epi8(__mmask64 k, __m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
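As an illustration of the masked broadcasts above, the following scalar sketch models the 512-bit byte forms. Only element 0 of a is read; the mask decides, per result byte, whether that value is written. Arrays stand in for registers and the names model512_mask_broadcastb_epi8 and model512_maskz_broadcastb_epi8 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_mask_broadcastb_epi8.
   a is a 128-bit source (16 bytes); only a[0] is used. */
void model512_mask_broadcastb_epi8(uint8_t dst[64], const uint8_t src[64],
                                   uint64_t k, const uint8_t a[16])
{
    for (int j = 0; j < 64; ++j) {
        /* Mask bit set: broadcast value; clear: keep src. */
        dst[j] = ((k >> j) & 1) ? a[0] : src[j];
    }
}

/* Hypothetical scalar model of _mm512_maskz_broadcastb_epi8:
   inactive bytes are zeroed instead of copied from src. */
void model512_maskz_broadcastb_epi8(uint8_t dst[64], uint64_t k,
                                    const uint8_t a[16])
{
    for (int j = 0; j < 64; ++j) {
        dst[j] = ((k >> j) & 1) ? a[0] : 0;
    }
}
```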



_mm_mask_broadcastd_epi32

__m128i _mm_mask_broadcastd_epi32(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastd

Broadcast the low packed 32-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcastd_epi32

__m128i _mm_maskz_broadcastd_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastd

Broadcast the low packed 32-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastd_epi32

__m256i _mm256_mask_broadcastd_epi32(__m256i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastd

Broadcast the low packed 32-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastd_epi32

__m256i _mm256_maskz_broadcastd_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastd

Broadcast the low packed 32-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_broadcastmb_epi64

__m128i _mm_broadcastmb_epi64(__mmask8 k)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vpbroadcastmb2q

Broadcast the low 8 bits from input mask k to all 64-bit elements of the return value.



_mm256_broadcastmb_epi64

__m256i _mm256_broadcastmb_epi64(__mmask8 k)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vpbroadcastmb2q

Broadcast the low 8 bits from input mask k to all 64-bit elements of the return value.



_mm_broadcastmw_epi32

__m128i _mm_broadcastmw_epi32(__mmask16 k)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vpbroadcastmw2d

Broadcast the low 16 bits from input mask k to all 32-bit elements of the return value.



_mm256_broadcastmw_epi32

__m256i _mm256_broadcastmw_epi32(__mmask16 k)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vpbroadcastmw2d

Broadcast the low 16 bits from input mask k to all 32-bit elements of the return value.
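The broadcastmb and broadcastmw intrinsics treat the mask as data rather than as a selector. The scalar sketch below illustrates the 256-bit forms under the assumption that the mask value is zero-extended to the element width before being replicated. Arrays stand in for registers and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_broadcastmb_epi64:
   the 8-bit mask value is zero-extended into each 64-bit element. */
void model256_broadcastmb_epi64(uint64_t dst[4], uint8_t k)
{
    for (int j = 0; j < 4; ++j) {
        dst[j] = (uint64_t)k;
    }
}

/* Hypothetical scalar model of _mm256_broadcastmw_epi32:
   the 16-bit mask value is zero-extended into each 32-bit element. */
void model256_broadcastmw_epi32(uint32_t dst[8], uint16_t k)
{
    for (int j = 0; j < 8; ++j) {
        dst[j] = (uint32_t)k;
    }
}
```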



_mm_mask_broadcastq_epi64

__m128i _mm_mask_broadcastq_epi64(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastq

Broadcast the low packed 64-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcastq_epi64

__m128i _mm_maskz_broadcastq_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastq

Broadcast the low packed 64-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastq_epi64

__m256i _mm256_mask_broadcastq_epi64(__m256i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastq

Broadcast the low packed 64-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastq_epi64

__m256i _mm256_maskz_broadcastq_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastq

Broadcast the low packed 64-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_broadcastw_epi16

__m128i _mm_mask_broadcastw_epi16(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcastw_epi16

__m128i _mm_maskz_broadcastw_epi16(__mmask8 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastw_epi16

__m256i _mm256_mask_broadcastw_epi16(__m256i src, __mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastw_epi16

__m256i _mm256_maskz_broadcastw_epi16(__mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcastw_epi16

__m512i _mm512_broadcastw_epi16(__m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value.



_mm512_mask_broadcastw_epi16

__m512i _mm512_mask_broadcastw_epi16(__m512i src, __mmask32 k, __m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcastw_epi16

__m512i _mm512_maskz_broadcastw_epi16(__mmask32 k, __m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_compress_epi32

__m128i _mm_mask_compress_epi32(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressd

Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm_maskz_compress_epi32

__m128i _mm_maskz_compress_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressd

Contiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm256_mask_compress_epi32

__m256i _mm256_mask_compress_epi32(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressd

Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm256_maskz_compress_epi32

__m256i _mm256_maskz_compress_epi32(__mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressd

Contiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm_mask_compress_epi64

__m128i _mm_mask_compress_epi64(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressq

Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm_maskz_compress_epi64

__m128i _mm_maskz_compress_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressq

Contiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm256_mask_compress_epi64

__m256i _mm256_mask_compress_epi64(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressq

Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm256_maskz_compress_epi64

__m256i _mm256_maskz_compress_epi64(__mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressq

Contiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
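The compress operation packs the selected elements toward the low end of the result. A scalar sketch of the 256-bit, 32-bit-element forms is shown below as an illustration. It assumes that, in the writemask form, the positions above the packed elements are filled from the same positions of src. Arrays stand in for registers, dst is assumed not to overlap the inputs, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_mask_compress_epi32. */
void model256_mask_compress_epi32(uint32_t dst[8], const uint32_t src[8],
                                  uint8_t k, const uint32_t a[8])
{
    int m = 0;  /* next free position in dst */

    /* Pack active elements of a contiguously, lowest first. */
    for (int j = 0; j < 8; ++j) {
        if ((k >> j) & 1) {
            dst[m++] = a[j];
        }
    }
    /* Remaining upper positions pass through from src. */
    for (; m < 8; ++m) {
        dst[m] = src[m];
    }
}

/* Hypothetical scalar model of _mm256_maskz_compress_epi32. */
void model256_maskz_compress_epi32(uint32_t dst[8], uint8_t k,
                                   const uint32_t a[8])
{
    int m = 0;

    for (int j = 0; j < 8; ++j) {
        if ((k >> j) & 1) {
            dst[m++] = a[j];
        }
    }
    /* Remaining upper positions are zeroed. */
    for (; m < 8; ++m) {
        dst[m] = 0u;
    }
}
```

With k equal to 0x92 (bits 1, 4, and 7 set), the three active elements land in positions 0 through 2 and positions 3 through 7 come from src or are zeroed.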



_mm256_mask_permutexvar_epi32

__m256i _mm256_mask_permutexvar_epi32(__m256i src, __mmask8 k, __m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermd

Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_epi32

__m256i _mm256_maskz_permutexvar_epi32(__mmask8 k, __m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermd

Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutexvar_epi32

__m256i _mm256_permutexvar_epi32(__m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermd

Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and return the results.
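A scalar sketch may help show what "the corresponding index in idx" means for the 256-bit permutexvar_epi32 forms. It assumes only the low three bits of each index element are used, since there are eight source elements. Arrays stand in for registers, dst is assumed not to overlap a, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_permutexvar_epi32. */
void model256_permutexvar_epi32(uint32_t dst[8], const uint32_t idx[8],
                                const uint32_t a[8])
{
    for (int j = 0; j < 8; ++j) {
        /* Result element j is the element of a named by idx[j] (low 3 bits). */
        dst[j] = a[idx[j] & 7u];
    }
}

/* Hypothetical scalar model of _mm256_mask_permutexvar_epi32. */
void model256_mask_permutexvar_epi32(uint32_t dst[8], const uint32_t src[8],
                                     uint8_t k, const uint32_t idx[8],
                                     const uint32_t a[8])
{
    for (int j = 0; j < 8; ++j) {
        dst[j] = ((k >> j) & 1) ? a[idx[j] & 7u] : src[j];
    }
}
```

An index vector of 7, 6, ..., 0 reverses the elements; the same source element may be named more than once.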



_mm_mask2_permutex2var_epi32

__m128i _mm_mask2_permutex2var_epi32(__m128i a, __m128i idx, __mmask8 k, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_epi32

__m256i _mm256_mask2_permutex2var_epi32(__m256i a, __m256i idx, __mmask8 k, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
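In the two-source permutes, each idx element carries both an index and a selector. The scalar sketch below illustrates the 256-bit mask2 form, assuming the low three bits of each idx element index into the chosen source and bit 3 selects between a (0) and b (1). Inactive elements receive the raw idx element. Arrays stand in for registers, dst is assumed not to overlap the inputs, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_mask2_permutex2var_epi32. */
void model256_mask2_permutex2var_epi32(uint32_t dst[8], const uint32_t a[8],
                                       const uint32_t idx[8], uint8_t k,
                                       const uint32_t b[8])
{
    for (int j = 0; j < 8; ++j) {
        uint32_t off = idx[j] & 7u;                         /* index: bits 2:0 */
        uint32_t picked = (idx[j] & 8u) ? b[off] : a[off];  /* selector: bit 3 */

        /* Mask bit clear: the idx element itself passes through. */
        dst[j] = ((k >> j) & 1) ? picked : idx[j];
    }
}
```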



_mm_maskz_permutex2var_epi32

__m128i _mm_maskz_permutex2var_epi32(__mmask8 k, __m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d, vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutex2var_epi32

__m128i _mm_permutex2var_epi32(__m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d, vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_maskz_permutex2var_epi32

__m256i _mm256_maskz_permutex2var_epi32(__mmask8 k, __m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d, vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex2var_epi32

__m256i _mm256_permutex2var_epi32(__m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d, vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm_mask2_permutex2var_epi64

__m128i _mm_mask2_permutex2var_epi64(__m128i a, __m128i idx, __mmask8 k, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_epi64

__m256i _mm256_mask2_permutex2var_epi64(__m256i a, __m256i idx, __mmask8 k, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm_maskz_permutex2var_epi64

__m128i _mm_maskz_permutex2var_epi64(__mmask8 k, __m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q, vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutex2var_epi64

__m128i _mm_permutex2var_epi64(__m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q, vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_maskz_permutex2var_epi64

__m256i _mm256_maskz_permutex2var_epi64(__mmask8 k, __m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q, vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex2var_epi64

__m256i _mm256_permutex2var_epi64(__m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q, vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm_mask2_permutex2var_epi16

__m128i _mm_mask2_permutex2var_epi16(__m128i a, __m128i idx, __mmask8 k, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_epi16

__m256i _mm256_mask2_permutex2var_epi16(__m256i a, __m256i idx, __mmask16 k, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm512_mask2_permutex2var_epi16

__m512i _mm512_mask2_permutex2var_epi16(__m512i a, __m512i idx, __mmask32 k, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpermi2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm_maskz_permutex2var_epi16

__m128i _mm_maskz_permutex2var_epi16(__mmask8 k, __m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutex2var_epi16

__m128i _mm_permutex2var_epi16(__m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_maskz_permutex2var_epi16

__m256i _mm256_maskz_permutex2var_epi16(__mmask16 k, __m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex2var_epi16

__m256i _mm256_permutex2var_epi16(__m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm512_maskz_permutex2var_epi16

__m512i _mm512_maskz_permutex2var_epi16(__mmask32 k, __m512i a, __m512i idx, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_permutex2var_epi16

__m512i _mm512_permutex2var_epi16(__m512i a, __m512i idx, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.
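For the 512-bit, 16-bit-element forms there are 32 elements per source, so the index needs five bits. The scalar sketch below illustrates the zeromask form under the assumption that bits 4:0 of each idx element are the index and bit 5 is the selector between a (0) and b (1); the unmasked form corresponds to a mask with all 32 bits set. Arrays stand in for registers, dst is assumed not to overlap the inputs, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_maskz_permutex2var_epi16. */
void model512_maskz_permutex2var_epi16(uint16_t dst[32], uint32_t k,
                                       const uint16_t a[32],
                                       const uint16_t idx[32],
                                       const uint16_t b[32])
{
    for (int j = 0; j < 32; ++j) {
        unsigned off = idx[j] & 31u;                         /* index: bits 4:0 */
        uint16_t picked = (idx[j] & 32u) ? b[off] : a[off];  /* selector: bit 5 */

        /* Mask bit clear: the result element is zeroed. */
        dst[j] = ((k >> j) & 1u) ? picked : 0;
    }
}
```

An index vector of 0, 32, 1, 33, ... interleaves the low halves of a and b, which is one common use of a two-source permute.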



_mm256_mask_permutex_epi64

__m256i _mm256_mask_permutex_epi64(__m256i src, __mmask8 k, __m256i a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_mask_permutexvar_epi64

__m256i _mm256_mask_permutexvar_epi64(__m256i src, __mmask8 k, __m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutex_epi64

__m256i _mm256_maskz_permutex_epi64(__mmask8 k, __m256i a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_epi64

__m256i _mm256_maskz_permutexvar_epi64(__mmask8 k, __m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex_epi64

__m256i _mm256_permutex_epi64(__m256i a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the control in imm, and return the results.



_mm256_permutexvar_epi64

__m256i _mm256_permutexvar_epi64(__m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and return the results.
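The permutex and permutexvar forms differ only in where the indices come from: an 8-bit immediate or a vector. The scalar sketch below illustrates both 256-bit, 64-bit-element forms, assuming that imm holds four 2-bit fields (bits 1:0 for result element 0, up to bits 7:6 for element 3) and that only the low two bits of each idx element are used. Arrays stand in for registers, dst is assumed not to overlap a, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_permutex_epi64. */
void model256_permutex_epi64(uint64_t dst[4], const uint64_t a[4], int imm)
{
    for (int j = 0; j < 4; ++j) {
        /* Two bits of imm per result element, lowest element first. */
        dst[j] = a[(imm >> (2 * j)) & 3];
    }
}

/* Hypothetical scalar model of _mm256_permutexvar_epi64. */
void model256_permutexvar_epi64(uint64_t dst[4], const uint64_t idx[4],
                                const uint64_t a[4])
{
    for (int j = 0; j < 4; ++j) {
        /* Low two bits of each idx element name the source element. */
        dst[j] = a[idx[j] & 3u];
    }
}
```

For example, an imm of 0x1B (fields 3, 2, 1, 0 from low to high) reverses the four elements, and an imm of 0 broadcasts element 0.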



_mm_mask_permutex2var_epi32

__m128i _mm_mask_permutex2var_epi32(__m128i a, __mmask8 k, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask_permutex2var_epi32

__m256i _mm256_mask_permutex2var_epi32(__m256i a, __mmask8 k, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
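In the mask_permutex2var forms the first source doubles as the pass-through operand, unlike the mask2 forms, which pass through idx. The scalar sketch below illustrates the 256-bit, 32-bit-element case, assuming bits 2:0 of each idx element are the index and bit 3 selects between a (0) and b (1). Arrays stand in for registers, dst is assumed not to overlap the inputs, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_mask_permutex2var_epi32. */
void model256_mask_permutex2var_epi32(uint32_t dst[8], const uint32_t a[8],
                                      uint8_t k, const uint32_t idx[8],
                                      const uint32_t b[8])
{
    for (int j = 0; j < 8; ++j) {
        uint32_t off = idx[j] & 7u;                         /* index: bits 2:0 */
        uint32_t picked = (idx[j] & 8u) ? b[off] : a[off];  /* selector: bit 3 */

        /* Mask bit clear: element j of a passes through unchanged. */
        dst[j] = ((k >> j) & 1) ? picked : a[j];
    }
}
```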



_mm_mask_permutex2var_epi64

__m128i _mm_mask_permutex2var_epi64(__m128i a, __mmask8 k, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask_permutex2var_epi64

__m256i _mm256_mask_permutex2var_epi64(__m256i a, __mmask8 k, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask_permutex2var_epi16

__m128i _mm_mask_permutex2var_epi16(__m128i a, __mmask8 k, __m128i idx, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask_permutex2var_epi16

__m256i _mm256_mask_permutex2var_epi16(__m256i a, __mmask16 k, __m256i idx, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm512_mask_permutex2var_epi16

__m512i _mm512_mask_permutex2var_epi16(__m512i a, __mmask32 k, __m512i idx, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask_permutexvar_epi16

__m128i _mm_mask_permutexvar_epi16(__m128i src, __mmask8 k, __m128i idx, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_permutexvar_epi16

__m128i _mm_maskz_permutexvar_epi16(__mmask8 k, __m128i idx, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutexvar_epi16

__m128i _mm_permutexvar_epi16(__m128i idx, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results.



_mm256_mask_permutexvar_epi16

__m256i _mm256_mask_permutexvar_epi16(__m256i src, __mmask16 k, __m256i idx, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_epi16

__m256i _mm256_maskz_permutexvar_epi16(__mmask16 k, __m256i idx, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutexvar_epi16

__m256i _mm256_permutexvar_epi16(__m256i idx, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results.



_mm512_mask_permutexvar_epi16

__m512i _mm512_mask_permutexvar_epi16(__m512i src, __mmask32 k, __m512i idx, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_permutexvar_epi16

__m512i _mm512_maskz_permutexvar_epi16(__mmask32 k, __m512i idx, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_permutexvar_epi16

__m512i _mm512_permutexvar_epi16(__m512i idx, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results.
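Note that the index vector is passed before the data vector in the permutexvar prototypes. The sketch below illustrates the 128-bit zero-masked form under the same assumptions as any hedged example: the helper name maskz_permutexvar_words is hypothetical, and the intrinsic path is compiled only when the compiler defines __AVX512BW__ and __AVX512VL__ (as GCC and Clang do); the scalar branch models the same result.

```c
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[j] = a[idx[j] mod 8] where mask bit j is set,
 * and 0 where it is clear. */
void maskz_permutexvar_words(uint8_t k, const uint16_t idx[8],
                             const uint16_t a[8], uint16_t dst[8])
{
#if defined(__AVX512BW__) && defined(__AVX512VL__)
    __m128i vi = _mm_loadu_si128((const __m128i *)idx);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    /* Argument order: mask, index vector, then data vector. */
    __m128i r = _mm_maskz_permutexvar_epi16((__mmask8)k, vi, va);
    _mm_storeu_si128((__m128i *)dst, r);
#else
    /* Scalar model: only the low 3 bits of each index are used. */
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1u) ? a[idx[j] & 7u] : 0;
#endif
}
```

For example, idx = {7, 6, 5, 4, 3, 2, 1, 0} reverses the vector; combined with k = 0x55, only the even result elements receive reversed data and the odd ones are zeroed.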



_mm_mask_expand_epi32

__m128i _mm_mask_expand_epi32(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandd

Read contiguous 32-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in mask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_expand_epi32

__m128i _mm_maskz_expand_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandd

Read contiguous 32-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in mask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_expand_epi32

__m256i _mm256_mask_expand_epi32(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandd

Read contiguous 32-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in mask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_expand_epi32

__m256i _mm256_maskz_expand_epi32(__mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandd

Read contiguous 32-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in mask k (elements are zeroed out when the corresponding mask bit is not set).
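The expand operation is easiest to follow as a loop with a separate read cursor: the source is consumed from its lowest element upward, and one element is consumed for each set mask bit. The following minimal sketch shows the 128-bit write-masked form. The helper name mask_expand_dwords is hypothetical, and the intrinsic path assumes a compiler that defines __AVX512F__ and __AVX512VL__ when targeting those extensions (as GCC and Clang do); the scalar branch models the same behavior.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: spread the low elements of a into the positions
 * selected by k; unselected positions keep the value from src. */
void mask_expand_dwords(const int32_t src[4], uint8_t k,
                        const int32_t a[4], int32_t dst[4])
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m128i vs = _mm_loadu_si128((const __m128i *)src);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i r  = _mm_mask_expand_epi32(vs, (__mmask8)k, va);
    _mm_storeu_si128((__m128i *)dst, r);
#else
    /* Scalar model: m is the read cursor into a. */
    int m = 0;
    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1u) ? a[m++] : src[j];
#endif
}
```

With src = {-1, -2, -3, -4}, a = {10, 20, 30, 40}, and k = 0x0A (bits 1 and 3 set), the result is {-1, 10, -3, 20}: only the first two elements of a are consumed.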



_mm_mask_expand_epi64

__m128i _mm_mask_expand_epi64(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandq

Read contiguous 64-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in mask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_expand_epi64

__m128i _mm_maskz_expand_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandq

Read contiguous 64-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in mask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_expand_epi64

__m256i _mm256_mask_expand_epi64(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandq

Read contiguous 64-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in mask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_expand_epi64

__m256i _mm256_maskz_expand_epi64(__mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandq

Read contiguous 64-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in mask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_movm_epi8

__m128i _mm_movm_epi8(__mmask16 k)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovm2b

Set each packed 8-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm256_movm_epi8

__m256i _mm256_movm_epi8(__mmask32 k)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovm2b

Set each packed 8-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm512_movm_epi8

__m512i _mm512_movm_epi8(__mmask64 k)

CPUID Flags: AVX512BW

Instruction(s): vpmovm2b

Set each packed 8-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm_movm_epi32

__m128i _mm_movm_epi32(__mmask8 k)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovm2d

Set each packed 32-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm256_movm_epi32

__m256i _mm256_movm_epi32(__mmask8 k)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovm2d

Set each packed 32-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm512_movm_epi32

__m512i _mm512_movm_epi32(__mmask16 k)

CPUID Flags: AVX512DQ

Instruction(s): vpmovm2d

Set each packed 32-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm_movm_epi64

__m128i _mm_movm_epi64(__mmask8 k)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovm2q

Set each packed 64-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm256_movm_epi64

__m256i _mm256_movm_epi64(__mmask8 k)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovm2q

Set each packed 64-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm512_movm_epi64

__m512i _mm512_movm_epi64(__mmask8 k)

CPUID Flags: AVX512DQ

Instruction(s): vpmovm2q

Set each packed 64-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm_movm_epi16

__m128i _mm_movm_epi16(__mmask8 k)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovm2w

Set each packed 16-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm256_movm_epi16

__m256i _mm256_movm_epi16(__mmask16 k)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovm2w

Set each packed 16-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm512_movm_epi16

__m512i _mm512_movm_epi16(__mmask32 k)

CPUID Flags: AVX512BW

Instruction(s): vpmovm2w

Set each packed 16-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
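The movm family converts a mask register value into a vector in which each element is either all ones or all zeros. The sketch below shows this for the 128-bit, 16-bit-element form. The helper name movm_words is hypothetical, and the intrinsic path assumes a compiler that defines __AVX512BW__ and __AVX512VL__ when targeting those extensions (as GCC and Clang do); the scalar branch models the same result.

```c
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: expand the 8 mask bits of k into eight 16-bit
 * elements, 0xFFFF for a set bit and 0x0000 for a clear bit. */
void movm_words(uint8_t k, uint16_t dst[8])
{
#if defined(__AVX512BW__) && defined(__AVX512VL__)
    _mm_storeu_si128((__m128i *)dst, _mm_movm_epi16((__mmask8)k));
#else
    /* Scalar model of the same bit-to-element expansion. */
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1u) ? 0xFFFFu : 0u;
#endif
}
```

For k = 0x81, elements 0 and 7 become 0xFFFF and the rest are zero. A vector of this shape can then be combined with other data using ordinary bitwise operations.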



_mm512_sad_epu8

__m512i _mm512_sad_epu8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsadbw

Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce eight unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in the return value.
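A 512-bit operand holds 64 bytes, so grouping the differences by 8 gives eight sums, one per 64-bit element of the result. The following minimal sketch shows that grouping. The helper name sad_bytes is hypothetical, and the intrinsic path assumes a compiler that defines __AVX512BW__ when targeting that extension (as GCC and Clang do); the scalar branch models the same computation.

```c
#include <stdint.h>
#if defined(__AVX512BW__)
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[g] = sum of |a[i] - b[i]| over the 8 bytes
 * i = 8*g .. 8*g+7, for each of the eight 64-bit result elements. */
void sad_bytes(const uint8_t a[64], const uint8_t b[64], uint64_t dst[8])
{
#if defined(__AVX512BW__)
    __m512i va = _mm512_loadu_si512(a);
    __m512i vb = _mm512_loadu_si512(b);
    _mm512_storeu_si512(dst, _mm512_sad_epu8(va, vb));
#else
    /* Scalar model: each sum is at most 8 * 255, so it fits in 16 bits. */
    for (int g = 0; g < 8; g++) {
        uint64_t sum = 0;
        for (int i = 0; i < 8; i++) {
            int d = (int)a[8 * g + i] - (int)b[8 * g + i];
            sum += (uint64_t)(d < 0 ? -d : d);
        }
        dst[g] = sum;
    }
#endif
}
```

For instance, with a[i] = i and b[i] = 63 - i, the first and last groups each sum to 448 and the two middle groups each sum to 64.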



_mm_mask_shuffle_epi8

__m128i _mm_mask_shuffle_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufb

Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_shuffle_epi8

__m128i _mm_maskz_shuffle_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufb

Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_shuffle_epi8

__m256i _mm256_mask_shuffle_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufb

Shuffle packed 8-bit integers in a within 128-bit lanes according to the shuffle control mask in the corresponding 8-bit element of b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_epi8

__m256i _mm256_maskz_shuffle_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufb

Shuffle packed 8-bit integers in a within 128-bit lanes according to the shuffle control mask in the corresponding 8-bit element of b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_shuffle_epi8

__m512i _mm512_mask_shuffle_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpshufb

Shuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_shuffle_epi8

__m512i _mm512_maskz_shuffle_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpshufb

Shuffle packed 8-bit integers in a within 128-bit lanes according to the shuffle control mask in the corresponding 8-bit element of b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_shuffle_epi8

__m512i _mm512_shuffle_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpshufb

Shuffle packed 8-bit integers in a within 128-bit lanes according to the shuffle control mask in the corresponding 8-bit element of b, and return the results.
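Each control byte in b does two jobs: bit 7 forces the result byte to zero, and the low four bits index a byte of a. The sketch below illustrates this for the 128-bit write-masked form; the 256-bit and 512-bit forms apply the same rule independently in each 128-bit lane. The helper name mask_shuffle_bytes is hypothetical, and the intrinsic path assumes a compiler that defines __AVX512BW__ and __AVX512VL__ when targeting those extensions (as GCC and Clang do); the scalar branch models the same behavior.

```c
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: byte shuffle of a controlled by b, merged with src
 * under the 16-bit mask k. */
void mask_shuffle_bytes(const uint8_t src[16], uint16_t k,
                        const uint8_t a[16], const uint8_t b[16],
                        uint8_t dst[16])
{
#if defined(__AVX512BW__) && defined(__AVX512VL__)
    __m128i vs = _mm_loadu_si128((const __m128i *)src);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i r  = _mm_mask_shuffle_epi8(vs, (__mmask16)k, va, vb);
    _mm_storeu_si128((__m128i *)dst, r);
#else
    /* Scalar model: control bit 7 zeroes the byte, bits 3:0 pick from a. */
    for (int j = 0; j < 16; j++) {
        uint8_t picked = (b[j] & 0x80u) ? 0 : a[b[j] & 0x0Fu];
        dst[j] = ((k >> j) & 1u) ? picked : src[j];
    }
#endif
}
```

Because the writemask is applied after the shuffle, a result byte can be zero for two different reasons in the zero-masked forms: its control byte has bit 7 set, or its mask bit is clear.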



_mm_mask_shuffle_epi32

__m128i _mm_mask_shuffle_epi32(__m128i src, __mmask8 k, __m128i a, _MM_PERM_ENUM imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpshufd

Shuffle 32-bit integers in a using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_shuffle_epi32

__m128i _mm_maskz_shuffle_epi32(__mmask8 k, __m128i a, _MM_PERM_ENUM imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpshufd

Shuffle 32-bit integers in a using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_shuffle_epi32

__m256i _mm256_mask_shuffle_epi32(__m256i src, __mmask8 k, __m256i a, _MM_PERM_ENUM imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpshufd

Shuffle 32-bit integers in a within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_epi32

__m256i _mm256_maskz_shuffle_epi32(__mmask8 k, __m256i a, _MM_PERM_ENUM imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpshufd

Shuffle 32-bit integers in a within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
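The imm control is read as four 2-bit fields, with field j (bits 2j+1:2j) selecting the source element for result element j. Because the control is encoded in the instruction, it must be a compile-time constant. The sketch below fixes the control to 0x1B, which reverses the four elements. The helper name mask_reverse_dwords and the macro REVERSE_IMM are hypothetical, and the intrinsic path assumes a compiler that defines __AVX512F__ and __AVX512VL__ when targeting those extensions (as GCC and Clang do); the scalar branch models the same selection.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical constant: 2-bit fields from low to high are 3, 2, 1, 0. */
#define REVERSE_IMM 0x1B

/* Hypothetical helper: reverse the four 32-bit elements of a, merging
 * with src under mask k. */
void mask_reverse_dwords(const int32_t src[4], uint8_t k,
                         const int32_t a[4], int32_t dst[4])
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m128i vs = _mm_loadu_si128((const __m128i *)src);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i r  = _mm_mask_shuffle_epi32(vs, (__mmask8)k, va,
                                        (_MM_PERM_ENUM)REVERSE_IMM);
    _mm_storeu_si128((__m128i *)dst, r);
#else
    /* Scalar model: field j of the control picks the source element. */
    for (int j = 0; j < 4; j++) {
        int32_t picked = a[(REVERSE_IMM >> (2 * j)) & 3];
        dst[j] = ((k >> j) & 1u) ? picked : src[j];
    }
#endif
}
```

With a = {1, 2, 3, 4}, src = {-1, -1, -1, -1}, and k = 0x05, the result is {4, -1, 2, -1}.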



_mm_mask_shufflehi_epi16

__m128i _mm_mask_shufflehi_epi16(__m128i src, __mmask8 k, __m128i a, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufhw

Shuffle 16-bit integers in the high 64 bits of a using the control in imm. Store the results in the high 64 bits of the return value, with the low 64 bits being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_shufflehi_epi16

__m128i _mm_maskz_shufflehi_epi16(__mmask8 k, __m128i a, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufhw

Shuffle 16-bit integers in the high 64 bits of a using the control in imm. Store the results in the high 64 bits of the return value, with the low 64 bits being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_shufflehi_epi16

__m256i _mm256_mask_shufflehi_epi16(__m256i src, __mmask16 k, __m256i a, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufhw

Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm. Store the results in the high 64 bits of 128-bit lanes of the return value, with the low 64 bits of 128-bit lanes being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shufflehi_epi16

__m256i _mm256_maskz_shufflehi_epi16(__mmask16 k, __m256i a, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufhw

Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm. Store the results in the high 64 bits of 128-bit lanes of the return value, with the low 64 bits of 128-bit lanes being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_shufflehi_epi16

__m512i _mm512_mask_shufflehi_epi16(__m512i src, __mmask32 k, __m512i a, int imm)

CPUID Flags: AVX512BW

Instruction(s): vpshufhw

Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm. Store the results in the high 64 bits of 128-bit lanes of the return value, with the low 64 bits of 128-bit lanes being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_shufflehi_epi16

__m512i _mm512_maskz_shufflehi_epi16(__mmask32 k, __m512i a, int imm)

CPUID Flags: AVX512BW

Instruction(s): vpshufhw

Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm. Store the results in the high 64 bits of 128-bit lanes of the return value, with the low 64 bits of 128-bit lanes being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_shufflehi_epi16

__m512i _mm512_shufflehi_epi16(__m512i a, int imm)

CPUID Flags: AVX512BW

Instruction(s): vpshufhw

Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm. Store the results in the high 64 bits of 128-bit lanes of the return value, with the low 64 bits of 128-bit lanes being copied from a.
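In the masked forms, the writemask covers all elements of the result, including the low words that pass through from a unshuffled. The following minimal sketch shows this for the 128-bit write-masked form with a fixed control of 0x1B, which reverses the four high words. The helper name mask_reverse_high_words and the macro REVERSE_IMM are hypothetical, and the intrinsic path assumes a compiler that defines __AVX512BW__ and __AVX512VL__ when targeting those extensions (as GCC and Clang do); the scalar branch models the same behavior.

```c
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical constant: 2-bit fields from low to high are 3, 2, 1, 0. */
#define REVERSE_IMM 0x1B

/* Hypothetical helper: reverse words 4..7 of a, pass words 0..3 through,
 * then merge with src under mask k. */
void mask_reverse_high_words(const uint16_t src[8], uint8_t k,
                             const uint16_t a[8], uint16_t dst[8])
{
#if defined(__AVX512BW__) && defined(__AVX512VL__)
    __m128i vs = _mm_loadu_si128((const __m128i *)src);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i r  = _mm_mask_shufflehi_epi16(vs, (__mmask8)k, va, REVERSE_IMM);
    _mm_storeu_si128((__m128i *)dst, r);
#else
    for (int j = 0; j < 8; j++) {
        /* Low four words pass through; high four are picked by the control. */
        uint16_t t = (j < 4) ? a[j]
                             : a[4 + ((REVERSE_IMM >> (2 * (j - 4))) & 3)];
        dst[j] = ((k >> j) & 1u) ? t : src[j];
    }
#endif
}
```

With a = {0..7}, every element of src set to 99, and k = 0xF3, the result is {0, 1, 99, 99, 7, 6, 5, 4}: words 2 and 3 come from src even though they lie in the unshuffled low half.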



_mm_mask_shufflelo_epi16

__m128i _mm_mask_shufflelo_epi16(__m128i src, __mmask8 k, __m128i a, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshuflw

Shuffle 16-bit integers in the low 64 bits of a using the control in imm. Store the results in the low 64 bits of the return value, with the high 64 bits being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_shufflelo_epi16

__m128i _mm_maskz_shufflelo_epi16(__mmask8 k, __m128i a, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshuflw

Shuffle 16-bit integers in the low 64 bits of a using the control in imm. Store the results in the low 64 bits of the return value, with the high 64 bits being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_shufflelo_epi16

__m256i _mm256_mask_shufflelo_epi16(__m256i src, __mmask16 k, __m256i a, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshuflw

Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm. Store the results in the low 64 bits of 128-bit lanes of the return value, with the high 64 bits of 128-bit lanes being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shufflelo_epi16

__m256i _mm256_maskz_shufflelo_epi16(__mmask16 k, __m256i a, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshuflw

Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm. Store the results in the low 64 bits of 128-bit lanes of the return value, with the high 64 bits of 128-bit lanes being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_shufflelo_epi16

__m512i _mm512_mask_shufflelo_epi16(__m512i src, __mmask32 k, __m512i a, int imm)

CPUID Flags: AVX512BW

Instruction(s): vpshuflw

Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm. Store the results in the low 64 bits of 128-bit lanes of the return value, with the high 64 bits of 128-bit lanes being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_shufflelo_epi16

__m512i _mm512_maskz_shufflelo_epi16(__mmask32 k, __m512i a, int imm)

CPUID Flags: AVX512BW

Instruction(s): vpshuflw

Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm. Store the results in the low 64 bits of 128-bit lanes of the return value, with the high 64 bits of 128-bit lanes being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_shufflelo_epi16

__m512i _mm512_shufflelo_epi16(__m512i a, int imm)

CPUID Flags: AVX512BW

Instruction(s): vpshuflw

Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm. Store the results in the low 64 bits of 128-bit lanes of the return value, with the high 64 bits of 128-bit lanes being copied from a.
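In the 512-bit form, the same control is applied to each of the four 128-bit lanes independently. The sketch below uses a fixed control of 0xB1, which swaps adjacent word pairs in the low half of every lane. The helper name swap_low_word_pairs and the macro SWAP_PAIRS_IMM are hypothetical, and the intrinsic path assumes a compiler that defines __AVX512BW__ when targeting that extension (as GCC and Clang do); the scalar branch models the same per-lane behavior.

```c
#include <stdint.h>
#if defined(__AVX512BW__)
#include <immintrin.h>
#endif

/* Hypothetical constant: 2-bit fields from low to high are 1, 0, 3, 2. */
#define SWAP_PAIRS_IMM 0xB1

/* Hypothetical helper: in each 128-bit lane (8 words), swap words 0<->1
 * and 2<->3; words 4..7 of the lane are copied unchanged. */
void swap_low_word_pairs(const uint16_t a[32], uint16_t dst[32])
{
#if defined(__AVX512BW__)
    __m512i va = _mm512_loadu_si512(a);
    _mm512_storeu_si512(dst, _mm512_shufflelo_epi16(va, SWAP_PAIRS_IMM));
#else
    for (int lane = 0; lane < 4; lane++) {
        const uint16_t *s = a + 8 * lane;
        uint16_t *d = dst + 8 * lane;
        for (int j = 0; j < 4; j++) {
            d[j] = s[(SWAP_PAIRS_IMM >> (2 * j)) & 3]; /* shuffled low half */
            d[j + 4] = s[j + 4];                        /* copied high half */
        }
    }
#endif
}
```

With a[i] = i, the first lane becomes {1, 0, 3, 2, 4, 5, 6, 7} and the second lane starts with {9, 8, ...}, showing that indices never cross a lane boundary.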



_mm_mask_unpackhi_epi8

__m128i _mm_mask_unpackhi_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpckhbw

Unpack and interleave 8-bit integers from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpackhi_epi8

__m128i _mm_maskz_unpackhi_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpckhbw

Unpack and interleave 8-bit integers from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpackhi_epi8

__m256i _mm256_mask_unpackhi_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpckhbw

Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpackhi_epi8

__m256i _mm256_maskz_unpackhi_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpckhbw

Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_unpackhi_epi8

__m512i _mm512_mask_unpackhi_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpckhbw

Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_unpackhi_epi8

__m512i _mm512_maskz_unpackhi_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpckhbw

Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_unpackhi_epi8

__m512i _mm512_unpackhi_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpckhbw

Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and return the results.



_mm_mask_unpackhi_epi32

__m128i _mm_mask_unpackhi_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckhdq

Unpack and interleave 32-bit integers from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpackhi_epi32

__m128i _mm_maskz_unpackhi_epi32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckhdq

Unpack and interleave 32-bit integers from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpackhi_epi32

__m256i _mm256_mask_unpackhi_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckhdq

Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpackhi_epi32

__m256i _mm256_maskz_unpackhi_epi32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckhdq

Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_unpackhi_epi64

__m128i _mm_mask_unpackhi_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckhqdq

Unpack and interleave 64-bit integers from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpackhi_epi64

__m128i _mm_maskz_unpackhi_epi64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckhqdq

Unpack and interleave 64-bit integers from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpackhi_epi64

__m256i _mm256_mask_unpackhi_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckhqdq

Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpackhi_epi64

__m256i _mm256_maskz_unpackhi_epi64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckhqdq

Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_unpackhi_epi16

__m128i _mm_mask_unpackhi_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpckhwd

Unpack and interleave 16-bit integers from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpackhi_epi16

__m128i _mm_maskz_unpackhi_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpckhwd

Unpack and interleave 16-bit integers from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpackhi_epi16

__m256i _mm256_mask_unpackhi_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpckhwd

Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpackhi_epi16

__m256i _mm256_maskz_unpackhi_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpckhwd

Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_unpackhi_epi16

__m512i _mm512_mask_unpackhi_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpckhwd

Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_unpackhi_epi16

__m512i _mm512_maskz_unpackhi_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpckhwd

Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_unpackhi_epi16

__m512i _mm512_unpackhi_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpckhwd

Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and return the results.
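Interleaving places elements of a in the even result positions and elements of b in the odd positions. The following minimal sketch shows the 128-bit zero-masked 16-bit form. The helper name maskz_unpackhi_words is hypothetical, and the intrinsic path assumes a compiler that defines __AVX512BW__ and __AVX512VL__ when targeting those extensions (as GCC and Clang do); the scalar branch models the same result.

```c
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: interleave words 4..7 of a and b as
 * a4, b4, a5, b5, a6, b6, a7, b7, zeroing elements whose mask bit is clear. */
void maskz_unpackhi_words(uint8_t k, const uint16_t a[8],
                          const uint16_t b[8], uint16_t dst[8])
{
#if defined(__AVX512BW__) && defined(__AVX512VL__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i r  = _mm_maskz_unpackhi_epi16((__mmask8)k, va, vb);
    _mm_storeu_si128((__m128i *)dst, r);
#else
    for (int j = 0; j < 8; j++) {
        /* Even result elements come from a, odd ones from b. */
        uint16_t t = (j & 1) ? b[4 + j / 2] : a[4 + j / 2];
        dst[j] = ((k >> j) & 1u) ? t : 0;
    }
#endif
}
```

With a = {0..7}, b = {100..107}, and k = 0x0F, the result is {4, 104, 5, 105, 0, 0, 0, 0}.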



_mm_mask_unpacklo_epi8

__m128i _mm_mask_unpacklo_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpcklbw

Unpack and interleave 8-bit integers from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpacklo_epi8

__m128i _mm_maskz_unpacklo_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpcklbw

Unpack and interleave 8-bit integers from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpacklo_epi8

__m256i _mm256_mask_unpacklo_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpcklbw

Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpacklo_epi8

__m256i _mm256_maskz_unpacklo_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpcklbw

Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_unpacklo_epi8

__m512i _mm512_mask_unpacklo_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpcklbw

Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_unpacklo_epi8

__m512i _mm512_maskz_unpacklo_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpcklbw

Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_unpacklo_epi8

__m512i _mm512_unpacklo_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpcklbw

Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and return the results.



_mm_mask_unpacklo_epi32

__m128i _mm_mask_unpacklo_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckldq

Unpack and interleave 32-bit integers from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpacklo_epi32

__m128i _mm_maskz_unpacklo_epi32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckldq

Unpack and interleave 32-bit integers from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpacklo_epi32

__m256i _mm256_mask_unpacklo_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckldq

Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpacklo_epi32

__m256i _mm256_maskz_unpacklo_epi32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpckldq

Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
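For vectors wider than 128 bits, "low half" refers to the low half of each 128-bit lane, not of the whole vector. The sketch below illustrates this with the 256-bit write-masked 32-bit form, where each lane holds four elements. The helper name mask_unpacklo_dwords is hypothetical, and the intrinsic path assumes a compiler that defines __AVX512F__ and __AVX512VL__ when targeting those extensions (as GCC and Clang do); the scalar branch models the same per-lane interleave.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: per 128-bit lane, produce a0, b0, a1, b1 from the
 * lane's two low elements of a and b, merged with src under mask k. */
void mask_unpacklo_dwords(const int32_t src[8], uint8_t k,
                          const int32_t a[8], const int32_t b[8],
                          int32_t dst[8])
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m256i vs = _mm256_loadu_si256((const __m256i *)src);
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i r  = _mm256_mask_unpacklo_epi32(vs, (__mmask8)k, va, vb);
    _mm256_storeu_si256((__m256i *)dst, r);
#else
    for (int j = 0; j < 8; j++) {
        int base = j & 4;        /* first element of this 128-bit lane */
        int pos  = (j & 3) / 2;  /* 0 or 1: which low-half element     */
        int32_t t = (j & 1) ? b[base + pos] : a[base + pos];
        dst[j] = ((k >> j) & 1u) ? t : src[j];
    }
#endif
}
```

With a = {0..7}, b = {100..107}, every element of src set to -1, and k = 0x3F, the result is {0, 100, 1, 101, 4, 104, -1, -1}: the second lane draws from elements 4 and 5, and elements 2, 3, 6, and 7 of the inputs are never used.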



_mm_mask_unpacklo_epi64

__m128i _mm_mask_unpacklo_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpcklqdq

Unpack and interleave 64-bit integers from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpacklo_epi64

__m128i _mm_maskz_unpacklo_epi64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpcklqdq

Unpack and interleave 64-bit integers from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpacklo_epi64

__m256i _mm256_mask_unpacklo_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpcklqdq

Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpacklo_epi64

__m256i _mm256_maskz_unpacklo_epi64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpunpcklqdq

Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_unpacklo_epi16

__m128i _mm_mask_unpacklo_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpcklwd

Unpack and interleave 16-bit integers from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpacklo_epi16

__m128i _mm_maskz_unpacklo_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpcklwd

Unpack and interleave 16-bit integers from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpacklo_epi16

__m256i _mm256_mask_unpacklo_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpcklwd

Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpacklo_epi16

__m256i _mm256_maskz_unpacklo_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpunpcklwd

Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_unpacklo_epi16

__m512i _mm512_mask_unpacklo_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpcklwd

Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_unpacklo_epi16

__m512i _mm512_maskz_unpacklo_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpcklwd

Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_unpacklo_epi16

__m512i _mm512_unpacklo_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpunpcklwd

Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and return the results.
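
For the 512-bit form, the interleave is applied independently to each of the four 128-bit lanes. The following scalar sketch shows the resulting element order; the name model_unpacklo_epi16_512 and the array representation (32 words, element 0 lowest) are assumptions of this sketch.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_unpacklo_epi16 (not an Intel API).
   A 512-bit vector is modeled as 32 16-bit elements, element 0 lowest.
   dst must not overlap a or b. */
static void model_unpacklo_epi16_512(const int16_t a[32], const int16_t b[32],
                                     int16_t dst[32])
{
    for (int lane = 0; lane < 4; ++lane) {   /* four 128-bit lanes */
        const int base = lane * 8;           /* eight words per lane */
        for (int j = 0; j < 4; ++j) {        /* low four words of the lane */
            dst[base + 2 * j]     = a[base + j];
            dst[base + 2 * j + 1] = b[base + j];
        }
    }
}
```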



_mm512_kunpackd

__mmask64 _mm512_kunpackd(__mmask64 a, __mmask64 b)

CPUID Flags: AVX512BW

Instruction(s): kunpckdq

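Unpack and interleave 32 bits from masks a and b, and return the 64-bit result. The low 32 bits of a form the upper half of the result, and the low 32 bits of b form the lower half.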



_mm512_kunpackw

__mmask32 _mm512_kunpackw(__mmask32 a, __mmask32 b)

CPUID Flags: AVX512BW

Instruction(s): kunpckwd

Unpack and interleave 16 bits from masks a and b, and return the 32-bit result. The low 16 bits of a form the upper half of the result, and the low 16 bits of b form the lower half.
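
The mask unpack operations amount to a concatenation of the low halves of the two operands. A minimal scalar sketch, assuming masks are held in plain unsigned integers (the names model_kunpackd and model_kunpackw are hypothetical):

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_kunpackd:
   result[63:32] = a[31:0], result[31:0] = b[31:0]. */
static uint64_t model_kunpackd(uint64_t a, uint64_t b)
{
    return ((a & 0xFFFFFFFFu) << 32) | (b & 0xFFFFFFFFu);
}

/* Hypothetical scalar model of _mm512_kunpackw:
   result[31:16] = a[15:0], result[15:0] = b[15:0]. */
static uint32_t model_kunpackw(uint32_t a, uint32_t b)
{
    return ((a & 0xFFFFu) << 16) | (b & 0xFFFFu);
}
```

A typical use is to build a wide mask from two narrower compare results, with the first argument supplying the upper part.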



_mm_fpclass_pd_mask

__mmask8 _mm_fpclass_pd_mask(__m128d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vfpclasspd

Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.



_mm_mask_fpclass_pd_mask

__mmask8 _mm_mask_fpclass_pd_mask(__mmask8 k1, __m128d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vfpclasspd

Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (result bits are zeroed out when the corresponding mask bit is not set).



_mm256_fpclass_pd_mask

__mmask8 _mm256_fpclass_pd_mask(__m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vfpclasspd

Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.



_mm256_mask_fpclass_pd_mask

__mmask8 _mm256_mask_fpclass_pd_mask(__mmask8 k1, __m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vfpclasspd

Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (result bits are zeroed out when the corresponding mask bit is not set).



_mm512_fpclass_pd_mask

__mmask8 _mm512_fpclass_pd_mask(__m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vfpclasspd

Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.



_mm512_mask_fpclass_pd_mask

__mmask8 _mm512_mask_fpclass_pd_mask(__mmask8 k1, __m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vfpclasspd

Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (result bits are zeroed out when the corresponding mask bit is not set).
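
The category test can be sketched in scalar C. The mapping of imm bits to categories used here (bit 0 QNaN, bit 1 +0, bit 2 -0, bit 3 +infinity, bit 4 -infinity, bit 5 denormal, bit 6 finite negative, bit 7 SNaN) follows the vfpclasspd instruction definition and is not spelled out in this section. The sketch also assumes that denormals-are-zero mode (MXCSR.DAZ) is off. All function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a raw IEEE 754 binary64 bit pattern into *dst. */
static void set_f64_bits(double *dst, uint64_t bits)
{
    memcpy(dst, &bits, sizeof bits);
}

/* Hypothetical scalar model of the per-element test (assumes MXCSR.DAZ = 0).
   Returns 1 if *px is in any category selected by the low 8 bits of imm. */
static int model_fpclass_f64(const double *px, int imm)
{
    uint64_t bits;
    memcpy(&bits, px, sizeof bits);

    const int neg      = (int)(bits >> 63);
    const int exp_ones = ((bits >> 52) & 0x7FF) == 0x7FF;
    const int exp_zero = ((bits >> 52) & 0x7FF) == 0;
    const int man_zero = (bits & 0x000FFFFFFFFFFFFFull) == 0;
    const int quiet    = (int)((bits >> 51) & 1);

    const int cls[8] = {
        exp_ones && !man_zero && quiet,               /* bit 0: QNaN            */
        !neg && exp_zero && man_zero,                 /* bit 1: +0              */
        neg && exp_zero && man_zero,                  /* bit 2: -0              */
        !neg && exp_ones && man_zero,                 /* bit 3: +infinity       */
        neg && exp_ones && man_zero,                  /* bit 4: -infinity       */
        exp_zero && !man_zero,                        /* bit 5: denormal        */
        neg && !exp_ones && !(exp_zero && man_zero),  /* bit 6: finite negative */
        exp_ones && !man_zero && !quiet               /* bit 7: SNaN            */
    };
    for (int i = 0; i < 8; ++i)
        if (((imm >> i) & 1) && cls[i])
            return 1;
    return 0;
}

/* Hypothetical scalar model of _mm512_mask_fpclass_pd_mask. Passing
   k1 = 0xFF gives the behavior of the unmasked _mm512_fpclass_pd_mask. */
static uint8_t model_mask_fpclass_pd_512(uint8_t k1, const double a[8], int imm)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; ++i)
        if (((k1 >> i) & 1) && model_fpclass_f64(&a[i], imm))
            r |= (uint8_t)(1u << i);
    return r;
}
```

Under this model, an imm of 0x81 selects any NaN and an imm of 0x18 selects either infinity.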



_mm_fpclass_ps_mask

__mmask8 _mm_fpclass_ps_mask(__m128 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vfpclassps

Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.



_mm_mask_fpclass_ps_mask

__mmask8 _mm_mask_fpclass_ps_mask(__mmask8 k1, __m128 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vfpclassps

Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (result bits are zeroed out when the corresponding mask bit is not set).



_mm256_fpclass_ps_mask

__mmask8 _mm256_fpclass_ps_mask(__m256 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vfpclassps

Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.



_mm256_mask_fpclass_ps_mask

__mmask8 _mm256_mask_fpclass_ps_mask(__mmask8 k1, __m256 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vfpclassps

Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (result bits are zeroed out when the corresponding mask bit is not set).



_mm512_fpclass_ps_mask

__mmask16 _mm512_fpclass_ps_mask(__m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vfpclassps

Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.



_mm512_mask_fpclass_ps_mask

__mmask16 _mm512_mask_fpclass_ps_mask(__mmask16 k1, __m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vfpclassps

Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (result bits are zeroed out when the corresponding mask bit is not set).



_mm_fpclass_sd_mask

__mmask8 _mm_fpclass_sd_mask(__m128d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vfpclasssd

Test the lower double-precision (64-bit) floating-point element in a for special categories specified by imm, and put the result in the returned mask value.



_mm_mask_fpclass_sd_mask

__mmask8 _mm_mask_fpclass_sd_mask(__mmask8 k1, __m128d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vfpclasssd

Test the lower double-precision (64-bit) floating-point element in a for special categories specified by imm, and put the result in the returned mask value using zeromask k1 (the result is zeroed out when mask bit 0 is not set).



_mm_fpclass_ss_mask

__mmask8 _mm_fpclass_ss_mask(__m128 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vfpclassss

Test the lower single-precision (32-bit) floating-point element in a for special categories specified by imm, and put the result in the returned mask value.



_mm_mask_fpclass_ss_mask

__mmask8 _mm_mask_fpclass_ss_mask(__mmask8 k1, __m128 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vfpclassss

Test the lower single-precision (32-bit) floating-point element in a for special categories specified by imm, and put the result in the returned mask value using zeromask k1 (the result is zeroed out when mask bit 0 is not set).
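
For the scalar forms, only element 0 of a is examined and only bit 0 of k1 and of the result is meaningful. A scalar sketch for the single-precision case follows. It assumes the same imm bit assignment as the vfpclassss instruction definition (bit 0 QNaN, bit 1 +0, bit 2 -0, bit 3 +infinity, bit 4 -infinity, bit 5 denormal, bit 6 finite negative, bit 7 SNaN) and MXCSR.DAZ = 0. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a raw IEEE 754 binary32 bit pattern into *dst. */
static void set_f32_bits(float *dst, uint32_t bits)
{
    memcpy(dst, &bits, sizeof bits);
}

/* Hypothetical scalar model of _mm_mask_fpclass_ss_mask (assumes DAZ = 0).
   Only a[0] and bit 0 of k1 take part; the result is 0 or 1. */
static uint8_t model_mask_fpclass_ss(uint8_t k1, const float a[4], int imm)
{
    uint32_t bits;
    memcpy(&bits, &a[0], sizeof bits);

    const int neg      = (int)(bits >> 31);
    const int exp_ones = ((bits >> 23) & 0xFFu) == 0xFFu;
    const int exp_zero = ((bits >> 23) & 0xFFu) == 0u;
    const int man_zero = (bits & 0x007FFFFFu) == 0u;
    const int quiet    = (int)((bits >> 22) & 1u);

    const int cls[8] = {
        exp_ones && !man_zero && quiet,               /* bit 0: QNaN            */
        !neg && exp_zero && man_zero,                 /* bit 1: +0              */
        neg && exp_zero && man_zero,                  /* bit 2: -0              */
        !neg && exp_ones && man_zero,                 /* bit 3: +infinity       */
        neg && exp_ones && man_zero,                  /* bit 4: -infinity       */
        exp_zero && !man_zero,                        /* bit 5: denormal        */
        neg && !exp_ones && !(exp_zero && man_zero),  /* bit 6: finite negative */
        exp_ones && !man_zero && !quiet               /* bit 7: SNaN            */
    };

    int hit = 0;
    for (int i = 0; i < 8; ++i)
        if (((imm >> i) & 1) && cls[i])
            hit = 1;

    return (uint8_t)((k1 & 1u) && hit);   /* zeromask applies to bit 0 only */
}
```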



_mm_movepi8_mask

__mmask16 _mm_movepi8_mask(__m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovb2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 8-bit integer in a.



_mm256_movepi8_mask

__mmask32 _mm256_movepi8_mask(__m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovb2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 8-bit integer in a.



_mm512_movepi8_mask

__mmask64 _mm512_movepi8_mask(__m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpmovb2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 8-bit integer in a.
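
The element-to-mask conversion collects one sign bit per element. A scalar sketch for the 512-bit 8-bit form, assuming the vector is modeled as an array of 64 signed bytes with element 0 lowest (the name model_movepi8_mask_512 is hypothetical):

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_movepi8_mask (not an Intel API).
   Bit i of the result is the most significant bit of byte i. */
static uint64_t model_movepi8_mask_512(const int8_t a[64])
{
    uint64_t k = 0;
    for (int i = 0; i < 64; ++i)
        if (a[i] < 0)                    /* MSB set means negative */
            k |= (uint64_t)1 << i;
    return k;
}
```

The 16-bit, 32-bit, and 64-bit forms follow the same pattern with fewer, wider elements, which is why their return mask types are narrower.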



_mm_movepi32_mask

__mmask8 _mm_movepi32_mask(__m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovd2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 32-bit integer in a.



_mm256_movepi32_mask

__mmask8 _mm256_movepi32_mask(__m256i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovd2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 32-bit integer in a.



_mm512_movepi32_mask

__mmask16 _mm512_movepi32_mask(__m512i a)

CPUID Flags: AVX512DQ

Instruction(s): vpmovd2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 32-bit integer in a.



_mm_movepi64_mask

__mmask8 _mm_movepi64_mask(__m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovq2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 64-bit integer in a.



_mm256_movepi64_mask

__mmask8 _mm256_movepi64_mask(__m256i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovq2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 64-bit integer in a.



_mm512_movepi64_mask

__mmask8 _mm512_movepi64_mask(__m512i a)

CPUID Flags: AVX512DQ

Instruction(s): vpmovq2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 64-bit integer in a.



_mm_movepi16_mask

__mmask8 _mm_movepi16_mask(__m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovw2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 16-bit integer in a.



_mm256_movepi16_mask

__mmask16 _mm256_movepi16_mask(__m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovw2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 16-bit integer in a.



_mm512_movepi16_mask

__mmask32 _mm512_movepi16_mask(__m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpmovw2m

Set each bit of the returned mask value based on the most significant bit of the corresponding packed 16-bit integer in a.



_mm_permutexvar_epi8

__m128i _mm_permutexvar_epi8(__m128i idx, __m128i a) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermb

Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result.



_mm_mask_permutexvar_epi8

__m128i _mm_mask_permutexvar_epi8(__m128i src, __mmask16 k, __m128i idx, __m128i a) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermb

Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_permutexvar_epi8

__m128i _mm_maskz_permutexvar_epi8(__mmask16 k, __m128i idx, __m128i a) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermb

Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutexvar_epi8

__m256i _mm256_permutexvar_epi8(__m256i idx, __m256i a) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermb

Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result.



_mm256_mask_permutexvar_epi8

__m256i _mm256_mask_permutexvar_epi8(__m256i src, __mmask32 k, __m256i idx, __m256i a) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermb

Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_epi8

__m256i _mm256_maskz_permutexvar_epi8(__mmask32 k, __m256i idx, __m256i a) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermb

Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_permutexvar_epi8

__m512i _mm512_permutexvar_epi8(__m512i idx, __m512i a) 

CPUID Flags: AVX512VBMI

Instruction(s): vpermb

Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result.



_mm512_mask_permutexvar_epi8

__m512i _mm512_mask_permutexvar_epi8(__m512i src, __mmask64 k, __m512i idx, __m512i a) 

CPUID Flags: AVX512VBMI

Instruction(s): vpermb

Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_permutexvar_epi8

__m512i _mm512_maskz_permutexvar_epi8(__mmask64 k, __m512i idx, __m512i a) 

CPUID Flags: AVX512VBMI

Instruction(s): vpermb

Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
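
The full-width byte shuffle can be sketched in scalar C. For the 512-bit form, only the low 6 bits of each index byte are used, because 64 source bytes are addressable; this detail comes from the vpermb instruction definition rather than from the description above. The function name and array representation are assumptions of this sketch.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_maskz_permutexvar_epi8.
   A 512-bit vector is modeled as 64 bytes, element 0 lowest.
   dst must not overlap idx or a. */
static void model_maskz_permutexvar_epi8_512(uint64_t k, const uint8_t idx[64],
                                             const uint8_t a[64],
                                             uint8_t dst[64])
{
    for (int i = 0; i < 64; ++i) {
        const uint8_t sel = idx[i] & 0x3F;          /* low 6 bits pick a byte */
        dst[i] = ((k >> i) & 1) ? a[sel] : 0;       /* zeromask               */
    }
}
```

With k set to all ones this behaves like the unmasked _mm512_permutexvar_epi8. An index vector holding 63, 62, ..., 0 reverses the bytes of a.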



_mm_permutex2var_epi8

__m128i _mm_permutex2var_epi8(__m128i a, __m128i idx, __m128i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermi2b

Shuffle 8-bit integers in a and b using the corresponding index in idx, and return the result.



_mm_mask_permutex2var_epi8

__m128i _mm_mask_permutex2var_epi8(__m128i a, __mmask16 k, __m128i idx, __m128i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermt2b

Shuffle 8-bit integers in a and b using the corresponding index in idx, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask2_permutex2var_epi8

__m128i _mm_mask2_permutex2var_epi8(__m128i a, __m128i idx, __mmask16 k, __m128i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermi2b

Shuffle 8-bit integers in a and b using the corresponding index in idx, and return the result using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm_maskz_permutex2var_epi8

__m128i _mm_maskz_permutex2var_epi8(__mmask16 k, __m128i a, __m128i idx, __m128i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermi2b, vpermt2b

Shuffle 8-bit integers in a and b using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
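
The four 128-bit two-source variants differ only in what is written when a mask bit is clear. The following scalar sketch models that difference. For the 128-bit form, bits 3:0 of each index byte select the element and bit 4 selects between a (0) and b (1); this follows the vpermi2b/vpermt2b instruction definitions. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* What to write to an element whose mask bit is clear (sketch-only names). */
enum fallthrough {
    FALLTHROUGH_A,     /* _mm_mask_permutex2var_epi8:  copy from a   */
    FALLTHROUGH_IDX,   /* _mm_mask2_permutex2var_epi8: copy from idx */
    FALLTHROUGH_ZERO   /* _mm_maskz_permutex2var_epi8: write zero    */
};

/* Hypothetical scalar model of the 128-bit permutex2var_epi8 family.
   A 128-bit vector is modeled as 16 bytes, element 0 lowest.
   dst must not overlap a, idx, or b. Passing k = 0xFFFF models the
   unmasked _mm_permutex2var_epi8. */
static void model_permutex2var_epi8_128(enum fallthrough ft, uint16_t k,
                                        const uint8_t a[16],
                                        const uint8_t idx[16],
                                        const uint8_t b[16], uint8_t dst[16])
{
    for (int i = 0; i < 16; ++i) {
        const int off    = idx[i] & 0x0F;         /* element within the source */
        const int from_b = (idx[i] >> 4) & 1;     /* source selector bit       */

        if ((k >> i) & 1)
            dst[i] = from_b ? b[off] : a[off];
        else if (ft == FALLTHROUGH_A)
            dst[i] = a[i];
        else if (ft == FALLTHROUGH_IDX)
            dst[i] = idx[i];
        else
            dst[i] = 0;
    }
}
```

The 256-bit and 512-bit forms follow the same scheme with 5 and 6 offset bits respectively, and the selector in the next higher bit.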



_mm256_permutex2var_epi8

__m256i _mm256_permutex2var_epi8(__m256i a, __m256i idx, __m256i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermi2b

Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result.



_mm256_mask_permutex2var_epi8

__m256i _mm256_mask_permutex2var_epi8(__m256i a, __mmask32 k, __m256i idx, __m256i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermt2b

Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_epi8

__m256i _mm256_mask2_permutex2var_epi8(__m256i a, __m256i idx, __mmask32 k, __m256i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermi2b

Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_maskz_permutex2var_epi8

__m256i _mm256_maskz_permutex2var_epi8(__mmask32 k, __m256i a, __m256i idx, __m256i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpermi2b, vpermt2b

Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_permutex2var_epi8

__m512i _mm512_permutex2var_epi8(__m512i a, __m512i idx, __m512i b) 

CPUID Flags: AVX512VBMI

Instruction(s): vpermi2b

Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result.



_mm512_mask_permutex2var_epi8

__m512i _mm512_mask_permutex2var_epi8(__m512i a, __mmask64 k, __m512i idx, __m512i b) 

CPUID Flags: AVX512VBMI

Instruction(s): vpermt2b

Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm512_mask2_permutex2var_epi8

__m512i _mm512_mask2_permutex2var_epi8(__m512i a, __m512i idx, __mmask64 k, __m512i b) 

CPUID Flags: AVX512VBMI

Instruction(s): vpermi2b

Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm512_maskz_permutex2var_epi8

__m512i _mm512_maskz_permutex2var_epi8(__mmask64 k, __m512i a, __m512i idx, __m512i b) 

CPUID Flags: AVX512VBMI

Instruction(s): vpermi2b, vpermt2b

Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).