Intrinsics for Miscellaneous Operations
The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
variable | definition |
---|---|
src | source element to use based on writemask result |
k | writemask used as a selector |
a | first source vector element |
b | second source vector element |
c | third source vector element |
rounding | Rounding control value, given together with the SAE (suppress all exceptions) flag |
interv | Normalization interval, a value of type _MM_MANTISSA_NORM_ENUM |
sc | Sign control, a value of type _MM_MANTISSA_SIGN_ENUM |
_mm_broadcast_i32x2
__m128i _mm_broadcast_i32x2(__m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcasti32x2
Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value.
_mm_mask_broadcast_i32x2
__m128i _mm_mask_broadcast_i32x2(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcasti32x2
Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_broadcast_i32x2
__m128i _mm_maskz_broadcast_i32x2(__mmask8 k, __m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcasti32x2
Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_broadcast_i32x2
__m256i _mm256_broadcast_i32x2(__m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcasti32x2
Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value.
_mm256_mask_broadcast_i32x2
__m256i _mm256_mask_broadcast_i32x2(__m256i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcasti32x2
Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcast_i32x2
__m256i _mm256_maskz_broadcast_i32x2(__mmask8 k, __m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcasti32x2
Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_broadcast_i32x2
__m512i _mm512_broadcast_i32x2(__m128i a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcasti32x2
Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value.
_mm512_mask_broadcast_i32x2
__m512i _mm512_mask_broadcast_i32x2(__m512i src, __mmask16 k, __m128i a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcasti32x2
Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_broadcast_i32x2
__m512i _mm512_maskz_broadcast_i32x2(__mmask16 k, __m128i a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcasti32x2
Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
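The difference between the plain, writemask, and zeromask forms can be illustrated with a scalar model. The sketch below is not the intrinsic itself: `model_mask_broadcast_i32x2` is a hypothetical helper, plain arrays stand in for vector registers (element 0 is the lowest element), and passing a null `src` is this sketch's own convention for the zeromask form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model (hypothetical helper, not part of immintrin.h) of the
 * broadcast_i32x2 family. `lanes` is the number of 32-bit elements in the
 * result: 4, 8, or 16 for the 128-, 256-, and 512-bit forms.
 * src != NULL models the writemask form, src == NULL the zeromask form.
 * A mask with all bits set reproduces the unmasked form. */
void model_mask_broadcast_i32x2(int32_t *dst, const int32_t *src,
                                uint16_t k, const int32_t *a, size_t lanes)
{
    assert(lanes == 4 || lanes == 8 || lanes == 16);
    for (size_t i = 0; i < lanes; i++) {
        int32_t broadcast = a[i % 2];      /* repeat the pair a[0], a[1] */
        if ((k >> i) & 1u)
            dst[i] = broadcast;            /* mask bit set: take new value */
        else
            dst[i] = src ? src[i] : 0;     /* writemask: src, zeromask: 0 */
    }
}
```

With `a = {10, 20, ...}` and `k = 0x0F` on an 8-element result, the low four elements become `10, 20, 10, 20` and the high four keep their `src` values.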
_mm256_broadcast_i32x4
__m256i _mm256_broadcast_i32x4(__m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcasti32x4
Broadcast the 4 packed 32-bit integers from a to all elements of the return value.
_mm256_mask_broadcast_i32x4
__m256i _mm256_mask_broadcast_i32x4(__m256i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcasti32x4
Broadcast the 4 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcast_i32x4
__m256i _mm256_maskz_broadcast_i32x4(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcasti32x4
Broadcast the 4 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_broadcast_i32x8
__m512i _mm512_broadcast_i32x8(__m256i a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcasti32x8
Broadcast the 8 packed 32-bit integers from a to all elements of the return value.
_mm512_mask_broadcast_i32x8
__m512i _mm512_mask_broadcast_i32x8(__m512i src, __mmask16 k, __m256i a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcasti32x8
Broadcast the 8 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_broadcast_i32x8
__m512i _mm512_maskz_broadcast_i32x8(__mmask16 k, __m256i a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcasti32x8
Broadcast the 8 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_broadcast_i64x2
__m256i _mm256_broadcast_i64x2(__m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcasti64x2
Broadcast the 2 packed 64-bit integers from a to all elements of the return value.
_mm256_mask_broadcast_i64x2
__m256i _mm256_mask_broadcast_i64x2(__m256i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcasti64x2
Broadcast the 2 packed 64-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcast_i64x2
__m256i _mm256_maskz_broadcast_i64x2(__mmask8 k, __m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcasti64x2
Broadcast the 2 packed 64-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_broadcast_i64x2
__m512i _mm512_broadcast_i64x2(__m128i a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcasti64x2
Broadcast the 2 packed 64-bit integers from a to all elements of the return value.
_mm512_mask_broadcast_i64x2
__m512i _mm512_mask_broadcast_i64x2(__m512i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcasti64x2
Broadcast the 2 packed 64-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_broadcast_i64x2
__m512i _mm512_maskz_broadcast_i64x2(__mmask8 k, __m128i a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcasti64x2
Broadcast the 2 packed 64-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_inserti32x4
__m256i _mm256_inserti32x4(__m256i a, __m128i b, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vinserti32x4
Copy a to the return value, then insert 128 bits (composed of 4 packed 32-bit integers) from b into the return value at the location specified by imm.
_mm256_mask_inserti32x4
__m256i _mm256_mask_inserti32x4(__m256i src, __mmask8 k, __m256i a, __m128i b, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vinserti32x4
Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_inserti32x4
__m256i _mm256_maskz_inserti32x4(__mmask8 k, __m256i a, __m128i b, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vinserti32x4
Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_inserti32x8
__m512i _mm512_inserti32x8(__m512i a, __m256i b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinserti32x8
Copy a to the return value, then insert 256 bits (composed of 8 packed 32-bit integers) from b into the return value at the location specified by imm.
_mm512_mask_inserti32x8
__m512i _mm512_mask_inserti32x8(__m512i src, __mmask16 k, __m512i a, __m256i b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinserti32x8
Copy a to tmp, then insert 256 bits (composed of 8 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_inserti32x8
__m512i _mm512_maskz_inserti32x8(__mmask16 k, __m512i a, __m256i b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinserti32x8
Copy a to tmp, then insert 256 bits (composed of 8 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_inserti64x2
__m256i _mm256_inserti64x2(__m256i a, __m128i b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vinserti64x2
Copy a to the return value, then insert 128 bits (composed of 2 packed 64-bit integers) from b into the return value at the location specified by imm.
_mm256_mask_inserti64x2
__m256i _mm256_mask_inserti64x2(__m256i src, __mmask8 k, __m256i a, __m128i b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vinserti64x2
Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_inserti64x2
__m256i _mm256_maskz_inserti64x2(__mmask8 k, __m256i a, __m128i b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vinserti64x2
Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_inserti64x2
__m512i _mm512_inserti64x2(__m512i a, __m128i b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinserti64x2
Copy a to the return value, then insert 128 bits (composed of 2 packed 64-bit integers) from b into the return value at the location specified by imm.
_mm512_mask_inserti64x2
__m512i _mm512_mask_inserti64x2(__m512i src, __mmask8 k, __m512i a, __m128i b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinserti64x2
Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_inserti64x2
__m512i _mm512_maskz_inserti64x2(__mmask8 k, __m512i a, __m128i b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinserti64x2
Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
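The "copy to tmp, insert, then mask" order used by the masked insert forms can be sketched in scalar C. This models `_mm256_mask_inserti32x4` only; the helper name is hypothetical, arrays stand in for vector registers, and the sketch restricts `imm` to 0 or 1 (the two 128-bit positions of a 256-bit value) as a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model (hypothetical helper) of _mm256_mask_inserti32x4.
 * dst, src and a hold 8 x int32_t; b holds 4 x int32_t.
 * imm selects the 128-bit position: 0 = low half, 1 = high half. */
void model_mask_inserti32x4(int32_t dst[8], const int32_t src[8], uint8_t k,
                            const int32_t a[8], const int32_t b[4], int imm)
{
    int32_t tmp[8];
    assert(imm == 0 || imm == 1);               /* restriction of this sketch */
    memcpy(tmp, a, sizeof tmp);                 /* copy a to tmp */
    memcpy(&tmp[4 * imm], b, 4 * sizeof *b);    /* overwrite one 128-bit block */
    for (int i = 0; i < 8; i++)                 /* apply the writemask per element */
        dst[i] = ((k >> i) & 1) ? tmp[i] : src[i];
}
```

Note that the mask is applied per 32-bit element after the insertion, so a cleared mask bit can hide an element that came from either a or b.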
_mm256_mask_shuffle_i32x4
__m256i _mm256_mask_shuffle_i32x4(__m256i src, __mmask8 k, __m256i a, __m256i b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufi32x4
Shuffle 128-bit blocks (each composed of 4 packed 32-bit integers) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_shuffle_i32x4
__m256i _mm256_maskz_shuffle_i32x4(__mmask8 k, __m256i a, __m256i b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufi32x4
Shuffle 128-bit blocks (each composed of 4 packed 32-bit integers) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_shuffle_i32x4
__m256i _mm256_shuffle_i32x4(__m256i a, __m256i b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufi32x4
Shuffle 128-bit blocks (each composed of 4 packed 32-bit integers) selected by imm from a and b, and return the results.
_mm256_mask_shuffle_i64x2
__m256i _mm256_mask_shuffle_i64x2(__m256i src, __mmask8 k, __m256i a, __m256i b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufi64x2
Shuffle 128-bit blocks (each composed of 2 packed 64-bit integers) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_shuffle_i64x2
__m256i _mm256_maskz_shuffle_i64x2(__mmask8 k, __m256i a, __m256i b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufi64x2
Shuffle 128-bit blocks (each composed of 2 packed 64-bit integers) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_shuffle_i64x2
__m256i _mm256_shuffle_i64x2(__m256i a, __m256i b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufi64x2
Shuffle 128-bit blocks (each composed of 2 packed 64-bit integers) selected by imm from a and b, and return the results.
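How imm selects the blocks is not spelled out above. The scalar sketch below models the unmasked 256-bit form `_mm256_shuffle_i32x4` under the assumption, taken from the vshufi32x4 instruction description as best understood here, that bit 0 of imm picks the block of a placed in the low half of the result and bit 1 picks the block of b placed in the high half. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model (hypothetical helper) of _mm256_shuffle_i32x4.
 * dst, a and b hold 8 x int32_t; dst must not overlap a or b.
 * Assumed selection rule: imm bit 0 chooses the 128-bit block of a for the
 * low half of dst, imm bit 1 chooses the 128-bit block of b for the high half. */
void model_shuffle_i32x4_256(int32_t dst[8], const int32_t a[8],
                             const int32_t b[8], int imm)
{
    assert(imm >= 0 && imm <= 3);   /* only two bits are modeled */
    memcpy(&dst[0], &a[4 * (imm & 1)], 4 * sizeof(int32_t));
    memcpy(&dst[4], &b[4 * ((imm >> 1) & 1)], 4 * sizeof(int32_t));
}
```

Under this model, imm = 1 yields the high block of a followed by the low block of b. The masked forms would apply a per-element writemask or zeromask to this result.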
_mm_mask_blend_pd
__m128d _mm_mask_blend_pd(__mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vblendmpd
Blend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and return the results.
_mm256_mask_blend_pd
__m256d _mm256_mask_blend_pd(__mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vblendmpd
Blend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and return the results.
_mm_mask_blend_ps
__m128 _mm_mask_blend_ps(__mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vblendmps
Blend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and return the results.
_mm256_mask_blend_ps
__m256 _mm256_mask_blend_ps(__mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vblendmps
Blend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and return the results.
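The blend descriptions do not say which operand a set mask bit selects. The scalar sketch below assumes the vblendmps convention as understood here: a set bit in k takes the element from b, a clear bit takes it from a. The helper name is hypothetical and arrays stand in for vector registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model (hypothetical helper) of _mm_mask_blend_ps (lanes == 4)
 * and _mm256_mask_blend_ps (lanes == 8).
 * Assumed rule: mask bit set -> element from b, clear -> element from a. */
void model_mask_blend_ps(float *dst, uint8_t k, const float *a,
                         const float *b, size_t lanes)
{
    assert(lanes == 4 || lanes == 8);
    for (size_t i = 0; i < lanes; i++)
        dst[i] = ((k >> i) & 1u) ? b[i] : a[i];
}
```

Unlike the writemask forms elsewhere in this section, there is no separate src operand: both candidates for each element are ordinary inputs.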
_mm256_broadcast_f32x2
__m256 _mm256_broadcast_f32x2(__m128 a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcastf32x2
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.
_mm256_mask_broadcast_f32x2
__m256 _mm256_mask_broadcast_f32x2(__m256 src, __mmask8 k, __m128 a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcastf32x2
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcast_f32x2
__m256 _mm256_maskz_broadcast_f32x2(__mmask8 k, __m128 a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcastf32x2
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_broadcast_f32x2
__m512 _mm512_broadcast_f32x2(__m128 a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcastf32x2
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.
_mm512_mask_broadcast_f32x2
__m512 _mm512_mask_broadcast_f32x2(__m512 src, __mmask16 k, __m128 a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcastf32x2
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_broadcast_f32x2
__m512 _mm512_maskz_broadcast_f32x2(__mmask16 k, __m128 a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcastf32x2
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_broadcast_f32x4
__m256 _mm256_broadcast_f32x4(__m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcastf32x4
Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.
_mm256_mask_broadcast_f32x4
__m256 _mm256_mask_broadcast_f32x4(__m256 src, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcastf32x4
Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcast_f32x4
__m256 _mm256_maskz_broadcast_f32x4(__mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcastf32x4
Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_broadcast_f32x8
__m512 _mm512_broadcast_f32x8(__m256 a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcastf32x8
Broadcast the 8 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.
_mm512_mask_broadcast_f32x8
__m512 _mm512_mask_broadcast_f32x8(__m512 src, __mmask16 k, __m256 a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcastf32x8
Broadcast the 8 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_broadcast_f32x8
__m512 _mm512_maskz_broadcast_f32x8(__mmask16 k, __m256 a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcastf32x8
Broadcast the 8 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_broadcast_f64x2
__m256d _mm256_broadcast_f64x2(__m128d a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcastf64x2
Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value.
_mm256_mask_broadcast_f64x2
__m256d _mm256_mask_broadcast_f64x2(__m256d src, __mmask8 k, __m128d a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcastf64x2
Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcast_f64x2
__m256d _mm256_maskz_broadcast_f64x2(__mmask8 k, __m128d a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vbroadcastf64x2
Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_broadcast_f64x2
__m512d _mm512_broadcast_f64x2(__m128d a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcastf64x2
Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value.
_mm512_mask_broadcast_f64x2
__m512d _mm512_mask_broadcast_f64x2(__m512d src, __mmask8 k, __m128d a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcastf64x2
Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_broadcast_f64x2
__m512d _mm512_maskz_broadcast_f64x2(__mmask8 k, __m128d a)
CPUID Flags: AVX512DQ
Instruction(s): vbroadcastf64x2
Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_broadcastsd_pd
__m256d _mm256_mask_broadcastsd_pd(__m256d src, __mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcastsd
Broadcast the low double-precision (64-bit) floating-point element from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcastsd_pd
__m256d _mm256_maskz_broadcastsd_pd(__mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcastsd
Broadcast the low double-precision (64-bit) floating-point element from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_broadcastss_ps
__m128 _mm_mask_broadcastss_ps(__m128 src, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcastss
Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_broadcastss_ps
__m128 _mm_maskz_broadcastss_ps(__mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcastss
Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_broadcastss_ps
__m256 _mm256_mask_broadcastss_ps(__m256 src, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcastss
Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcastss_ps
__m256 _mm256_maskz_broadcastss_ps(__mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vbroadcastss
Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_compress_pd
__m128d _mm_mask_compress_pd(__m128d src, __mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompresspd
Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.
_mm_maskz_compress_pd
__m128d _mm_maskz_compress_pd(__mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompresspd
Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
_mm256_mask_compress_pd
__m256d _mm256_mask_compress_pd(__m256d src, __mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompresspd
Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.
_mm256_maskz_compress_pd
__m256d _mm256_maskz_compress_pd(__mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompresspd
Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
_mm_mask_compress_ps
__m128 _mm_mask_compress_ps(__m128 src, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompressps
Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.
_mm_maskz_compress_ps
__m128 _mm_maskz_compress_ps(__mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompressps
Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
_mm256_mask_compress_ps
__m256 _mm256_mask_compress_ps(__m256 src, __mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompressps
Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.
_mm256_maskz_compress_ps
__m256 _mm256_maskz_compress_ps(__mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompressps
Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
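Compression packs the selected elements toward the low end of the result. A scalar sketch of that behavior follows; the helper name is hypothetical, arrays stand in for vector registers, and a null `src` is this sketch's own convention for the zeromask form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model (hypothetical helper) of the compress_ps family.
 * lanes is 4 for the 128-bit form and 8 for the 256-bit form.
 * src != NULL models the writemask form, src == NULL the zeromask form. */
void model_mask_compress_ps(float *dst, const float *src, uint8_t k,
                            const float *a, size_t lanes)
{
    size_t out = 0;
    assert(lanes == 4 || lanes == 8);
    for (size_t i = 0; i < lanes; i++)
        if ((k >> i) & 1u)
            dst[out++] = a[i];              /* pack active elements contiguously */
    for (; out < lanes; out++)
        dst[out] = src ? src[out] : 0.0f;   /* remaining slots: src or zero */
}
```

For example, with k = 0xA4 (bits 2, 5, and 7 set), elements 2, 5, and 7 of a land in positions 0, 1, and 2 of the result, and positions 3 through 7 come from src or are zeroed.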
_mm_mask_expand_pd
__m128d _mm_mask_expand_pd(__m128d src, __mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandpd
Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_expand_pd
__m128d _mm_maskz_expand_pd(__mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandpd
Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_expand_pd
__m256d _mm256_mask_expand_pd(__m256d src, __mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandpd
Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_expand_pd
__m256d _mm256_maskz_expand_pd(__mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandpd
Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_expand_ps
__m128 _mm_mask_expand_ps(__m128 src, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandps
Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_expand_ps
__m128 _mm_maskz_expand_ps(__mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandps
Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_expand_ps
__m256 _mm256_mask_expand_ps(__m256 src, __mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandps
Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_expand_ps
__m256 _mm256_maskz_expand_ps(__mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandps
Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
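Expansion is the inverse of compression: consecutive low elements of a are scattered to the positions whose mask bit is set. The scalar sketch below illustrates this; the helper name is hypothetical, arrays stand in for vector registers, and a null `src` is this sketch's own convention for the zeromask form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model (hypothetical helper) of the expand_ps family.
 * lanes is 4 for the 128-bit form and 8 for the 256-bit form.
 * src != NULL models the writemask form, src == NULL the zeromask form.
 * dst must not overlap a. */
void model_mask_expand_ps(float *dst, const float *src, uint8_t k,
                          const float *a, size_t lanes)
{
    size_t in = 0;
    assert(lanes == 4 || lanes == 8);
    for (size_t i = 0; i < lanes; i++) {
        if ((k >> i) & 1u)
            dst[i] = a[in++];               /* next contiguous element of a */
        else
            dst[i] = src ? src[i] : 0.0f;   /* inactive slot: src or zero */
    }
}
```

With k = 0xA4, elements 0, 1, and 2 of a are placed at positions 2, 5, and 7 of the result; the elements of a beyond the number of set mask bits are not used.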
_mm256_extractf32x4_ps
__m128 _mm256_extractf32x4_ps(__m256 a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vextractf32x4
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and store the result in the return value.
_mm256_mask_extractf32x4_ps
__m128 _mm256_mask_extractf32x4_ps(__m128 src, __mmask8 k, __m256 a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vextractf32x4
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_extractf32x4_ps
__m128 _mm256_maskz_extractf32x4_ps(__mmask8 k, __m256 a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vextractf32x4
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_extractf32x8_ps
__m256 _mm512_extractf32x8_ps(__m512 a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextractf32x8
Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and store the result in the return value.
_mm512_mask_extractf32x8_ps
__m256 _mm512_mask_extractf32x8_ps(__m256 src, __mmask8 k, __m512 a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextractf32x8
Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_extractf32x8_ps
__m256 _mm512_maskz_extractf32x8_ps(__mmask8 k, __m512 a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextractf32x8
Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_extractf64x2_pd
__m128d _mm256_extractf64x2_pd(__m256d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vextractf64x2
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and store the result in the return value.
_mm256_mask_extractf64x2_pd
__m128d _mm256_mask_extractf64x2_pd(__m128d src, __mmask8 k, __m256d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vextractf64x2
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_extractf64x2_pd
__m128d _mm256_maskz_extractf64x2_pd(__mmask8 k, __m256d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vextractf64x2
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_extractf64x2_pd
__m128d _mm512_extractf64x2_pd(__m512d a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextractf64x2
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and store the result in the return value.
_mm512_mask_extractf64x2_pd
__m128d _mm512_mask_extractf64x2_pd(__m128d src, __mmask8 k, __m512d a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextractf64x2
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_extractf64x2_pd
__m128d _mm512_maskz_extractf64x2_pd(__mmask8 k, __m512d a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextractf64x2
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
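For the extract forms, the mask applies to the elements of the narrower result, not to the wide source. A scalar sketch of `_mm512_mask_extractf64x2_pd` and its zeromask counterpart follows; the helper name is hypothetical, arrays stand in for vector registers, and a null `src` is this sketch's own convention for the zeromask form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model (hypothetical helper) of _mm512_mask_extractf64x2_pd.
 * a holds 8 doubles (four 128-bit blocks); dst and src hold 2 doubles.
 * imm in 0..3 selects the block; only mask bits 0 and 1 are meaningful.
 * src != NULL models the writemask form, src == NULL the zeromask form. */
void model_mask_extractf64x2_512(double dst[2], const double *src, uint8_t k,
                                 const double a[8], int imm)
{
    assert(imm >= 0 && imm <= 3);
    for (int i = 0; i < 2; i++) {
        double picked = a[2 * imm + i];     /* element i of the chosen block */
        if ((k >> i) & 1)
            dst[i] = picked;
        else
            dst[i] = src ? src[i] : 0.0;
    }
}
```

With a = {0, 1, ..., 7}, imm = 2 and k = 0x2, the result is {src[0], 5}: block 2 holds the values 4 and 5, and only element 1 passes the mask.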
_mm_fixupimm_pd
__m128d _mm_fixupimm_pd(__m128d a, __m128d b, __m128i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmpd
Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results. imm controls exception-flag reporting.
_mm_mask_fixupimm_pd
__m128d _mm_mask_fixupimm_pd(__m128d a, __mmask8 k, __m128d b, __m128i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmpd
Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm controls exception-flag reporting.
_mm_maskz_fixupimm_pd
__m128d _mm_maskz_fixupimm_pd(__mmask8 k, __m128d a, __m128d b, __m128i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmpd
Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm controls exception-flag reporting.
_mm256_fixupimm_pd
__m256d _mm256_fixupimm_pd(__m256d a, __m256d b, __m256i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmpd
Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results. imm controls exception-flag reporting.
_mm256_mask_fixupimm_pd
__m256d _mm256_mask_fixupimm_pd(__m256d a, __mmask8 k, __m256d b, __m256i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmpd
Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm controls exception-flag reporting.
_mm256_maskz_fixupimm_pd
__m256d _mm256_maskz_fixupimm_pd(__mmask8 k, __m256d a, __m256d b, __m256i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmpd
Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm controls exception-flag reporting.
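The descriptions above do not spell out what "fix up" means per element. The sketch below is a scalar model under stated assumptions: each element of b is classified into one of eight tokens, the token indexes a 4-bit response code in the matching element of c, and the code chooses the result. The token numbering and response codes are taken from the vfixupimmpd pseudocode in the Intel SDM as best recalled here and should be checked against the manual; all function names are hypothetical. The sketch ignores imm, MXCSR.DAZ, and exception signaling, and it covers only a subset of the sixteen response codes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-reinterpretation helpers (hypothetical names). */
uint64_t model_bits_of(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
double model_from_bits(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Classify b into a token 0..7 (assumed numbering: QNaN, SNaN, zero,
   +1, -inf, +inf, negative value, positive value). */
int model_fixup_token(double b)
{
    uint64_t u = model_bits_of(b);
    unsigned sign = (unsigned)(u >> 63);
    unsigned exp = (unsigned)((u >> 52) & 0x7FF);
    uint64_t frac = u & 0x000FFFFFFFFFFFFFull;

    if (exp == 0x7FF && frac != 0) return (frac >> 51) ? 0 : 1;
    if (exp == 0 && frac == 0) return 2;
    if (b == 1.0) return 3;
    if (exp == 0x7FF) return sign ? 4 : 5;
    return sign ? 6 : 7;
}

/* One element of the unmasked operation: a is the value that may be
   preserved, b is classified, c is the response table. */
double model_fixupimm_elem(double a, double b, uint64_t c)
{
    unsigned response = (unsigned)((c >> (4 * model_fixup_token(b))) & 0xF);
    switch (response) {
    case 0:  return a;                                     /* keep a */
    case 1:  return b;                                     /* pass b through */
    case 4:  return model_from_bits(0xFFF0000000000000ull); /* -inf */
    case 5:  return model_from_bits(0x7FF0000000000000ull); /* +inf */
    case 7:  return -0.0;
    case 8:  return 0.0;
    case 9:  return -1.0;
    case 10: return 1.0;
    default:
        assert(!"response code not covered by this sketch");
        return a;
    }
}
```

For example, a table value of 0xA08 maps a QNaN in b to +0.0, maps a zero in b to +1.0, and leaves every other class at the value from a.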
_mm_fixupimm_ps
__m128 _mm_fixupimm_ps(__m128 a, __m128 b, __m128i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmps
Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results. imm controls exception-flag reporting.
_mm_mask_fixupimm_ps
__m128 _mm_mask_fixupimm_ps(__m128 a, __mmask8 k, __m128 b, __m128i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmps
Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm controls exception-flag reporting.
_mm_maskz_fixupimm_ps
__m128 _mm_maskz_fixupimm_ps(__mmask8 k, __m128 a, __m128 b, __m128i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmps
Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm controls exception-flag reporting.
_mm256_fixupimm_ps
__m256 _mm256_fixupimm_ps(__m256 a, __m256 b, __m256i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmps
Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results. imm controls exception-flag reporting.
_mm256_mask_fixupimm_ps
__m256 _mm256_mask_fixupimm_ps(__m256 a, __mmask8 k, __m256 b, __m256i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmps
Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm controls exception-flag reporting.
_mm256_maskz_fixupimm_ps
__m256 _mm256_maskz_fixupimm_ps(__mmask8 k, __m256 a, __m256 b, __m256i c, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfixupimmps
Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm controls exception-flag reporting.
_mm_getexp_pd
__m128d _mm_getexp_pd(__m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexppd
Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results. This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm_mask_getexp_pd
__m128d _mm_mask_getexp_pd(__m128d src, __mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexppd
Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm_maskz_getexp_pd
__m128d _mm_maskz_getexp_pd(__mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexppd
Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm256_getexp_pd
__m256d _mm256_getexp_pd(__m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexppd
Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results. This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm256_mask_getexp_pd
__m256d _mm256_mask_getexp_pd(__m256d src, __mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexppd
Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm256_maskz_getexp_pd
__m256d _mm256_maskz_getexp_pd(__mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexppd
Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
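The exponent conversion described above can be illustrated with a scalar sketch that reads the biased exponent field of an IEEE 754 double directly. This is a model for normal, finite inputs only (zeros, denormals, infinities, and NaNs are excluded by an assertion); the function names are hypothetical, and the 256-bit vectors are modeled as arrays of four doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-element model: returns the unbiased exponent of x
   as a double. The sign of x does not affect the result. */
double model_getexp_elem(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    unsigned biased = (unsigned)((u >> 52) & 0x7FF);
    assert(biased != 0 && biased != 0x7FF); /* normal, finite inputs only */
    return (double)((int)biased - 1023);
}

/* Hypothetical model of _mm256_mask_getexp_pd on four elements. */
void model_mask_getexp_pd256(double dst[4], const double src[4],
                             unsigned k, const double a[4])
{
    for (int j = 0; j < 4; ++j)
        dst[j] = ((k >> j) & 1u) ? model_getexp_elem(a[j]) : src[j];
}
```

For instance, 0.75 is 1.5 * 2^-1, so its result is -1.0, and -10.0 gives 3.0 because the sign is ignored.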
_mm_getexp_ps
__m128 _mm_getexp_ps(__m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexpps
Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results. This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm_mask_getexp_ps
__m128 _mm_mask_getexp_ps(__m128 src, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexpps
Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm_maskz_getexp_ps
__m128 _mm_maskz_getexp_ps(__mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexpps
Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm256_getexp_ps
__m256 _mm256_getexp_ps(__m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexpps
Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results. This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm256_mask_getexp_ps
__m256 _mm256_mask_getexp_ps(__m256 src, __mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexpps
Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm256_maskz_getexp_ps
__m256 _mm256_maskz_getexp_ps(__mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetexpps
Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element.
_mm_getmant_pd
__m128d _mm_getmant_pd(__m128d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantpd
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm_mask_getmant_pd
__m128d _mm_mask_getmant_pd(__m128d src, __mmask8 k, __m128d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantpd
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm_maskz_getmant_pd
__m128d _mm_maskz_getmant_pd(__mmask8 k, __m128d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantpd
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm256_getmant_pd
__m256d _mm256_getmant_pd(__m256d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantpd
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm256_mask_getmant_pd
__m256d _mm256_mask_getmant_pd(__m256d src, __mmask8 k, __m256d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantpd
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm256_maskz_getmant_pd
__m256d _mm256_maskz_getmant_pd(__mmask8 k, __m256d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantpd
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
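As an illustration of the mantissa normalization described above, the following scalar sketch rebuilds a double from its fraction bits with a fixed exponent. It is a model under stated assumptions: only two interval choices ([1, 2) and [0.5, 1)) and two sign choices (keep the source sign, or force a positive sign) are covered, inputs must be normal and finite, and the enum and function names are hypothetical stand-ins for the _MM_MANTISSA_NORM_ENUM and _MM_MANTISSA_SIGN_ENUM values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for two interv values and two sc values. */
typedef enum { MODEL_NORM_1_2, MODEL_NORM_P5_1 } model_interval;
typedef enum { MODEL_SIGN_SRC, MODEL_SIGN_ZERO } model_sign;

/* Hypothetical per-element model: keep the fraction bits of x and
   replace the exponent so the magnitude lands in the chosen interval. */
double model_getmant_elem(double x, model_interval interv, model_sign sc)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    unsigned biased = (unsigned)((u >> 52) & 0x7FF);
    assert(biased != 0 && biased != 0x7FF); /* normal, finite inputs only */

    uint64_t frac = u & 0x000FFFFFFFFFFFFFull;
    uint64_t sign = (sc == MODEL_SIGN_SRC) ? (u & 0x8000000000000000ull) : 0;
    /* Biased exponent 1023 gives [1, 2); 1022 gives [0.5, 1). */
    uint64_t exp = (interv == MODEL_NORM_1_2) ? 1023u : 1022u;

    uint64_t r = sign | (exp << 52) | frac;
    double out;
    memcpy(&out, &r, sizeof out);
    return out;
}
```

Since 12 = 1.5 * 2^3, an input of -12.0 yields -1.5 for the [1, 2) interval with the source sign, and 0.75 for the [0.5, 1) interval with the sign forced positive.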
_mm_getmant_ps
__m128 _mm_getmant_ps(__m128 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantps
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm_mask_getmant_ps
__m128 _mm_mask_getmant_ps(__m128 src, __mmask8 k, __m128 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantps
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm_maskz_getmant_ps
__m128 _mm_maskz_getmant_ps(__mmask8 k, __m128 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantps
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm256_getmant_ps
__m256 _mm256_getmant_ps(__m256 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantps
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm256_mask_getmant_ps
__m256 _mm256_mask_getmant_ps(__m256 src, __mmask8 k, __m256 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantps
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm256_maskz_getmant_ps
__m256 _mm256_maskz_getmant_ps(__mmask8 k, __m256 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgetmantps
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
_mm256_insertf32x4
__m256 _mm256_insertf32x4(__m256 a, __m128 b, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vinsertf32x4
Copy a to the return value, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into the return value at the location specified by imm.
_mm256_mask_insertf32x4
__m256 _mm256_mask_insertf32x4(__m256 src, __mmask8 k, __m256 a, __m128 b, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vinsertf32x4
Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_insertf32x4
__m256 _mm256_maskz_insertf32x4(__mmask8 k, __m256 a, __m128 b, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vinsertf32x4
Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_insertf32x8
__m512 _mm512_insertf32x8(__m512 a, __m256 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinsertf32x8
Copy a to the return value, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into the return value at the location specified by imm.
_mm512_mask_insertf32x8
__m512 _mm512_mask_insertf32x8(__m512 src, __mmask16 k, __m512 a, __m256 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinsertf32x8
Copy a to tmp, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_insertf32x8
__m512 _mm512_maskz_insertf32x8(__mmask16 k, __m512 a, __m256 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinsertf32x8
Copy a to tmp, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_insertf64x2
__m256d _mm256_insertf64x2(__m256d a, __m128d b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vinsertf64x2
Copy a to the return value, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into the return value at the location specified by imm.
_mm256_mask_insertf64x2
__m256d _mm256_mask_insertf64x2(__m256d src, __mmask8 k, __m256d a, __m128d b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vinsertf64x2
Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_insertf64x2
__m256d _mm256_maskz_insertf64x2(__mmask8 k, __m256d a, __m128d b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vinsertf64x2
Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_insertf64x2
__m512d _mm512_insertf64x2(__m512d a, __m128d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinsertf64x2
Copy a to the return value, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into the return value at the location specified by imm.
_mm512_mask_insertf64x2
__m512d _mm512_mask_insertf64x2(__m512d src, __mmask8 k, __m512d a, __m128d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinsertf64x2
Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_insertf64x2
__m512d _mm512_maskz_insertf64x2(__mmask8 k, __m512d a, __m128d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vinsertf64x2
Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
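The copy, insert, then mask sequence used by the insert variants above can be sketched in scalar C. This is a model of _mm512_mask_insertf64x2 on plain arrays (8 doubles for the 512-bit vectors, 2 for the 128-bit chunk), not the intrinsic itself; the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scalar model of _mm512_mask_insertf64x2. */
void model_mask_insertf64x2_512(double dst[8], const double src[8],
                                unsigned k, const double a[8],
                                const double b[2], int imm)
{
    double tmp[8];
    assert(imm >= 0 && imm < 4);

    /* Step 1: copy a to tmp. */
    memcpy(tmp, a, sizeof tmp);
    /* Step 2: overwrite the 128-bit chunk selected by imm with b. */
    memcpy(&tmp[2 * imm], b, 2 * sizeof(double));
    /* Step 3: merge with src under the writemask. */
    for (int j = 0; j < 8; ++j)
        dst[j] = ((k >> j) & 1u) ? tmp[j] : src[j];
}
```

The mask is applied per 64-bit element of the full result, so it can suppress elements both inside and outside the inserted chunk. A zeromask variant would follow the same steps with src replaced by zeros.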
_mm_mask2_permutex2var_pd
__m128d _mm_mask2_permutex2var_pd(__m128d a, __m128i idx, __mmask8 k, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2pd
Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm256_mask2_permutex2var_pd
__m256d _mm256_mask2_permutex2var_pd(__m256d a, __m256i idx, __mmask8 k, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2pd
Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm_maskz_permutex2var_pd
__m128d _mm_maskz_permutex2var_pd(__mmask8 k, __m128d a, __m128i idx, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2pd, vpermt2pd
Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_permutex2var_pd
__m128d _mm_permutex2var_pd(__m128d a, __m128i idx, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2pd, vpermt2pd
Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.
_mm256_maskz_permutex2var_pd
__m256d _mm256_maskz_permutex2var_pd(__mmask8 k, __m256d a, __m256i idx, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2pd, vpermt2pd
Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutex2var_pd
__m256d _mm256_permutex2var_pd(__m256d a, __m256i idx, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2pd, vpermt2pd
Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.
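The phrase "selector and index in idx" can be made concrete with a scalar sketch. For the 256-bit double-precision form there are four elements per source, so this model assumes bits 1:0 of each idx element give the position and bit 2 selects the source (0 for a, 1 for b); higher bits are ignored. The function name is hypothetical and the vectors are modeled as arrays.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm256_maskz_permutex2var_pd.
   dst must not alias a or b. */
void model_maskz_permutex2var_pd256(double dst[4], unsigned k,
                                    const double a[4],
                                    const uint64_t idx[4],
                                    const double b[4])
{
    assert(dst != a && dst != b);
    for (int j = 0; j < 4; ++j) {
        unsigned pos = (unsigned)(idx[j] & 3u);      /* index within source */
        unsigned from_b = (unsigned)((idx[j] >> 2) & 1u); /* source selector */
        double picked = from_b ? b[pos] : a[pos];
        /* Zeromask: masked-off elements become 0.0. */
        dst[j] = ((k >> j) & 1u) ? picked : 0.0;
    }
}
```

Passing k = 0xF reproduces the unmasked _mm256_permutex2var_pd behavior in this model.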
_mm_mask2_permutex2var_ps
__m128 _mm_mask2_permutex2var_ps(__m128 a, __m128i idx, __mmask8 k, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2ps
Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm256_mask2_permutex2var_ps
__m256 _mm256_mask2_permutex2var_ps(__m256 a, __m256i idx, __mmask8 k, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2ps
Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm_maskz_permutex2var_ps
__m128 _mm_maskz_permutex2var_ps(__mmask8 k, __m128 a, __m128i idx, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2ps, vpermt2ps
Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_permutex2var_ps
__m128 _mm_permutex2var_ps(__m128 a, __m128i idx, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2ps, vpermt2ps
Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.
_mm256_maskz_permutex2var_ps
__m256 _mm256_maskz_permutex2var_ps(__mmask8 k, __m256 a, __m256i idx, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2ps, vpermt2ps
Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutex2var_ps
__m256 _mm256_permutex2var_ps(__m256 a, __m256i idx, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2ps, vpermt2ps
Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.
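The mask2 variants above are unusual in that masked-off elements come from idx rather than from a data source. The scalar sketch below models the 128-bit single-precision form under the assumption that bits 1:0 of each idx element give the position and bit 2 selects between a and b; for a masked-off element, the raw 32 bits of idx are copied into the result. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of _mm_mask2_permutex2var_ps.
   dst must not alias a or b. */
void model_mask2_permutex2var_ps128(float dst[4], const float a[4],
                                    const uint32_t idx[4], unsigned k,
                                    const float b[4])
{
    assert(dst != a && dst != b);
    for (int j = 0; j < 4; ++j) {
        if ((k >> j) & 1u) {
            unsigned pos = idx[j] & 3u;
            dst[j] = (idx[j] & 4u) ? b[pos] : a[pos];
        } else {
            /* Mask bit clear: the bit pattern of idx[j] is passed
               through unchanged, reinterpreted as a float. */
            memcpy(&dst[j], &idx[j], sizeof dst[j]);
        }
    }
}
```

Because the pass-through values are index bit patterns, they are rarely meaningful as floating-point numbers; callers typically choose this form when idx and the result share a register.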
_mm_mask_permute_pd
__m128d _mm_mask_permute_pd(__m128d src, __mmask8 k, __m128d a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilpd
Shuffle double-precision (64-bit) floating-point elements in a using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_permutevar_pd
__m128d _mm_mask_permutevar_pd(__m128d src, __mmask8 k, __m128d a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilpd
Shuffle double-precision (64-bit) floating-point elements in a using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_permute_pd
__m128d _mm_maskz_permute_pd(__mmask8 k, __m128d a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilpd
Shuffle double-precision (64-bit) floating-point elements in a using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_permutevar_pd
__m128d _mm_maskz_permutevar_pd(__mmask8 k, __m128d a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilpd
Shuffle double-precision (64-bit) floating-point elements in a using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_permute_pd
__m256d _mm256_mask_permute_pd(__m256d src, __mmask8 k, __m256d a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilpd
Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_permutevar_pd
__m256d _mm256_mask_permutevar_pd(__m256d src, __mmask8 k, __m256d a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilpd
Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_permute_pd
__m256d _mm256_maskz_permute_pd(__mmask8 k, __m256d a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilpd
Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_permutevar_pd
__m256d _mm256_maskz_permutevar_pd(__mmask8 k, __m256d a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilpd
Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
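The difference between the imm and vector controls for the in-lane double-precision shuffles is easy to miss, so a scalar sketch may help. This model assumes the usual vpermilpd encoding: with imm, bit j selects the low or high element of the lane for result element j; with a vector control, the selection comes from bit 1 (not bit 0) of each 64-bit element of b. Function names are hypothetical and the 256-bit vectors are arrays of four doubles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm256_mask_permute_pd.
   dst must not alias a. */
void model_mask_permute_pd256(double dst[4], const double src[4],
                              unsigned k, const double a[4], int imm)
{
    assert(dst != a);
    for (int j = 0; j < 4; ++j) {
        int lane_base = (j / 2) * 2;      /* first element of the 128-bit lane */
        int sel = (imm >> j) & 1;         /* 0: low element, 1: high element */
        dst[j] = ((k >> j) & 1u) ? a[lane_base + sel] : src[j];
    }
}

/* Hypothetical scalar model of _mm256_mask_permutevar_pd:
   the control is bit 1 of each 64-bit element of b. */
void model_mask_permutevar_pd256(double dst[4], const double src[4],
                                 unsigned k, const double a[4],
                                 const uint64_t b[4])
{
    assert(dst != a);
    for (int j = 0; j < 4; ++j) {
        int lane_base = (j / 2) * 2;
        int sel = (int)((b[j] >> 1) & 1u);
        dst[j] = ((k >> j) & 1u) ? a[lane_base + sel] : src[j];
    }
}
```

In this model a control vector of {1, 1, 1, 1} selects the low element everywhere, because bit 1 of each control element is clear.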
_mm_mask_permute_ps
__m128 _mm_mask_permute_ps(__m128 src, __mmask8 k, __m128 a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilps
Shuffle single-precision (32-bit) floating-point elements in a using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_mask_permutevar_ps
__m128 _mm_mask_permutevar_ps(__m128 src, __mmask8 k, __m128 a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilps
Shuffle single-precision (32-bit) floating-point elements in a using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_permute_ps
__m128 _mm_maskz_permute_ps(__mmask8 k, __m128 a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilps
Shuffle single-precision (32-bit) floating-point elements in a using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_maskz_permutevar_ps
__m128 _mm_maskz_permutevar_ps(__mmask8 k, __m128 a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilps
Shuffle single-precision (32-bit) floating-point elements in a using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_permute_ps
__m256 _mm256_mask_permute_ps(__m256 src, __mmask8 k, __m256 a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilps
Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_permutevar_ps
__m256 _mm256_mask_permutevar_ps(__m256 src, __mmask8 k, __m256 a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilps
Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_permute_ps
__m256 _mm256_maskz_permute_ps(__mmask8 k, __m256 a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilps
Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_permutevar_ps
__m256 _mm256_maskz_permutevar_ps(__mmask8 k, __m256 a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermilps
Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
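For the single-precision in-lane shuffles, the 8-bit imm holds four 2-bit selectors that are reused for every 128-bit lane. The following scalar sketch models _mm256_maskz_permute_ps under that assumption; the function name is hypothetical and the vectors are arrays of eight floats.

```c
#include <assert.h>

/* Hypothetical scalar model of _mm256_maskz_permute_ps.
   dst must not alias a. */
void model_maskz_permute_ps256(float dst[8], unsigned k,
                               const float a[8], int imm)
{
    assert(dst != a);
    for (int j = 0; j < 8; ++j) {
        int lane_base = (j / 4) * 4;              /* start of the 128-bit lane */
        int sel = (imm >> (2 * (j % 4))) & 3;     /* 2-bit selector for slot j%4 */
        float picked = a[lane_base + sel];
        dst[j] = ((k >> j) & 1u) ? picked : 0.0f;
    }
}
```

With imm = 0x1B the selectors are 3, 2, 1, 0, which reverses the four elements inside each lane without moving data between lanes.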
_mm256_mask_permutex_pd
__m256d _mm256_mask_permutex_pd(__m256d src, __mmask8 k, __m256d a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermpd
Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_permutexvar_pd
__m256d _mm256_mask_permutexvar_pd(__m256d src, __mmask8 k, __m256i idx, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermpd
Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_permutex_pd
__m256d _mm256_maskz_permutex_pd(__mmask8 k, __m256d a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermpd
Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_permutexvar_pd
__m256d _mm256_maskz_permutexvar_pd(__mmask8 k, __m256i idx, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermpd
Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutex_pd
__m256d _mm256_permutex_pd(__m256d a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermpd
Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm, and return the results.
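As a hedged illustration of how the imm control selects elements across lanes, the following scalar C sketch models the 256-bit vpermpd immediate form on an array of 4 doubles. The name permutex_pd_model is hypothetical, and the bit layout (two bits of imm per result element) is stated here as an assumption based on the instruction's documented behavior.

```c
#include <assert.h>

/* Scalar model of a 4-element cross-lane permute controlled by an 8-bit
   immediate: result element i is a[(imm >> (2 * i)) & 3]. Unlike the
   in-lane vpermilps forms, any of the 4 source elements can be chosen. */
void permutex_pd_model(double dst[4], const double a[4], unsigned imm)
{
    assert(imm <= 0xFFu);
    double tmp[4]; /* staging buffer so dst may alias a */
    for (int i = 0; i < 4; i++)
        tmp[i] = a[(imm >> (2 * i)) & 3u];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

For example, imm = 0x4E swaps the two 128-bit halves, and imm = 0x00 broadcasts element 0 to every position.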
_mm256_permutexvar_pd
__m256d _mm256_permutexvar_pd(__m256i idx, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermpd
Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results.
_mm256_mask_permutexvar_ps
__m256 _mm256_mask_permutexvar_ps(__m256 src, __mmask8 k, __m256i idx, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermps
Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_permutexvar_ps
__m256 _mm256_maskz_permutexvar_ps(__mmask8 k, __m256i idx, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermps
Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutexvar_ps
__m256 _mm256_permutexvar_ps(__m256i idx, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermps
Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results.
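The index-driven forms take one index per result element instead of a shared immediate. A minimal scalar sketch of the 256-bit writemask variant follows; permutexvar_ps_model is a hypothetical name, and the assumption that only the low 3 bits of each index are used reflects the 8-element vector width.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a masked, index-driven permute over 8 floats.
   Result element i is a[idx[i] & 7] when bit i of k is set,
   otherwise it is copied from src[i]. */
void permutexvar_ps_model(float dst[8], const float src[8], unsigned k,
                          const int32_t idx[8], const float a[8])
{
    assert(k <= 0xFFu);
    float tmp[8]; /* staging buffer so dst may alias a or src */
    for (int i = 0; i < 8; i++) {
        unsigned sel = (uint32_t)idx[i] & 7u; /* upper index bits are ignored */
        tmp[i] = ((k >> i) & 1u) ? a[sel] : src[i];
    }
    for (int i = 0; i < 8; i++)
        dst[i] = tmp[i];
}
```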
_mm_mask_permutex2var_pd
__m128d _mm_mask_permutex2var_pd(__m128d a, __mmask8 k, __m128i idx, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermt2pd
Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask_permutex2var_pd
__m256d _mm256_mask_permutex2var_pd(__m256d a, __mmask8 k, __m256i idx, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermt2pd
Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask_permutex2var_ps
__m128 _mm_mask_permutex2var_ps(__m128 a, __mmask8 k, __m128i idx, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermt2ps
Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask_permutex2var_ps
__m256 _mm256_mask_permutex2var_ps(__m256 a, __mmask8 k, __m256i idx, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermt2ps
Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
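The "selector and index" wording above can be made concrete with a scalar model. The sketch below covers the double-precision writemask forms for 2 or 4 elements; mask_permutex2var_pd_model is a hypothetical name, and the bit layout (offset in the low bits of each idx element, source selector in the next bit up) is an assumption drawn from the instruction's documented behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a masked two-source permute over n doubles (n = 2 or 4).
   For each idx element, the low log2(n) bits give the element offset and
   the next bit selects the source: 0 picks from a, 1 picks from b.
   Elements whose mask bit is clear are copied from a. */
void mask_permutex2var_pd_model(double *dst, const double *a, unsigned k,
                                const int64_t *idx, const double *b,
                                unsigned n)
{
    assert(n == 2 || n == 4);
    double tmp[4]; /* staging buffer so dst may alias a or b */
    for (unsigned i = 0; i < n; i++) {
        uint64_t ix = (uint64_t)idx[i];
        unsigned off = (unsigned)(ix & (n - 1)); /* offset within the source */
        int from_b = (ix & n) != 0;              /* selector bit */
        double picked = from_b ? b[off] : a[off];
        tmp[i] = ((k >> i) & 1u) ? picked : a[i];
    }
    for (unsigned i = 0; i < n; i++)
        dst[i] = tmp[i];
}
```

With n = 4, idx = {0, 4, 1, 5} interleaves the low halves of a and b, which a single-source permute cannot express.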
_mm_mask_range_pd
__m128d _mm_mask_range_pd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_range_pd
__m128d _mm_maskz_range_pd(__mmask8 k, __m128d a, __m128d b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_range_pd
__m128d _mm_range_pd(__m128d a, __m128d b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results.
_mm256_mask_range_pd
__m256d _mm256_mask_range_pd(__m256d src, __mmask8 k, __m256d a, __m256d b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_range_pd
__m256d _mm256_maskz_range_pd(__mmask8 k, __m256d a, __m256d b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_range_pd
__m256d _mm256_range_pd(__m256d a, __m256d b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results.
_mm512_mask_range_pd
__m512d _mm512_mask_range_pd(__m512d src, __mmask8 k, __m512d a, __m512d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_range_round_pd
__m512d _mm512_mask_range_round_pd(__m512d src, __mmask8 k, __m512d a, __m512d b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_range_pd
__m512d _mm512_maskz_range_pd(__mmask8 k, __m512d a, __m512d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_range_round_pd
__m512d _mm512_maskz_range_round_pd(__mmask8 k, __m512d a, __m512d b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_range_pd
__m512d _mm512_range_pd(__m512d a, __m512d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results.
_mm512_range_round_pd
__m512d _mm512_range_round_pd(__m512d a, __m512d b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangepd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results.
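The descriptions above leave the encoding of imm implicit. The following scalar C sketch shows one reading of it for a single double-precision element. The name range_pd_model is hypothetical, and the bit assignments (imm bits 1:0 choose min, max, absolute min, or absolute max; imm bits 3:2 choose how the sign of the result is set) are an assumption taken from the vrangepd instruction description and should be checked against it. NaN and signed-zero special cases are not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double double_of(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Scalar model of the per-element range operation (assumed encoding):
   imm[1:0]: 0 = min, 1 = max, 2 = absolute min, 3 = absolute max
   imm[3:2]: 0 = sign from a, 1 = sign of the selected operand,
             2 = sign cleared, 3 = sign set */
double range_pd_model(double a, double b, unsigned imm)
{
    assert(imm <= 0xFu);
    const uint64_t SIGN = UINT64_C(1) << 63;
    double abs_a = double_of(bits_of(a) & ~SIGN);
    double abs_b = double_of(bits_of(b) & ~SIGN);
    double t;
    switch (imm & 3u) {
    case 0:  t = (a <= b) ? a : b; break;
    case 1:  t = (a <= b) ? b : a; break;
    case 2:  t = (abs_a <= abs_b) ? a : b; break; /* smaller magnitude */
    default: t = (abs_a <= abs_b) ? b : a; break; /* larger magnitude */
    }
    uint64_t r = bits_of(t);
    switch ((imm >> 2) & 3u) {
    case 0:  r = (r & ~SIGN) | (bits_of(a) & SIGN); break;
    case 1:  break;
    case 2:  r &= ~SIGN; break;
    default: r |= SIGN; break;
    }
    return double_of(r);
}
```

Under this reading, passing the same value for a and b with imm = 0x8 yields its absolute value, and imm = 0xB returns the larger of the two magnitudes as a non-negative number.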
_mm_mask_range_ps
__m128 _mm_mask_range_ps(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_range_ps
__m128 _mm_maskz_range_ps(__mmask8 k, __m128 a, __m128 b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_range_ps
__m128 _mm_range_ps(__m128 a, __m128 b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results.
_mm256_mask_range_ps
__m256 _mm256_mask_range_ps(__m256 src, __mmask8 k, __m256 a, __m256 b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_range_ps
__m256 _mm256_maskz_range_ps(__mmask8 k, __m256 a, __m256 b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_range_ps
__m256 _mm256_range_ps(__m256 a, __m256 b, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results.
_mm512_mask_range_ps
__m512 _mm512_mask_range_ps(__m512 src, __mmask16 k, __m512 a, __m512 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_range_round_ps
__m512 _mm512_mask_range_round_ps(__m512 src, __mmask16 k, __m512 a, __m512 b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_range_ps
__m512 _mm512_maskz_range_ps(__mmask16 k, __m512 a, __m512 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_range_round_ps
__m512 _mm512_maskz_range_round_ps(__mmask16 k, __m512 a, __m512 b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_range_ps
__m512 _mm512_range_ps(__m512 a, __m512 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results.
_mm512_range_round_ps
__m512 _mm512_range_round_ps(__m512 a, __m512 b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangeps
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results.
_mm_mask_range_round_sd
__m128d _mm_mask_range_round_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangesd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.
_mm_mask_range_sd
__m128d _mm_mask_range_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vrangesd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.
_mm_maskz_range_round_sd
__m128d _mm_maskz_range_round_sd(__mmask8 k, __m128d a, __m128d b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangesd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.
_mm_maskz_range_sd
__m128d _mm_maskz_range_sd(__mmask8 k, __m128d a, __m128d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vrangesd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.
_mm_range_round_sd
__m128d _mm_range_round_sd(__m128d a, __m128d b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangesd
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value, and copy the upper element from a to the upper element of the return value.
_mm_mask_range_round_ss
__m128 _mm_mask_range_round_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangess
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.
_mm_mask_range_ss
__m128 _mm_mask_range_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vrangess
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.
_mm_maskz_range_round_ss
__m128 _mm_maskz_range_round_ss(__mmask8 k, __m128 a, __m128 b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangess
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.
_mm_maskz_range_ss
__m128 _mm_maskz_range_ss(__mmask8 k, __m128 a, __m128 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vrangess
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.
_mm_range_round_ss
__m128 _mm_range_round_ss(__m128 a, __m128 b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vrangess
Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value, and copy the upper 3 packed elements from a to the upper elements of the return value.
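The scalar (_ss and _sd) forms above share one merging pattern: only mask bit 0 matters, and the upper elements pass through from a. A minimal C sketch of that pattern for the single-precision case follows. The name scalar_form_merge_ps and the zeroing flag are hypothetical, and the already computed lower result is passed in as a parameter so that the sketch stays independent of the arithmetic being performed.

```c
#include <assert.h>

/* Scalar model of how a masked scalar-form intrinsic assembles its result.
   lower_result is the value computed from the lower elements of a and b.
   Bit 0 of k decides whether it is kept. Otherwise the lower element comes
   from src (writemask, zeroing == 0) or becomes 0 (zeromask, zeroing != 0).
   Elements 1..3 are always copied from a. */
void scalar_form_merge_ps(float dst[4], const float src[4], unsigned k,
                          const float a[4], float lower_result, int zeroing)
{
    assert(zeroing || src != 0);
    float low;
    if (k & 1u)
        low = lower_result;
    else
        low = zeroing ? 0.0f : src[0];
    /* read the upper elements before writing so dst may alias a */
    float up1 = a[1], up2 = a[2], up3 = a[3];
    dst[0] = low;
    dst[1] = up1;
    dst[2] = up2;
    dst[3] = up3;
}
```

Mask bits 1 through 7 have no effect in this model, which matches the "mask bit 0" wording of the descriptions.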
_mm_mask_reduce_pd
__m128d _mm_mask_reduce_pd(__m128d src, __mmask8 k, __m128d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_reduce_pd
__m128d _mm_maskz_reduce_pd(__mmask8 k, __m128d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_reduce_pd
__m128d _mm_reduce_pd(__m128d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results.
_mm256_mask_reduce_pd
__m256d _mm256_mask_reduce_pd(__m256d src, __mmask8 k, __m256d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_reduce_pd
__m256d _mm256_maskz_reduce_pd(__mmask8 k, __m256d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_reduce_pd
__m256d _mm256_reduce_pd(__m256d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results.
_mm512_mask_reduce_pd
__m512d _mm512_mask_reduce_pd(__m512d src, __mmask8 k, __m512d a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_reduce_round_pd
__m512d _mm512_mask_reduce_round_pd(__m512d src, __mmask8 k, __m512d a, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_reduce_pd
__m512d _mm512_maskz_reduce_pd(__mmask8 k, __m512d a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_reduce_round_pd
__m512d _mm512_maskz_reduce_round_pd(__mmask8 k, __m512d a, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_reduce_pd
__m512d _mm512_reduce_pd(__m512d a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results.
_mm512_reduce_round_pd
__m512d _mm512_reduce_round_pd(__m512d a, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreducepd
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results.
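"Reduced argument" here means what is left after subtracting the input rounded to a given number of fraction bits. The scalar C sketch below illustrates that for one double-precision element. The names reduce_pd_model and round_to_integer are hypothetical. The encoding is an assumption based on the vreducepd description: imm bits 7:4 hold the number of fraction bits M, and imm bits 1:0 hold the rounding mode. Bits 3:2, infinities, NaNs, and very large inputs are not modelled.

```c
#include <assert.h>

/* Round v to an integer value. mode: 0 = nearest (ties to even),
   1 = down, 2 = up, 3 = toward zero.
   Assumes |v| < 2^52 so the conversion to long long is safe. */
static double round_to_integer(double v, unsigned mode)
{
    double t = (double)(long long)v;   /* truncation toward zero */
    double lo = (t > v) ? t - 1.0 : t; /* floor */
    double hi = (t < v) ? t + 1.0 : t; /* ceiling */
    switch (mode & 3u) {
    case 1:  return lo;
    case 2:  return hi;
    case 3:  return t;
    default: {
        double frac = v - lo;
        if (frac > 0.5) return hi;
        if (frac < 0.5) return lo;
        return ((long long)lo % 2 == 0) ? lo : hi; /* tie: even neighbour */
    }
    }
}

/* Scalar model: x minus x rounded to M fraction bits, with M = imm[7:4]. */
double reduce_pd_model(double x, unsigned imm)
{
    assert(imm <= 0xFFu);
    double scale = (double)(1u << (imm >> 4)); /* 2^M, M is at most 15 */
    double rounded = round_to_integer(x * scale, imm & 3u) / scale;
    return x - rounded;
}
```

With M = 0 and round-down, the result is the fractional part of x for positive inputs, for example 0.75 for x = 2.75.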
_mm_mask_reduce_ps
__m128 _mm_mask_reduce_ps(__m128 src, __mmask8 k, __m128 a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_reduce_ps
__m128 _mm_maskz_reduce_ps(__mmask8 k, __m128 a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_reduce_ps
__m128 _mm_reduce_ps(__m128 a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results.
_mm256_mask_reduce_ps
__m256 _mm256_mask_reduce_ps(__m256 src, __mmask8 k, __m256 a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_reduce_ps
__m256 _mm256_maskz_reduce_ps(__mmask8 k, __m256 a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_reduce_ps
__m256 _mm256_reduce_ps(__m256 a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results.
_mm512_mask_reduce_ps
__m512 _mm512_mask_reduce_ps(__m512 src, __mmask16 k, __m512 a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_mask_reduce_round_ps
__m512 _mm512_mask_reduce_round_ps(__m512 src, __mmask16 k, __m512 a, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_reduce_ps
__m512 _mm512_maskz_reduce_ps(__mmask16 k, __m512 a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maskz_reduce_round_ps
__m512 _mm512_maskz_reduce_round_ps(__mmask16 k, __m512 a, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_reduce_ps
__m512 _mm512_reduce_ps(__m512 a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results.
_mm512_reduce_round_ps
__m512 _mm512_reduce_round_ps(__m512 a, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreduceps
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results.
_mm_mask_reduce_round_sd
__m128d _mm_mask_reduce_round_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreducesd
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from b to the upper element of the return value.
_mm_mask_reduce_sd
__m128d _mm_mask_reduce_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreducesd
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from b to the upper element of the return value.
_mm_maskz_reduce_round_sd
__m128d _mm_maskz_reduce_round_sd(__mmask8 k, __m128d a, __m128d b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreducesd
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from b to the upper element of the return value.
_mm_maskz_reduce_sd
__m128d _mm_maskz_reduce_sd(__mmask8 k, __m128d a, __m128d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreducesd
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from b to the upper element of the return value.
_mm_reduce_round_sd
__m128d _mm_reduce_round_sd(__m128d a, __m128d b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreducesd
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper element from b to the upper element of the return value.
_mm_reduce_sd
__m128d _mm_reduce_sd(__m128d a, __m128d b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreducesd
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper element from b to the upper element of the return value.
_mm_mask_reduce_round_ss
__m128 _mm_mask_reduce_round_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreducess
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from b to the upper elements of the return value.
_mm_mask_reduce_ss
__m128 _mm_mask_reduce_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreducess
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from b to the upper elements of the return value.
_mm_maskz_reduce_round_ss
__m128 _mm_maskz_reduce_round_ss(__mmask8 k, __m128 a, __m128 b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreducess
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from b to the upper elements of the return value.
_mm_maskz_reduce_ss
__m128 _mm_maskz_reduce_ss(__mmask8 k, __m128 a, __m128 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreducess
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from b to the upper elements of the return value.
_mm_reduce_round_ss
__m128 _mm_reduce_round_ss(__m128 a, __m128 b, int imm, int rounding)
CPUID Flags: AVX512DQ
Instruction(s): vreducess
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper 3 packed elements from b to the upper elements of the return value.
_mm_reduce_ss
__m128 _mm_reduce_ss(__m128 a, __m128 b, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vreducess
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper 3 packed elements from a to the upper elements of the return value.
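The "reduced argument" of a value x is what remains after subtracting x rounded to a given number of fraction bits. The following portable C sketch models that computation for one element. It is not the intrinsic itself. It assumes that imm[7:4] holds the number of fraction bits M and that the rounding step truncates toward zero (imm[1:0] = 3); the function name model_reduce_trunc is hypothetical.

```c
#include <assert.h>

/* Scalar model of the reduced-argument computation for one float.
 * Assumptions: imm[7:4] = M (fraction bits kept by the rounding step),
 * rounding is truncation toward zero; other rounding modes, NaN and
 * infinity handling are not modeled. */
float model_reduce_trunc(float x, int imm)
{
    int m = (imm >> 4) & 0xF;
    float scale = (float)(1 << m);                 /* 2^M */
    assert(x > -1.0e6f && x < 1.0e6f);             /* keep the integer cast in range */
    float rounded = (float)(long long)(x * scale) / scale;  /* x truncated to M fraction bits */
    return x - rounded;                            /* what the rounding step discarded */
}
```

With M = 1 the rounding step keeps multiples of 0.5, so an input of 2.75 is rounded to 2.5 and the model returns the remainder.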
_mm_mask_roundscale_pd
__m128d _mm_mask_roundscale_pd(__m128d src, __mmask8 k, __m128d a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscalepd
Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_roundscale_pd
__m128d _mm_maskz_roundscale_pd(__mmask8 k, __m128d a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscalepd
Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_roundscale_pd
__m128d _mm_roundscale_pd(__m128d a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscalepd
Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.
_mm256_mask_roundscale_pd
__m256d _mm256_mask_roundscale_pd(__m256d src, __mmask8 k, __m256d a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscalepd
Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_roundscale_pd
__m256d _mm256_maskz_roundscale_pd(__mmask8 k, __m256d a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscalepd
Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_roundscale_pd
__m256d _mm256_roundscale_pd(__m256d a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscalepd
Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.
_mm_mask_roundscale_ps
__m128 _mm_mask_roundscale_ps(__m128 src, __mmask8 k, __m128 a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscaleps
Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_roundscale_ps
__m128 _mm_maskz_roundscale_ps(__mmask8 k, __m128 a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscaleps
Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_roundscale_ps
__m128 _mm_roundscale_ps(__m128 a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscaleps
Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.
_mm256_mask_roundscale_ps
__m256 _mm256_mask_roundscale_ps(__m256 src, __mmask8 k, __m256 a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscaleps
Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_roundscale_ps
__m256 _mm256_maskz_roundscale_ps(__mmask8 k, __m256 a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscaleps
Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_roundscale_ps
__m256 _mm256_roundscale_ps(__m256 a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrndscaleps
Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.
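Rounding to M fraction bits means rounding to the nearest representable multiple of 2^-M in the chosen direction. The sketch below is a portable scalar model on plain arrays, not the intrinsic. It assumes imm[7:4] = M and imm[1:0] = rounding control (1 = toward negative infinity, 2 = toward positive infinity, 3 = toward zero); round-to-nearest-even and the MXCSR option are left out. Both function names are hypothetical.

```c
#include <assert.h>

/* Round x to a multiple of 2^-M, with M and the direction taken from imm
 * (assumed layout: imm[7:4] = M, imm[1:0] = rounding control). */
double model_roundscale(double x, int imm)
{
    int m = (imm >> 4) & 0xF;
    int rc = imm & 0x3;
    double scale = (double)(1 << m);
    double scaled = x * scale;
    assert(rc != 0);                                  /* nearest-even is not modeled */
    assert(scaled > -1.0e15 && scaled < 1.0e15);      /* keep the integer cast in range */
    long long t = (long long)scaled;                  /* truncates toward zero */
    if (rc == 1 && (double)t > scaled) t -= 1;        /* toward negative infinity */
    if (rc == 2 && (double)t < scaled) t += 1;        /* toward positive infinity */
    return (double)t / scale;
}

/* Writemask form on two doubles: unselected elements come from src. */
void model_mask_roundscale_pd(double dst[2], const double src[2], unsigned k,
                              const double a[2], int imm)
{
    for (int i = 0; i < 2; ++i)
        dst[i] = ((k >> i) & 1u) ? model_roundscale(a[i], imm) : src[i];
}
```

For example, imm = (1 << 4) | 1 rounds down to multiples of 0.5, so 1.3 becomes 1.0 and -1.3 becomes -1.5.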
_mm_mask_scalef_pd
__m128d _mm_mask_scalef_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefpd
Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_scalef_pd
__m128d _mm_maskz_scalef_pd(__mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefpd
Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_scalef_pd
__m128d _mm_scalef_pd(__m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefpd
Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results.
_mm256_mask_scalef_pd
__m256d _mm256_mask_scalef_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefpd
Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_scalef_pd
__m256d _mm256_maskz_scalef_pd(__mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefpd
Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_scalef_pd
__m256d _mm256_scalef_pd(__m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefpd
Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results.
_mm_mask_scalef_ps
__m128 _mm_mask_scalef_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefps
Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_scalef_ps
__m128 _mm_maskz_scalef_ps(__mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefps
Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_scalef_ps
__m128 _mm_scalef_ps(__m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefps
Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results.
_mm256_mask_scalef_ps
__m256 _mm256_mask_scalef_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefps
Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_scalef_ps
__m256 _mm256_maskz_scalef_ps(__mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefps
Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_scalef_ps
__m256 _mm256_scalef_ps(__m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscalefps
Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results.
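"Scale ... using values from b" is commonly understood as multiplying each element of a by 2 raised to floor(b). The portable sketch below models that behavior together with the zeromask form. It is an illustration under that assumption, not the intrinsic, and it ignores NaN, infinity, denormal, and overflow handling. The function names are hypothetical.

```c
#include <assert.h>

/* floor() for values that fit in an int, written without libm. */
static int floor_to_int(float v)
{
    int t = (int)v;                    /* truncates toward zero */
    return ((float)t > v) ? t - 1 : t;
}

/* a * 2^floor(b) for one element, by repeated doubling or halving. */
float model_scalef(float a, float b)
{
    assert(b > -126.0f && b < 127.0f); /* stay inside the normal exponent range */
    int e = floor_to_int(b);
    float r = a;
    for (; e > 0; --e) r *= 2.0f;
    for (; e < 0; ++e) r *= 0.5f;
    return r;
}

/* Zeromask form on four floats: unselected elements become 0. */
void model_maskz_scalef_ps(float dst[4], unsigned k,
                           const float a[4], const float b[4])
{
    for (int i = 0; i < 4; ++i)
        dst[i] = ((k >> i) & 1u) ? model_scalef(a[i], b[i]) : 0.0f;
}
```

Note that the fractional part of b is discarded before scaling: an exponent of 2.5 scales by 4, not by 2^2.5.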
_mm256_mask_shuffle_f32x4
__m256 _mm256_mask_shuffle_f32x4(__m256 src, __mmask8 k, __m256 a, __m256 b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshuff32x4
Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_shuffle_f32x4
__m256 _mm256_maskz_shuffle_f32x4(__mmask8 k, __m256 a, __m256 b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshuff32x4
Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_shuffle_f32x4
__m256 _mm256_shuffle_f32x4(__m256 a, __m256 b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshuff32x4
Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm from a and b, and return the results.
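For the 256-bit form, the selection can be pictured as choosing one 128-bit half of a for the lower half of the result and one 128-bit half of b for the upper half. The portable sketch below models this on float arrays, assuming imm bit 0 selects the half of a and imm bit 1 selects the half of b. It does not call the intrinsic, and the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Model of the unmasked 256-bit f32x4 shuffle on arrays of 8 floats.
 * Assumption: imm[0] picks the 128-bit half of a for result elements 0..3,
 * imm[1] picks the 128-bit half of b for result elements 4..7. */
void model_shuffle_f32x4_256(float dst[8], const float a[8], const float b[8], int imm)
{
    float tmp[8];                       /* lets dst alias a or b safely */
    assert(dst && a && b);
    memcpy(tmp,     a + 4 * (imm & 1),        4 * sizeof(float));
    memcpy(tmp + 4, b + 4 * ((imm >> 1) & 1), 4 * sizeof(float));
    memcpy(dst, tmp, sizeof tmp);
}
```

With imm = 2, the result holds the low half of a followed by the high half of b.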
_mm256_mask_shuffle_f64x2
__m256d _mm256_mask_shuffle_f64x2(__m256d src, __mmask8 k, __m256d a, __m256d b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshuff64x2
Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_shuffle_f64x2
__m256d _mm256_maskz_shuffle_f64x2(__mmask8 k, __m256d a, __m256d b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshuff64x2
Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_shuffle_f64x2
__m256d _mm256_shuffle_f64x2(__m256d a, __m256d b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshuff64x2
Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm from a and b, and return the results.
_mm_mask_shuffle_pd
__m128d _mm_mask_shuffle_pd(__m128d src, __mmask8 k, __m128d a, __m128d b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufpd
Shuffle double-precision (64-bit) floating-point elements using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_shuffle_pd
__m128d _mm_maskz_shuffle_pd(__mmask8 k, __m128d a, __m128d b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufpd
Shuffle double-precision (64-bit) floating-point elements using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_shuffle_pd
__m256d _mm256_mask_shuffle_pd(__m256d src, __mmask8 k, __m256d a, __m256d b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufpd
Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_shuffle_pd
__m256d _mm256_maskz_shuffle_pd(__mmask8 k, __m256d a, __m256d b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufpd
Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
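The "within 128-bit lanes" wording means each pair of doubles is shuffled independently. The portable sketch below models the 256-bit writemask form on arrays, assuming one imm bit per result element: even result elements are taken from a, odd ones from b, and the bit picks the low or high double of the same lane. The function name is hypothetical and the code does not use the intrinsic.

```c
#include <assert.h>

/* Model of the 256-bit masked shuffle_pd on arrays of 4 doubles.
 * Assumption: imm bit i selects the low (0) or high (1) double of the
 * lane for result element i; even results read a, odd results read b. */
void model_mask_shuffle_pd_256(double dst[4], const double src[4], unsigned k,
                               const double a[4], const double b[4], int imm)
{
    double tmp[4];
    assert(dst && src && a && b);
    for (int i = 0; i < 4; ++i) {
        const double *from = (i & 1) ? b : a;
        int lane = i & ~1;                       /* first element of the 128-bit lane */
        tmp[i] = from[lane + ((imm >> i) & 1)];
    }
    for (int i = 0; i < 4; ++i)                  /* writemask: keep src where bit is clear */
        dst[i] = ((k >> i) & 1u) ? tmp[i] : src[i];
}
```

Replacing src[i] with 0.0 in the final loop gives the zeromask behavior.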
_mm_mask_shuffle_ps
__m128 _mm_mask_shuffle_ps(__m128 src, __mmask8 k, __m128 a, __m128 b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufps
Shuffle single-precision (32-bit) floating-point elements from a and b using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_shuffle_ps
__m128 _mm_maskz_shuffle_ps(__mmask8 k, __m128 a, __m128 b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufps
Shuffle single-precision (32-bit) floating-point elements from a and b using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_shuffle_ps
__m256 _mm256_mask_shuffle_ps(__m256 src, __mmask8 k, __m256 a, __m256 b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufps
Shuffle single-precision (32-bit) floating-point elements from a and b within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_shuffle_ps
__m256 _mm256_maskz_shuffle_ps(__mmask8 k, __m256 a, __m256 b, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vshufps
Shuffle single-precision (32-bit) floating-point elements from a and b within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
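The control in imm can be read as four 2-bit indices. The portable sketch below models the 128-bit zeromask form, assuming the two low result elements are selected from a and the two high result elements from b. It works on plain arrays rather than calling the intrinsic, and the function name is hypothetical.

```c
#include <assert.h>

/* Model of the 128-bit zeromasked shuffle_ps on arrays of 4 floats.
 * Assumption: imm[1:0] and imm[3:2] index into a for results 0 and 1,
 * imm[5:4] and imm[7:6] index into b for results 2 and 3. */
void model_maskz_shuffle_ps(float dst[4], unsigned k,
                            const float a[4], const float b[4], int imm)
{
    float tmp[4];
    assert(dst && a && b);
    tmp[0] = a[imm & 3];
    tmp[1] = a[(imm >> 2) & 3];
    tmp[2] = b[(imm >> 4) & 3];
    tmp[3] = b[(imm >> 6) & 3];
    for (int i = 0; i < 4; ++i)                  /* zeromask: clear where bit is not set */
        dst[i] = ((k >> i) & 1u) ? tmp[i] : 0.0f;
}
```

For the 256-bit forms the same selection would be repeated for each 128-bit lane with the same imm.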
_mm_mask_unpackhi_pd
__m128d _mm_mask_unpackhi_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpckhpd
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpackhi_pd
__m128d _mm_maskz_unpackhi_pd(__mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpckhpd
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpackhi_pd
__m256d _mm256_mask_unpackhi_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpckhpd
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpackhi_pd
__m256d _mm256_maskz_unpackhi_pd(__mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpckhpd
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_unpackhi_ps
__m128 _mm_mask_unpackhi_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpckhps
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpackhi_ps
__m128 _mm_maskz_unpackhi_ps(__mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpckhps
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpackhi_ps
__m256 _mm256_mask_unpackhi_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpckhps
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpackhi_ps
__m256 _mm256_maskz_unpackhi_ps(__mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpckhps
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_unpacklo_pd
__m128d _mm_mask_unpacklo_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpcklpd
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpacklo_pd
__m128d _mm_maskz_unpacklo_pd(__mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpcklpd
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpacklo_pd
__m256d _mm256_mask_unpacklo_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpcklpd
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpacklo_pd
__m256d _mm256_maskz_unpacklo_pd(__mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpcklpd
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_unpacklo_ps
__m128 _mm_mask_unpacklo_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpcklps
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpacklo_ps
__m128 _mm_maskz_unpacklo_ps(__mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpcklps
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpacklo_ps
__m256 _mm256_mask_unpacklo_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpcklps
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpacklo_ps
__m256 _mm256_maskz_unpacklo_ps(__mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vunpcklps
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
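Interleaving "from the low half" or "from the high half of each 128-bit lane" can be illustrated with a portable model on arrays. The sketch below covers the unmasked 256-bit single-precision case; the function name and the high flag (0 for unpacklo, nonzero for unpackhi) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Model of 256-bit unpacklo/unpackhi on arrays of 8 floats.
 * Each 128-bit lane (4 floats) is handled independently. */
void model_unpack_ps_256(float dst[8], const float a[8], const float b[8], int high)
{
    float tmp[8];
    int off = high ? 2 : 0;             /* start of the chosen half inside a lane */
    assert(dst && a && b);
    for (int lane = 0; lane < 8; lane += 4) {
        tmp[lane + 0] = a[lane + off];
        tmp[lane + 1] = b[lane + off];
        tmp[lane + 2] = a[lane + off + 1];
        tmp[lane + 3] = b[lane + off + 1];
    }
    memcpy(dst, tmp, sizeof tmp);
}
```

The masked forms apply the writemask or zeromask to this interleaved result, one mask bit per element.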
_mm_alignr_epi32
__m128i _mm_alignr_epi32(__m128i a, __m128i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignd
Concatenate a and b into a 32-byte intermediate result, shift the result right by count 32-bit elements, and store the low 16 bytes (4 elements) in the return value.
_mm_mask_alignr_epi32
__m128i _mm_mask_alignr_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignd
Concatenate a and b into a 32-byte intermediate result, shift the result right by count 32-bit elements, and store the low 16 bytes (4 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_alignr_epi32
__m128i _mm_maskz_alignr_epi32(__mmask8 k, __m128i a, __m128i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignd
Concatenate a and b into a 32-byte intermediate result, shift the result right by count 32-bit elements, and store the low 16 bytes (4 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_alignr_epi32
__m256i _mm256_alignr_epi32(__m256i a, __m256i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignd
Concatenate a and b into a 64-byte intermediate result, shift the result right by count 32-bit elements, and store the low 32 bytes (8 elements) in the return value.
_mm256_mask_alignr_epi32
__m256i _mm256_mask_alignr_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignd
Concatenate a and b into a 64-byte intermediate result, shift the result right by count 32-bit elements, and store the low 32 bytes (8 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_alignr_epi32
__m256i _mm256_maskz_alignr_epi32(__mmask8 k, __m256i a, __m256i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignd
Concatenate a and b into a 64-byte intermediate result, shift the result right by count 32-bit elements, and store the low 32 bytes (8 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
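The concatenate-and-shift step is easiest to see on arrays. The portable sketch below models the unmasked 128-bit form, with b in the low half of the temporary and a in the high half. It assumes that only the low two bits of count are used for this width; the function name is hypothetical and the intrinsic itself is not called.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 128-bit alignr on 32-bit elements.
 * tmp = { b[0..3], a[0..3] }, shifted right by `count` elements. */
void model_alignr_epi32_128(uint32_t dst[4], const uint32_t a[4],
                            const uint32_t b[4], int count)
{
    uint32_t tmp[8];
    assert(dst && a && b);
    for (int i = 0; i < 4; ++i) {
        tmp[i]     = b[i];              /* low half of the concatenation */
        tmp[i + 4] = a[i];              /* high half of the concatenation */
    }
    int shift = count & 3;              /* assumption: low two bits only */
    for (int i = 0; i < 4; ++i)
        dst[i] = tmp[i + shift];
}
```

A count of 1 therefore yields the upper three elements of b followed by the lowest element of a.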
_mm_alignr_epi64
__m128i _mm_alignr_epi64(__m128i a, __m128i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignq
Concatenate a and b into a 32-byte intermediate result, shift the result right by count 64-bit elements, and store the low 16 bytes (2 elements) in the return value.
_mm_mask_alignr_epi64
__m128i _mm_mask_alignr_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignq
Concatenate a and b into a 32-byte intermediate result, shift the result right by count 64-bit elements, and store the low 16 bytes (2 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_alignr_epi64
__m128i _mm_maskz_alignr_epi64(__mmask8 k, __m128i a, __m128i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignq
Concatenate a and b into a 32-byte intermediate result, shift the result right by count 64-bit elements, and store the low 16 bytes (2 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_alignr_epi64
__m256i _mm256_alignr_epi64(__m256i a, __m256i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignq
Concatenate a and b into a 64-byte intermediate result, shift the result right by count 64-bit elements, and store the low 32 bytes (4 elements) in the return value.
_mm256_mask_alignr_epi64
__m256i _mm256_mask_alignr_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignq
Concatenate a and b into a 64-byte intermediate result, shift the result right by count 64-bit elements, and store the low 32 bytes (4 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_alignr_epi64
__m256i _mm256_maskz_alignr_epi64(__mmask8 k, __m256i a, __m256i b, const int count)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): valignq
Concatenate a and b into a 64-byte intermediate result, shift the result right by count 64-bit elements, and store the low 32 bytes (4 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_dbsad_epu8
__m128i _mm_dbsad_epu8(__m128i a, __m128i b, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vdbpsadbw
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value. Quadruplets from b are selected within 128-bit lanes according to the control in imm.
_mm_mask_dbsad_epu8
__m128i _mm_mask_dbsad_epu8(__m128i src, __mmask8 k, __m128i a, __m128i b, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vdbpsadbw
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). Quadruplets from b are selected within 128-bit lanes according to the control in imm.
_mm_maskz_dbsad_epu8
__m128i _mm_maskz_dbsad_epu8(__mmask8 k, __m128i a, __m128i b, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vdbpsadbw
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Quadruplets from b are selected within 128-bit lanes according to the control in imm.
_mm256_dbsad_epu8
__m256i _mm256_dbsad_epu8(__m256i a, __m256i b, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vdbpsadbw
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value. Quadruplets from b are selected within 128-bit lanes according to the control in imm.
_mm256_mask_dbsad_epu8
__m256i _mm256_mask_dbsad_epu8(__m256i src, __mmask16 k, __m256i a, __m256i b, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vdbpsadbw
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). Quadruplets from b are selected within 128-bit lanes according to the control in imm.
_mm256_maskz_dbsad_epu8
__m256i _mm256_maskz_dbsad_epu8(__mmask16 k, __m256i a, __m256i b, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vdbpsadbw
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Quadruplets from b are selected within 128-bit lanes according to the control in imm.
_mm512_dbsad_epu8
__m512i _mm512_dbsad_epu8(__m512i a, __m512i b, int imm)
CPUID Flags: AVX512BW
Instruction(s): vdbpsadbw
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value. Quadruplets from b are selected within 128-bit lanes according to the control in imm.
_mm512_mask_dbsad_epu8
__m512i _mm512_mask_dbsad_epu8(__m512i src, __mmask32 k, __m512i a, __m512i b, int imm)
CPUID Flags: AVX512BW
Instruction(s): vdbpsadbw
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). Quadruplets from b are selected within 128-bit lanes according to the control in imm.
_mm512_maskz_dbsad_epu8
__m512i _mm512_maskz_dbsad_epu8(__mmask32 k, __m512i a, __m512i b, int imm)
CPUID Flags: AVX512BW
Instruction(s): vdbpsadbw
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Quadruplets from b are selected within 128-bit lanes according to the control in imm.
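The quadruplet pairing is not obvious from the one-line description, so a portable model of the unmasked 128-bit form may help. The sketch below rests on assumptions about the operation: imm supplies four 2-bit selectors that pick 4-byte groups of b, and each 64-bit lane then produces four SADs in which the bytes of a are compared with the selected bytes at offsets 0, 1, 2, and 3. Treat it as an illustration of that reading, not as a definitive specification. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Model of the 128-bit double-block SAD on byte arrays (assumed layout). */
void model_dbsad_epu8_128(uint16_t dst[8], const uint8_t a[16],
                          const uint8_t b[16], int imm)
{
    uint8_t tmp[16];
    uint16_t out[8];
    assert(dst && a && b);
    /* Step 1: build tmp from four 4-byte groups of b, two imm bits per group. */
    for (int q = 0; q < 4; ++q) {
        int sel = (imm >> (2 * q)) & 3;
        for (int j = 0; j < 4; ++j)
            tmp[4 * q + j] = b[4 * sel + j];
    }
    /* Step 2: four SADs per 64-bit lane. Results 0 and 1 use the lower
     * quadruplet of a, results 2 and 3 the upper one; the window into tmp
     * slides by one byte per result. */
    for (int lane = 0; lane < 16; lane += 8) {
        for (int w = 0; w < 4; ++w) {
            int a_off = lane + (w < 2 ? 0 : 4);
            int t_off = lane + w;
            unsigned sum = 0;
            for (int j = 0; j < 4; ++j)
                sum += (unsigned)abs((int)a[a_off + j] - (int)tmp[t_off + j]);
            out[lane / 2 + w] = (uint16_t)sum;
        }
    }
    for (int i = 0; i < 8; ++i)
        dst[i] = out[i];
}
```

With a set to all zeros and imm = 0xE4 (the identity selection), each result is simply the sum of four consecutive bytes of b, which makes the sliding window easy to check by hand.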
_mm256_extracti32x4_epi32
__m128i _mm256_extracti32x4_epi32(__m256i a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vextracti32x4
Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with imm, and store the result in the return value.
_mm256_mask_extracti32x4_epi32
__m128i _mm256_mask_extracti32x4_epi32(__m128i src, __mmask8 k, __m256i a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vextracti32x4
Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_extracti32x4_epi32
__m128i _mm256_maskz_extracti32x4_epi32(__mmask8 k, __m256i a, int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vextracti32x4
Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_extracti32x8_epi32
__m256i _mm512_extracti32x8_epi32(__m512i a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextracti32x8
Extract 256 bits (composed of 8 packed 32-bit integers) from a, selected with imm, and store the result in the return value.
_mm512_mask_extracti32x8_epi32
__m256i _mm512_mask_extracti32x8_epi32(__m256i src, __mmask8 k, __m512i a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextracti32x8
Extract 256 bits (composed of 8 packed 32-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_extracti32x8_epi32
__m256i _mm512_maskz_extracti32x8_epi32(__mmask8 k, __m512i a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextracti32x8
Extract 256 bits (composed of 8 packed 32-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_extracti64x2_epi64
__m128i _mm256_extracti64x2_epi64(__m256i a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vextracti64x2
Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and store the result in the return value.
_mm256_mask_extracti64x2_epi64
__m128i _mm256_mask_extracti64x2_epi64(__m128i src, __mmask8 k, __m256i a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vextracti64x2
Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_extracti64x2_epi64
__m128i _mm256_maskz_extracti64x2_epi64(__mmask8 k, __m256i a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vextracti64x2
Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_extracti64x2_epi64
__m128i _mm512_extracti64x2_epi64(__m512i a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextracti64x2
Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and store the result in the return value.
_mm512_mask_extracti64x2_epi64
__m128i _mm512_mask_extracti64x2_epi64(__m128i src, __mmask8 k, __m512i a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextracti64x2
Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_extracti64x2_epi64
__m128i _mm512_maskz_extracti64x2_epi64(__mmask8 k, __m512i a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vextracti64x2
Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
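The extract operations copy one fixed-size block of the source, chosen by imm, and then apply the mask element by element. The portable sketch below models the 512-bit writemask form for 64-bit elements on plain arrays, assuming imm[1:0] selects one of the four 128-bit blocks. The function name is hypothetical and the intrinsic is not called.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the masked 128-bit extract from a 512-bit source,
 * viewed as 8 elements of 64 bits. */
void model_mask_extracti64x2_512(int64_t dst[2], const int64_t src[2], unsigned k,
                                 const int64_t a[8], int imm)
{
    assert(dst && src && a);
    const int64_t *block = a + 2 * (imm & 3);   /* assumption: imm[1:0] picks the block */
    int64_t tmp[2] = { block[0], block[1] };
    for (int i = 0; i < 2; ++i)                 /* writemask: keep src where bit is clear */
        dst[i] = ((k >> i) & 1u) ? tmp[i] : src[i];
}
```

Only the low two mask bits matter here because the result has two elements.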
_mm_mask_alignr_epi8
__m128i _mm_mask_alignr_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b, const int count)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpalignr
Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_alignr_epi8
__m128i _mm_maskz_alignr_epi8(__mmask16 k, __m128i a, __m128i b, const int count)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpalignr
Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_alignr_epi8
__m256i _mm256_mask_alignr_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b, const int count)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpalignr
Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_alignr_epi8
__m256i _mm256_maskz_alignr_epi8(__mmask32 k, __m256i a, __m256i b, const int count)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpalignr
Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_alignr_epi8
__m512i _mm512_alignr_epi8(__m512i a, __m512i b, const int count)
CPUID Flags: AVX512BW
Instruction(s): vpalignr
Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value.
_mm512_mask_alignr_epi8
__m512i _mm512_mask_alignr_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b, const int count)
CPUID Flags: AVX512BW
Instruction(s): vpalignr
Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_alignr_epi8
__m512i _mm512_maskz_alignr_epi8(__mmask64 k, __m512i a, __m512i b, const int count)
CPUID Flags: AVX512BW
Instruction(s): vpalignr
Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
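The per-block behavior of the alignr family can be illustrated with a scalar reference model. This is a sketch, not the intrinsic: the function names and the lanes parameter (1, 2, or 4 blocks for 128-, 256-, and 512-bit vectors) are hypothetical, and byte arrays stand in for the vector types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the unmasked form. Each 16-byte block is
   processed independently: bytes never cross a block boundary. */
void alignr_epi8_model(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       int count, int lanes)
{
    assert(lanes == 1 || lanes == 2 || lanes == 4);
    assert(count >= 0 && count <= 255);
    for (int lane = 0; lane < lanes; lane++) {
        uint8_t tmp[32];
        for (int i = 0; i < 16; i++) {
            tmp[i]      = b[lane * 16 + i];   /* b supplies the low half  */
            tmp[16 + i] = a[lane * 16 + i];   /* a supplies the high half */
        }
        for (int i = 0; i < 16; i++) {
            int s = i + count;                /* byte-wise right shift */
            dst[lane * 16 + i] = (s < 32) ? tmp[s] : 0;
        }
    }
}

/* Writemask form: one mask bit per byte of the result. */
void mask_alignr_epi8_model(uint8_t *dst, const uint8_t *src, uint64_t k,
                            const uint8_t *a, const uint8_t *b,
                            int count, int lanes)
{
    uint8_t full[64];
    alignr_epi8_model(full, a, b, count, lanes);
    for (int i = 0; i < lanes * 16; i++)
        dst[i] = ((k >> i) & 1) ? full[i] : src[i];
}
```

With count equal to 4, the top four bytes of each result block come from the matching block of a, not from the next block of b.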
_mm_mask_blend_epi8
__m128i _mm_mask_blend_epi8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpblendmb
Blend packed 8-bit integers from a and b using control mask k, and return the results.
_mm256_mask_blend_epi8
__m256i _mm256_mask_blend_epi8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpblendmb
Blend packed 8-bit integers from a and b using control mask k, and return the results.
_mm512_mask_blend_epi8
__m512i _mm512_mask_blend_epi8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpblendmb
Blend packed 8-bit integers from a and b using control mask k, and return the results.
_mm_mask_blend_epi32
__m128i _mm_mask_blend_epi32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpblendmd
Blend packed 32-bit integers from a and b using control mask k, and return the results.
_mm256_mask_blend_epi32
__m256i _mm256_mask_blend_epi32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpblendmd
Blend packed 32-bit integers from a and b using control mask k, and return the results.
_mm_mask_blend_epi64
__m128i _mm_mask_blend_epi64(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpblendmq
Blend packed 64-bit integers from a and b using control mask k, and return the results.
_mm256_mask_blend_epi64
__m256i _mm256_mask_blend_epi64(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpblendmq
Blend packed 64-bit integers from a and b using control mask k, and return the results.
_mm_mask_blend_epi16
__m128i _mm_mask_blend_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpblendmw
Blend packed 16-bit integers from a and b using control mask k, and return the results.
_mm256_mask_blend_epi16
__m256i _mm256_mask_blend_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpblendmw
Blend packed 16-bit integers from a and b using control mask k, and return the results.
_mm512_mask_blend_epi16
__m512i _mm512_mask_blend_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpblendmw
Blend packed 16-bit integers from a and b using control mask k, and return the results.
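The direction of the blend is easy to get backwards, so here is a scalar reference model for the 32-bit case. It is a sketch with a hypothetical function name: arrays stand in for the vector types, and n is the element count (4 for 128-bit, 8 for 256-bit). A set mask bit selects the element from b and a clear bit selects it from a.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a mask blend on n 32-bit elements. */
void mask_blend_epi32_model(int32_t *dst, uint16_t k,
                            const int32_t *a, const int32_t *b, int n)
{
    assert(n > 0 && n <= 16);
    for (int i = 0; i < n; i++)
        dst[i] = ((k >> i) & 1) ? b[i] : a[i];   /* bit set: take b */
}
```

The 8-, 16-, and 64-bit variants follow the same pattern with a different element width and mask width.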
_mm_mask_broadcastb_epi8
__m128i _mm_mask_broadcastb_epi8(__m128i src, __mmask16 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpbroadcastb
Broadcast the low packed 8-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_broadcastb_epi8
__m128i _mm_maskz_broadcastb_epi8(__mmask16 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpbroadcastb
Broadcast the low packed 8-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_broadcastb_epi8
__m256i _mm256_mask_broadcastb_epi8(__m256i src, __mmask32 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpbroadcastb
Broadcast the low packed 8-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcastb_epi8
__m256i _mm256_maskz_broadcastb_epi8(__mmask32 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpbroadcastb
Broadcast the low packed 8-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_broadcastb_epi8
__m512i _mm512_broadcastb_epi8(__m128i a)
CPUID Flags: AVX512BW
Instruction(s): vpbroadcastb
Broadcast the low packed 8-bit integer from a to all elements of the return value.
_mm512_mask_broadcastb_epi8
__m512i _mm512_mask_broadcastb_epi8(__m512i src, __mmask64 k, __m128i a)
CPUID Flags: AVX512BW
Instruction(s): vpbroadcastb
Broadcast the low packed 8-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_broadcastb_epi8
__m512i _mm512_maskz_broadcastb_epi8(__mmask64 k, __m128i a)
CPUID Flags: AVX512BW
Instruction(s): vpbroadcastb
Broadcast the low packed 8-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
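The difference between the writemask and zeromask broadcast forms can be sketched with a scalar reference model. The function names are hypothetical and byte arrays stand in for the vector types. Here n is the number of 8-bit elements in the result (16, 32, or 64).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the writemask form: only a[0] is read
   from a; unselected elements keep the value from src. */
void mask_broadcastb_epi8_model(int8_t *dst, const int8_t *src, uint64_t k,
                                const int8_t *a, int n)
{
    assert(n == 16 || n == 32 || n == 64);
    int8_t low = a[0];
    for (int i = 0; i < n; i++)
        dst[i] = ((k >> i) & 1) ? low : src[i];
}

/* Zeromask form: unselected elements become 0. */
void maskz_broadcastb_epi8_model(int8_t *dst, uint64_t k,
                                 const int8_t *a, int n)
{
    assert(n == 16 || n == 32 || n == 64);
    int8_t low = a[0];
    for (int i = 0; i < n; i++)
        dst[i] = ((k >> i) & 1) ? low : 0;
}
```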
_mm_mask_broadcastd_epi32
__m128i _mm_mask_broadcastd_epi32(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpbroadcastd
Broadcast the low packed 32-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_broadcastd_epi32
__m128i _mm_maskz_broadcastd_epi32(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpbroadcastd
Broadcast the low packed 32-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_broadcastd_epi32
__m256i _mm256_mask_broadcastd_epi32(__m256i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpbroadcastd
Broadcast the low packed 32-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcastd_epi32
__m256i _mm256_maskz_broadcastd_epi32(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpbroadcastd
Broadcast the low packed 32-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_broadcastmb_epi64
__m128i _mm_broadcastmb_epi64(__mmask8 k)
CPUID Flags: AVX512CD, AVX512VL
Instruction(s): vpbroadcastmb2q
Broadcast the low 8 bits from input mask k to all 64-bit elements of the return value.
_mm256_broadcastmb_epi64
__m256i _mm256_broadcastmb_epi64(__mmask8 k)
CPUID Flags: AVX512CD, AVX512VL
Instruction(s): vpbroadcastmb2q
Broadcast the low 8 bits from input mask k to all 64-bit elements of the return value.
_mm_broadcastmw_epi32
__m128i _mm_broadcastmw_epi32(__mmask16 k)
CPUID Flags: AVX512CD, AVX512VL
Instruction(s): vpbroadcastmw2d
Broadcast the low 16 bits from input mask k to all 32-bit elements of the return value.
_mm256_broadcastmw_epi32
__m256i _mm256_broadcastmw_epi32(__mmask16 k)
CPUID Flags: AVX512CD, AVX512VL
Instruction(s): vpbroadcastmw2d
Broadcast the low 16 bits from input mask k to all 32-bit elements of the return value.
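The mask broadcasts above copy the numeric value of the mask into every element; they do not expand individual mask bits. A scalar reference model makes this concrete. It is a sketch with hypothetical function names, where arrays stand in for the vector types and n is the element count of the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: every 64-bit element receives the
   zero-extended 8-bit mask value. */
void broadcastmb_epi64_model(uint64_t *dst, uint8_t k, int n)
{
    assert(n == 2 || n == 4 || n == 8);
    for (int i = 0; i < n; i++)
        dst[i] = (uint64_t)k;
}

/* Hypothetical scalar model: every 32-bit element receives the
   zero-extended 16-bit mask value. */
void broadcastmw_epi32_model(uint32_t *dst, uint16_t k, int n)
{
    assert(n == 4 || n == 8 || n == 16);
    for (int i = 0; i < n; i++)
        dst[i] = (uint32_t)k;
}
```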
_mm_mask_broadcastq_epi64
__m128i _mm_mask_broadcastq_epi64(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpbroadcastq
Broadcast the low packed 64-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_broadcastq_epi64
__m128i _mm_maskz_broadcastq_epi64(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpbroadcastq
Broadcast the low packed 64-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_broadcastq_epi64
__m256i _mm256_mask_broadcastq_epi64(__m256i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpbroadcastq
Broadcast the low packed 64-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcastq_epi64
__m256i _mm256_maskz_broadcastq_epi64(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpbroadcastq
Broadcast the low packed 64-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_broadcastw_epi16
__m128i _mm_mask_broadcastw_epi16(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpbroadcastw
Broadcast the low packed 16-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_broadcastw_epi16
__m128i _mm_maskz_broadcastw_epi16(__mmask8 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpbroadcastw
Broadcast the low packed 16-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_broadcastw_epi16
__m256i _mm256_mask_broadcastw_epi16(__m256i src, __mmask16 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpbroadcastw
Broadcast the low packed 16-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_broadcastw_epi16
__m256i _mm256_maskz_broadcastw_epi16(__mmask16 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpbroadcastw
Broadcast the low packed 16-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_broadcastw_epi16
__m512i _mm512_broadcastw_epi16(__m128i a)
CPUID Flags: AVX512BW
Instruction(s): vpbroadcastw
Broadcast the low packed 16-bit integer from a to all elements of the return value.
_mm512_mask_broadcastw_epi16
__m512i _mm512_mask_broadcastw_epi16(__m512i src, __mmask32 k, __m128i a)
CPUID Flags: AVX512BW
Instruction(s): vpbroadcastw
Broadcast the low packed 16-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_broadcastw_epi16
__m512i _mm512_maskz_broadcastw_epi16(__mmask32 k, __m128i a)
CPUID Flags: AVX512BW
Instruction(s): vpbroadcastw
Broadcast the low packed 16-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_compress_epi32
__m128i _mm_mask_compress_epi32(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressd
Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.
_mm_maskz_compress_epi32
__m128i _mm_maskz_compress_epi32(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressd
Contiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
_mm256_mask_compress_epi32
__m256i _mm256_mask_compress_epi32(__m256i src, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressd
Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.
_mm256_maskz_compress_epi32
__m256i _mm256_maskz_compress_epi32(__mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressd
Contiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
_mm_mask_compress_epi64
__m128i _mm_mask_compress_epi64(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressq
Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.
_mm_maskz_compress_epi64
__m128i _mm_maskz_compress_epi64(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressq
Contiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
_mm256_mask_compress_epi64
__m256i _mm256_mask_compress_epi64(__m256i src, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressq
Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.
_mm256_maskz_compress_epi64
__m256i _mm256_maskz_compress_epi64(__mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressq
Contiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
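The compress operation can be sketched as a scalar reference model. The function name is hypothetical, arrays stand in for the vector types, and n is the element count (4 or 8 for the 32-bit variants listed above). Selected elements are packed toward the low end of the result, and the positions above them are filled from src at the same index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a writemask compress on n 32-bit elements. */
void mask_compress_epi32_model(int32_t *dst, const int32_t *src, uint8_t k,
                               const int32_t *a, int n)
{
    assert(n > 0 && n <= 8);
    int m = 0;
    for (int i = 0; i < n; i++)
        if ((k >> i) & 1)
            dst[m++] = a[i];        /* pack active elements contiguously */
    for (; m < n; m++)
        dst[m] = src[m];            /* pass through the rest from src */
}
```

A zeromask version would write 0 in the second loop instead of src[m].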
_mm256_mask_permutexvar_epi32
__m256i _mm256_mask_permutexvar_epi32(__m256i src, __mmask8 k, __m256i idx, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermd
Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_permutexvar_epi32
__m256i _mm256_maskz_permutexvar_epi32(__mmask8 k, __m256i idx, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermd
Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutexvar_epi32
__m256i _mm256_permutexvar_epi32(__m256i idx, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermd
Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and return the results.
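A scalar reference model of the variable permute may help clarify how idx is used. This is a sketch with hypothetical function names. Arrays stand in for the vector types, and n is the element count (8 for the 256-bit forms above). Only the low log2(n) bits of each index are used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: result element i is a[idx[i] mod n]. */
void permutexvar_epi32_model(int32_t *dst, const uint32_t *idx,
                             const int32_t *a, int n)
{
    int32_t tmp[16];
    assert(n == 8 || n == 16);
    for (int i = 0; i < n; i++)
        tmp[i] = a[idx[i] & (uint32_t)(n - 1)];   /* upper index bits ignored */
    for (int i = 0; i < n; i++)
        dst[i] = tmp[i];                          /* safe if dst aliases a */
}

/* Writemask form: unselected elements are copied from src. */
void mask_permutexvar_epi32_model(int32_t *dst, const int32_t *src, uint16_t k,
                                  const uint32_t *idx, const int32_t *a, int n)
{
    int32_t tmp[16];
    permutexvar_epi32_model(tmp, idx, a, n);
    for (int i = 0; i < n; i++)
        dst[i] = ((k >> i) & 1) ? tmp[i] : src[i];
}
```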
_mm_mask2_permutex2var_epi32
__m128i _mm_mask2_permutex2var_epi32(__m128i a, __m128i idx, __mmask8 k, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2d
Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm256_mask2_permutex2var_epi32
__m256i _mm256_mask2_permutex2var_epi32(__m256i a, __m256i idx, __mmask8 k, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2d
Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm_maskz_permutex2var_epi32
__m128i _mm_maskz_permutex2var_epi32(__mmask8 k, __m128i a, __m128i idx, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2d, vpermt2d
Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_permutex2var_epi32
__m128i _mm_permutex2var_epi32(__m128i a, __m128i idx, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2d, vpermt2d
Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.
_mm256_maskz_permutex2var_epi32
__m256i _mm256_maskz_permutex2var_epi32(__mmask8 k, __m256i a, __m256i idx, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2d, vpermt2d
Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutex2var_epi32
__m256i _mm256_permutex2var_epi32(__m256i a, __m256i idx, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2d, vpermt2d
Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.
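The "selector and index" wording can be made concrete with a scalar reference model of the unmasked two-source permute. This is a sketch: the function name is hypothetical, arrays stand in for the vector types, and n is the element count (4 for 128-bit, 8 for 256-bit). For each idx element, the low log2(n) bits are the index and the next bit is the selector (0 picks a, 1 picks b).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a two-source permute on n 32-bit elements. */
void permutex2var_epi32_model(int32_t *dst, const int32_t *a,
                              const uint32_t *idx, const int32_t *b, int n)
{
    int32_t tmp[16];
    assert(n == 4 || n == 8 || n == 16);
    for (int i = 0; i < n; i++) {
        uint32_t off = idx[i] & (uint32_t)(n - 1);   /* index within source */
        int from_b = (idx[i] & (uint32_t)n) != 0;    /* selector bit */
        tmp[i] = from_b ? b[off] : a[off];
    }
    for (int i = 0; i < n; i++)
        dst[i] = tmp[i];
}
```

With n equal to 4, an idx value of 4 therefore means "element 0 of b", and 7 means "element 3 of b".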
_mm_mask2_permutex2var_epi64
__m128i _mm_mask2_permutex2var_epi64(__m128i a, __m128i idx, __mmask8 k, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2q
Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm256_mask2_permutex2var_epi64
__m256i _mm256_mask2_permutex2var_epi64(__m256i a, __m256i idx, __mmask8 k, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2q
Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm_maskz_permutex2var_epi64
__m128i _mm_maskz_permutex2var_epi64(__mmask8 k, __m128i a, __m128i idx, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2q, vpermt2q
Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_permutex2var_epi64
__m128i _mm_permutex2var_epi64(__m128i a, __m128i idx, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2q, vpermt2q
Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.
_mm256_maskz_permutex2var_epi64
__m256i _mm256_maskz_permutex2var_epi64(__mmask8 k, __m256i a, __m256i idx, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2q, vpermt2q
Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutex2var_epi64
__m256i _mm256_permutex2var_epi64(__m256i a, __m256i idx, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermi2q, vpermt2q
Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.
_mm_mask2_permutex2var_epi16
__m128i _mm_mask2_permutex2var_epi16(__m128i a, __m128i idx, __mmask8 k, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermi2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm256_mask2_permutex2var_epi16
__m256i _mm256_mask2_permutex2var_epi16(__m256i a, __m256i idx, __mmask16 k, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermi2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm512_mask2_permutex2var_epi16
__m512i _mm512_mask2_permutex2var_epi16(__m512i a, __m512i idx, __mmask32 k, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpermi2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm_maskz_permutex2var_epi16
__m128i _mm_maskz_permutex2var_epi16(__mmask8 k, __m128i a, __m128i idx, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermi2w, vpermt2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_permutex2var_epi16
__m128i _mm_permutex2var_epi16(__m128i a, __m128i idx, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermi2w, vpermt2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.
_mm256_maskz_permutex2var_epi16
__m256i _mm256_maskz_permutex2var_epi16(__mmask16 k, __m256i a, __m256i idx, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermi2w, vpermt2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutex2var_epi16
__m256i _mm256_permutex2var_epi16(__m256i a, __m256i idx, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermi2w, vpermt2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.
_mm512_maskz_permutex2var_epi16
__m512i _mm512_maskz_permutex2var_epi16(__mmask32 k, __m512i a, __m512i idx, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpermi2w, vpermt2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_permutex2var_epi16
__m512i _mm512_permutex2var_epi16(__m512i a, __m512i idx, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpermi2w, vpermt2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.
_mm256_mask_permutex_epi64
__m256i _mm256_mask_permutex_epi64(__m256i src, __mmask8 k, __m256i a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermq
Shuffle 64-bit integers in a across lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_mask_permutexvar_epi64
__m256i _mm256_mask_permutexvar_epi64(__m256i src, __mmask8 k, __m256i idx, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermq
Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_permutex_epi64
__m256i _mm256_maskz_permutex_epi64(__mmask8 k, __m256i a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermq
Shuffle 64-bit integers in a across lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_maskz_permutexvar_epi64
__m256i _mm256_maskz_permutexvar_epi64(__mmask8 k, __m256i idx, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermq
Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutex_epi64
__m256i _mm256_permutex_epi64(__m256i a, const int imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermq
Shuffle 64-bit integers in a across lanes using the control in imm, and return the results.
_mm256_permutexvar_epi64
__m256i _mm256_permutexvar_epi64(__m256i idx, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermq
Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and return the results.
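For the immediate-controlled forms, each pair of bits in imm selects the source element for one result element. The scalar reference model below sketches this for the 256-bit case. The function name is hypothetical and four-element arrays stand in for __m256i.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: bits [2i+1:2i] of imm pick the source
   element for result element i. */
void permutex_epi64_model(uint64_t dst[4], const uint64_t a[4], int imm)
{
    uint64_t tmp[4];
    assert(imm >= 0 && imm <= 255);
    for (int i = 0; i < 4; i++)
        tmp[i] = a[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

Under this model, an imm of 0x1B reverses the four elements, 0xE4 leaves them unchanged, and 0x00 broadcasts element 0.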
_mm_mask_permutex2var_epi32
__m128i _mm_mask_permutex2var_epi32(__m128i a, __mmask8 k, __m128i idx, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermt2d
Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask_permutex2var_epi32
__m256i _mm256_mask_permutex2var_epi32(__m256i a, __mmask8 k, __m256i idx, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermt2d
Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask_permutex2var_epi64
__m128i _mm_mask_permutex2var_epi64(__m128i a, __mmask8 k, __m128i idx, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermt2q
Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask_permutex2var_epi64
__m256i _mm256_mask_permutex2var_epi64(__m256i a, __mmask8 k, __m256i idx, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpermt2q
Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask_permutex2var_epi16
__m128i _mm_mask_permutex2var_epi16(__m128i a, __mmask8 k, __m128i idx, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermt2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask_permutex2var_epi16
__m256i _mm256_mask_permutex2var_epi16(__m256i a, __mmask16 k, __m256i idx, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermt2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask_permutex2var_epi16
__m512i _mm512_mask_permutex2var_epi16(__m512i a, __mmask32 k, __m512i idx, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpermt2w
Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
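The mask, mask2, and maskz forms of the two-source permute differ only in what fills an element whose mask bit is clear: a, idx, or zero. The scalar reference model below sketches that difference for 16-bit elements. The function name and the fallback_t enum are hypothetical, arrays stand in for the vector types, and n is the element count (8, 16, or 32).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector for what an unselected element receives. */
typedef enum {
    FALLBACK_A,     /* as in the mask_ forms   */
    FALLBACK_IDX,   /* as in the mask2_ forms  */
    FALLBACK_ZERO   /* as in the maskz_ forms  */
} fallback_t;

void masked_permutex2var_epi16_model(uint16_t *dst, const uint16_t *a,
                                     const uint16_t *idx, const uint16_t *b,
                                     uint32_t k, int n, fallback_t fb)
{
    uint16_t out[32];
    assert(n == 8 || n == 16 || n == 32);
    for (int i = 0; i < n; i++) {
        if ((k >> i) & 1) {
            unsigned off = idx[i] & (unsigned)(n - 1);   /* index bits */
            out[i] = (idx[i] & (unsigned)n) ? b[off] : a[off];
        } else if (fb == FALLBACK_A) {
            out[i] = a[i];
        } else if (fb == FALLBACK_IDX) {
            out[i] = idx[i];
        } else {
            out[i] = 0;
        }
    }
    for (int i = 0; i < n; i++)
        dst[i] = out[i];
}
```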
_mm_mask_permutexvar_epi16
__m128i _mm_mask_permutexvar_epi16(__m128i src, __mmask8 k, __m128i idx, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermw
Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_permutexvar_epi16
__m128i _mm_maskz_permutexvar_epi16(__mmask8 k, __m128i idx, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermw
Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_permutexvar_epi16
__m128i _mm_permutexvar_epi16(__m128i idx, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermw
Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results.
_mm256_mask_permutexvar_epi16
__m256i _mm256_mask_permutexvar_epi16(__m256i src, __mmask16 k, __m256i idx, __m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermw
Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_permutexvar_epi16
__m256i _mm256_maskz_permutexvar_epi16(__mmask16 k, __m256i idx, __m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermw
Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutexvar_epi16
__m256i _mm256_permutexvar_epi16(__m256i idx, __m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpermw
Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results.
_mm512_mask_permutexvar_epi16
__m512i _mm512_mask_permutexvar_epi16(__m512i src, __mmask32 k, __m512i idx, __m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpermw
Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_permutexvar_epi16
__m512i _mm512_maskz_permutexvar_epi16(__mmask32 k, __m512i idx, __m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpermw
Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_permutexvar_epi16
__m512i _mm512_permutexvar_epi16(__m512i idx, __m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpermw
Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results.
_mm_mask_expand_epi32
__m128i _mm_mask_expand_epi32(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandd
Load contiguous active 32-bit integers from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_expand_epi32
__m128i _mm_maskz_expand_epi32(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandd
Load contiguous active 32-bit integers from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_expand_epi32
__m256i _mm256_mask_expand_epi32(__m256i src, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandd
Load contiguous active 32-bit integers from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_expand_epi32
__m256i _mm256_maskz_expand_epi32(__mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandd
Load contiguous active 32-bit integers from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_expand_epi64
__m128i _mm_mask_expand_epi64(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandq
Load contiguous active 64-bit integers from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_expand_epi64
__m128i _mm_maskz_expand_epi64(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandq
Load contiguous active 64-bit integers from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_expand_epi64
__m256i _mm256_mask_expand_epi64(__m256i src, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandq
Load contiguous active 64-bit integers from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_expand_epi64
__m256i _mm256_maskz_expand_epi64(__mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandq
Load contiguous active 64-bit integers from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_movm_epi8
__m128i _mm_movm_epi8(__mmask16 k)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmovm2b
Set each packed 8-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm256_movm_epi8
__m256i _mm256_movm_epi8(__mmask32 k)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmovm2b
Set each packed 8-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm512_movm_epi8
__m512i _mm512_movm_epi8(__mmask64 k)
CPUID Flags: AVX512BW
Instruction(s): vpmovm2b
Set each packed 8-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm_movm_epi32
__m128i _mm_movm_epi32(__mmask8 k)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmovm2d
Set each packed 32-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm256_movm_epi32
__m256i _mm256_movm_epi32(__mmask8 k)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmovm2d
Set each packed 32-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm512_movm_epi32
__m512i _mm512_movm_epi32(__mmask16 k)
CPUID Flags: AVX512DQ
Instruction(s): vpmovm2d
Set each packed 32-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm_movm_epi64
__m128i _mm_movm_epi64(__mmask8 k)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmovm2q
Set each packed 64-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm256_movm_epi64
__m256i _mm256_movm_epi64(__mmask8 k)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmovm2q
Set each packed 64-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm512_movm_epi64
__m512i _mm512_movm_epi64(__mmask8 k)
CPUID Flags: AVX512DQ
Instruction(s): vpmovm2q
Set each packed 64-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm_movm_epi16
__m128i _mm_movm_epi16(__mmask8 k)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmovm2w
Set each packed 16-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm256_movm_epi16
__m256i _mm256_movm_epi16(__mmask16 k)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmovm2w
Set each packed 16-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
_mm512_movm_epi16
__m512i _mm512_movm_epi16(__mmask32 k)
CPUID Flags: AVX512BW
Instruction(s): vpmovm2w
Set each packed 16-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
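The movm family turns each mask bit into a full-width element. A minimal scalar sketch of the 128-bit, 8-bit-element case follows; model_movm_epi8 is a hypothetical helper that mirrors the description of _mm_movm_epi8 and is not the intrinsic itself.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_movm_epi8: 16 mask bits in, 16 bytes out.
   Bit i of k set   -> byte i is all ones (0xFF).
   Bit i of k clear -> byte i is all zeros. */
void model_movm_epi8(uint8_t dst[16], uint16_t k)
{
    for (int i = 0; i < 16; i++)
        dst[i] = ((k >> i) & 1) ? 0xFF : 0x00;
}
```

The wider element sizes follow the same pattern with fewer, larger elements per vector.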
_mm512_sad_epu8
__m512i _mm512_sad_epu8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsadbw
Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce eight unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in the return value.
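As an illustration of the arithmetic, the sketch below models the 512-bit sum of absolute differences in scalar C: 64 byte pairs are reduced to one sum per group of 8 bytes, giving eight 64-bit elements. The name model_sad_epu8_512 is hypothetical, and the code is a reference model under that reading, not the vpsadbw instruction.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_sad_epu8: 64 bytes in, 8 sums out.
   Each output element holds the sum of 8 absolute byte differences. */
void model_sad_epu8_512(uint64_t dst[8], const uint8_t a[64], const uint8_t b[64])
{
    for (int g = 0; g < 8; g++) {
        uint64_t sum = 0;
        for (int j = 0; j < 8; j++) {
            int d = (int)a[8 * g + j] - (int)b[8 * g + j];
            sum += (uint64_t)(d < 0 ? -d : d);
        }
        dst[g] = sum; /* at most 8 * 255 = 2040, so only the low 16 bits are used */
    }
}
```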
_mm_mask_shuffle_epi8
__m128i _mm_mask_shuffle_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshufb
Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_shuffle_epi8
__m128i _mm_maskz_shuffle_epi8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshufb
Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_shuffle_epi8
__m256i _mm256_mask_shuffle_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshufb
Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_shuffle_epi8
__m256i _mm256_maskz_shuffle_epi8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshufb
Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_shuffle_epi8
__m512i _mm512_mask_shuffle_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpshufb
Shuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_shuffle_epi8
__m512i _mm512_maskz_shuffle_epi8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpshufb
Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_shuffle_epi8
__m512i _mm512_shuffle_epi8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpshufb
Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and return the results.
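The byte shuffle works independently inside each 128-bit lane. The following scalar sketch assumes the usual vpshufb rule: a control byte with its high bit set produces zero, otherwise its low four bits select a byte from the same lane of a. The helper model_shuffle_epi8 and its nbytes parameter are hypothetical and cover the unmasked form only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vpshufb over nbytes bytes (16, 32 or 64).
   dst must not overlap a or b. */
void model_shuffle_epi8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                        size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++) {
        size_t lane_base = i & ~(size_t)15; /* first byte of this 128-bit lane */
        if (b[i] & 0x80)
            dst[i] = 0;                            /* high control bit set: zero */
        else
            dst[i] = a[lane_base + (b[i] & 0x0F)]; /* low 4 bits pick a byte */
    }
}
```

Because the index is only four bits wide, a byte can never be pulled from a different 128-bit lane.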
_mm_mask_shuffle_epi32
__m128i _mm_mask_shuffle_epi32(__m128i src, __mmask8 k, __m128i a, _MM_PERM_ENUM imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpshufd
Shuffle 32-bit integers in a using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_shuffle_epi32
__m128i _mm_maskz_shuffle_epi32(__mmask8 k, __m128i a, _MM_PERM_ENUM imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpshufd
Shuffle 32-bit integers in a using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_shuffle_epi32
__m256i _mm256_mask_shuffle_epi32(__m256i src, __mmask8 k, __m256i a, _MM_PERM_ENUM imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpshufd
Shuffle 32-bit integers in a within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_shuffle_epi32
__m256i _mm256_maskz_shuffle_epi32(__mmask8 k, __m256i a, _MM_PERM_ENUM imm)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpshufd
Shuffle 32-bit integers in a within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
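The control in imm holds four 2-bit selectors, one per destination element. A scalar sketch of one 128-bit lane with a writemask is shown below; model_mask_shuffle_epi32 is a hypothetical name and the code is a reference model, not the vpshufd instruction.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_mask_shuffle_epi32 (one 128-bit lane).
   Bits [2*i+1 : 2*i] of imm select which element of a lands in lane i.
   dst must not overlap a. */
void model_mask_shuffle_epi32(int32_t dst[4], const int32_t src[4], uint8_t k,
                              const int32_t a[4], int imm)
{
    for (int i = 0; i < 4; i++) {
        int sel = (imm >> (2 * i)) & 3;
        dst[i] = ((k >> i) & 1) ? a[sel] : src[i];
    }
}
```

For example, imm = 0x1B (binary 00 01 10 11) reverses the four elements.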
_mm_mask_shufflehi_epi16
__m128i _mm_mask_shufflehi_epi16(__m128i src, __mmask8 k, __m128i a, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshufhw
Shuffle 16-bit integers in the high 64 bits of a using the control in imm. Store the results in the high 64 bits of the return value, with the low 64 bits being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_shufflehi_epi16
__m128i _mm_maskz_shufflehi_epi16(__mmask8 k, __m128i a, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshufhw
Shuffle 16-bit integers in the high 64 bits of a using the control in imm. Store the results in the high 64 bits of the return value, with the low 64 bits being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_shufflehi_epi16
__m256i _mm256_mask_shufflehi_epi16(__m256i src, __mmask16 k, __m256i a, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshufhw
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm. Store the results in the high 64 bits of 128-bit lanes of the return value, with the low 64 bits of 128-bit lanes being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_shufflehi_epi16
__m256i _mm256_maskz_shufflehi_epi16(__mmask16 k, __m256i a, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshufhw
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm. Store the results in the high 64 bits of 128-bit lanes of the return value, with the low 64 bits of 128-bit lanes being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_shufflehi_epi16
__m512i _mm512_mask_shufflehi_epi16(__m512i src, __mmask32 k, __m512i a, int imm)
CPUID Flags: AVX512BW
Instruction(s): vpshufhw
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm. Store the results in the high 64 bits of 128-bit lanes of the return value, with the low 64 bits of 128-bit lanes being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_shufflehi_epi16
__m512i _mm512_maskz_shufflehi_epi16(__mmask32 k, __m512i a, int imm)
CPUID Flags: AVX512BW
Instruction(s): vpshufhw
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm. Store the results in the high 64 bits of 128-bit lanes of the return value, with the low 64 bits of 128-bit lanes being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_shufflehi_epi16
__m512i _mm512_shufflehi_epi16(__m512i a, int imm)
CPUID Flags: AVX512BW
Instruction(s): vpshufhw
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm. Store the results in the high 64 bits of 128-bit lanes of the return value, with the low 64 bits of 128-bit lanes being copied from a.
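To make the shufflehi behavior concrete, here is a scalar sketch of a single 128-bit lane with a writemask. It assumes the vpshufhw rule that each 2-bit field of imm selects one of the four high words, while the four low words pass through unchanged. The name model_mask_shufflehi_epi16 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_mask_shufflehi_epi16 (8 words, one lane). */
void model_mask_shufflehi_epi16(int16_t dst[8], const int16_t src[8], uint8_t k,
                                const int16_t a[8], int imm)
{
    int16_t tmp[8];
    for (int i = 0; i < 4; i++) {
        tmp[i] = a[i];                              /* low 64 bits pass through */
        tmp[4 + i] = a[4 + ((imm >> (2 * i)) & 3)]; /* high 64 bits are shuffled */
    }
    /* The writemask is applied per 16-bit element after the shuffle. */
    for (int i = 0; i < 8; i++)
        dst[i] = ((k >> i) & 1) ? tmp[i] : src[i];
}
```

The 256-bit and 512-bit forms repeat this per 128-bit lane with the same imm; the shufflelo forms swap the roles of the low and high halves.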
_mm_mask_shufflelo_epi16
__m128i _mm_mask_shufflelo_epi16(__m128i src, __mmask8 k, __m128i a, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshuflw
Shuffle 16-bit integers in the low 64 bits of a using the control in imm. Store the results in the low 64 bits of the return value, with the high 64 bits being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_shufflelo_epi16
__m128i _mm_maskz_shufflelo_epi16(__mmask8 k, __m128i a, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshuflw
Shuffle 16-bit integers in the low 64 bits of a using the control in imm. Store the results in the low 64 bits of the return value, with the high 64 bits being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_shufflelo_epi16
__m256i _mm256_mask_shufflelo_epi16(__m256i src, __mmask16 k, __m256i a, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshuflw
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm. Store the results in the low 64 bits of 128-bit lanes of the return value, with the high 64 bits of 128-bit lanes being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_shufflelo_epi16
__m256i _mm256_maskz_shufflelo_epi16(__mmask16 k, __m256i a, int imm)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpshuflw
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm. Store the results in the low 64 bits of 128-bit lanes of the return value, with the high 64 bits of 128-bit lanes being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_shufflelo_epi16
__m512i _mm512_mask_shufflelo_epi16(__m512i src, __mmask32 k, __m512i a, int imm)
CPUID Flags: AVX512BW
Instruction(s): vpshuflw
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm. Store the results in the low 64 bits of 128-bit lanes of the return value, with the high 64 bits of 128-bit lanes being copied from a, using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_shufflelo_epi16
__m512i _mm512_maskz_shufflelo_epi16(__mmask32 k, __m512i a, int imm)
CPUID Flags: AVX512BW
Instruction(s): vpshuflw
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm. Store the results in the low 64 bits of 128-bit lanes of the return value, with the high 64 bits of 128-bit lanes being copied from a, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_shufflelo_epi16
__m512i _mm512_shufflelo_epi16(__m512i a, int imm)
CPUID Flags: AVX512BW
Instruction(s): vpshuflw
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm. Store the results in the low 64 bits of 128-bit lanes of the return value, with the high 64 bits of 128-bit lanes being copied from a.
_mm_mask_unpackhi_epi8
__m128i _mm_mask_unpackhi_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpckhbw
Unpack and interleave 8-bit integers from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpackhi_epi8
__m128i _mm_maskz_unpackhi_epi8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpckhbw
Unpack and interleave 8-bit integers from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpackhi_epi8
__m256i _mm256_mask_unpackhi_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpckhbw
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpackhi_epi8
__m256i _mm256_maskz_unpackhi_epi8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpckhbw
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_unpackhi_epi8
__m512i _mm512_mask_unpackhi_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpckhbw
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_unpackhi_epi8
__m512i _mm512_maskz_unpackhi_epi8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpckhbw
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_unpackhi_epi8
__m512i _mm512_unpackhi_epi8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpckhbw
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and return the results.
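The interleave pattern of the unpackhi entries can be sketched in scalar C. The model below assumes the vpunpckhbw rule that, within each 128-bit lane, bytes 8..15 of a and b alternate in the result, starting with a. The helper model_unpackhi_epi8 and its lanes parameter are hypothetical and cover the unmasked form only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vpunpckhbw over `lanes` 128-bit lanes
   (1, 2 or 4). dst must not overlap a or b. */
void model_unpackhi_epi8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                         size_t lanes)
{
    for (size_t l = 0; l < lanes; l++) {
        size_t base = 16 * l; /* first byte of this lane */
        for (size_t j = 0; j < 8; j++) {
            dst[base + 2 * j]     = a[base + 8 + j]; /* even bytes come from a */
            dst[base + 2 * j + 1] = b[base + 8 + j]; /* odd bytes come from b */
        }
    }
}
```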
_mm_mask_unpackhi_epi32
__m128i _mm_mask_unpackhi_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckhdq
Unpack and interleave 32-bit integers from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpackhi_epi32
__m128i _mm_maskz_unpackhi_epi32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckhdq
Unpack and interleave 32-bit integers from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpackhi_epi32
__m256i _mm256_mask_unpackhi_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckhdq
Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpackhi_epi32
__m256i _mm256_maskz_unpackhi_epi32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckhdq
Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_unpackhi_epi64
__m128i _mm_mask_unpackhi_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckhqdq
Unpack and interleave 64-bit integers from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpackhi_epi64
__m128i _mm_maskz_unpackhi_epi64(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckhqdq
Unpack and interleave 64-bit integers from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpackhi_epi64
__m256i _mm256_mask_unpackhi_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckhqdq
Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpackhi_epi64
__m256i _mm256_maskz_unpackhi_epi64(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckhqdq
Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_unpackhi_epi16
__m128i _mm_mask_unpackhi_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpckhwd
Unpack and interleave 16-bit integers from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpackhi_epi16
__m128i _mm_maskz_unpackhi_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpckhwd
Unpack and interleave 16-bit integers from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpackhi_epi16
__m256i _mm256_mask_unpackhi_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpckhwd
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpackhi_epi16
__m256i _mm256_maskz_unpackhi_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpckhwd
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_unpackhi_epi16
__m512i _mm512_mask_unpackhi_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpckhwd
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_unpackhi_epi16
__m512i _mm512_maskz_unpackhi_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpckhwd
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_unpackhi_epi16
__m512i _mm512_unpackhi_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpckhwd
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and return the results.
_mm_mask_unpacklo_epi8
__m128i _mm_mask_unpacklo_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpcklbw
Unpack and interleave 8-bit integers from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpacklo_epi8
__m128i _mm_maskz_unpacklo_epi8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpcklbw
Unpack and interleave 8-bit integers from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpacklo_epi8
__m256i _mm256_mask_unpacklo_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpcklbw
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpacklo_epi8
__m256i _mm256_maskz_unpacklo_epi8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpcklbw
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_unpacklo_epi8
__m512i _mm512_mask_unpacklo_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpcklbw
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_unpacklo_epi8
__m512i _mm512_maskz_unpacklo_epi8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpcklbw
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_unpacklo_epi8
__m512i _mm512_unpacklo_epi8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpcklbw
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and return the results.
_mm_mask_unpacklo_epi32
__m128i _mm_mask_unpacklo_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckldq
Unpack and interleave 32-bit integers from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpacklo_epi32
__m128i _mm_maskz_unpacklo_epi32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckldq
Unpack and interleave 32-bit integers from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpacklo_epi32
__m256i _mm256_mask_unpacklo_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckldq
Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpacklo_epi32
__m256i _mm256_maskz_unpacklo_epi32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpckldq
Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
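A scalar sketch of the masked 32-bit unpacklo for one 128-bit lane follows. It assumes the vpunpckldq ordering a[0], b[0], a[1], b[1], with the writemask applied per element afterwards. The name model_mask_unpacklo_epi32 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_mask_unpacklo_epi32 (one 128-bit lane). */
void model_mask_unpacklo_epi32(int32_t dst[4], const int32_t src[4], uint8_t k,
                               const int32_t a[4], const int32_t b[4])
{
    /* Interleave the two low elements of a and b. */
    const int32_t tmp[4] = { a[0], b[0], a[1], b[1] };
    for (int i = 0; i < 4; i++)
        dst[i] = ((k >> i) & 1) ? tmp[i] : src[i];
}
```

The 256-bit forms apply the same pattern to each 128-bit lane separately, so elements never cross lanes.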
_mm_mask_unpacklo_epi64
__m128i _mm_mask_unpacklo_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpcklqdq
Unpack and interleave 64-bit integers from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpacklo_epi64
__m128i _mm_maskz_unpacklo_epi64(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpcklqdq
Unpack and interleave 64-bit integers from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpacklo_epi64
__m256i _mm256_mask_unpacklo_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpcklqdq
Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpacklo_epi64
__m256i _mm256_maskz_unpacklo_epi64(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpunpcklqdq
Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_unpacklo_epi16
__m128i _mm_mask_unpacklo_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpcklwd
Unpack and interleave 16-bit integers from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_unpacklo_epi16
__m128i _mm_maskz_unpacklo_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpcklwd
Unpack and interleave 16-bit integers from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_unpacklo_epi16
__m256i _mm256_mask_unpacklo_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpcklwd
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_unpacklo_epi16
__m256i _mm256_maskz_unpacklo_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpunpcklwd
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_unpacklo_epi16
__m512i _mm512_mask_unpacklo_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpcklwd
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_unpacklo_epi16
__m512i _mm512_maskz_unpacklo_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpcklwd
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_unpacklo_epi16
__m512i _mm512_unpacklo_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpunpcklwd
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and return the results.
_mm512_kunpackd
__mmask64 _mm512_kunpackd(__mmask64 a, __mmask64 b)
CPUID Flags: AVX512BW
Instruction(s): kunpckdq
Unpack and interleave 32 bits from masks a and b, and return the 64-bit result.
_mm512_kunpackw
__mmask32 _mm512_kunpackw(__mmask32 a, __mmask32 b)
CPUID Flags: AVX512BW
Instruction(s): kunpckwd
Unpack and interleave 16 bits from masks a and b, and return the 32-bit result.
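For the two mask unpack operations, the sketch below assumes the behavior of the kunpckwd and kunpckdq instructions: the low half of b forms the low half of the result and the low half of a forms the high half. The functions model_kunpackw and model_kunpackd are hypothetical scalar models that use plain integers in place of the mask types.

```c
#include <stdint.h>

/* Hypothetical model of _mm512_kunpackw: result[15:0] = b[15:0],
   result[31:16] = a[15:0]. */
uint32_t model_kunpackw(uint32_t a, uint32_t b)
{
    return ((a & 0xFFFFu) << 16) | (b & 0xFFFFu);
}

/* Hypothetical model of _mm512_kunpackd: result[31:0] = b[31:0],
   result[63:32] = a[31:0]. */
uint64_t model_kunpackd(uint64_t a, uint64_t b)
{
    return ((a & 0xFFFFFFFFull) << 32) | (b & 0xFFFFFFFFull);
}
```

Under this reading the operation is a concatenation of two half-width masks, which is useful for building a wide mask from two narrower comparison results.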
_mm_fpclass_pd_mask
__mmask8 _mm_fpclass_pd_mask(__m128d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vfpclasspd
Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.
_mm_mask_fpclass_pd_mask
__mmask8 _mm_mask_fpclass_pd_mask(__mmask8 k1, __m128d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vfpclasspd
Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
_mm256_fpclass_pd_mask
__mmask8 _mm256_fpclass_pd_mask(__m256d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vfpclasspd
Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.
_mm256_mask_fpclass_pd_mask
__mmask8 _mm256_mask_fpclass_pd_mask(__mmask8 k1, __m256d a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vfpclasspd
Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fpclass_pd_mask
__mmask8 _mm512_fpclass_pd_mask(__m512d a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vfpclasspd
Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.
_mm512_mask_fpclass_pd_mask
__mmask8 _mm512_mask_fpclass_pd_mask(__mmask8 k1, __m512d a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vfpclasspd
Test packed double-precision (64-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
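The entries above do not list the categories that imm selects. The sketch below assumes the bit assignment of the vfpclasspd instruction (bit 0 QNaN, bit 1 +0, bit 2 -0, bit 3 +Inf, bit 4 -Inf, bit 5 denormal, bit 6 negative finite, bit 7 SNaN) and ignores the MXCSR denormals-are-zero setting. The FPC_* constants and both function names are hypothetical; this is a scalar model of the 128-bit masked form, not the intrinsic.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the category bits of imm (assumed vfpclasspd layout). */
enum {
    FPC_QNAN = 0x01, FPC_POS_ZERO = 0x02, FPC_NEG_ZERO = 0x04,
    FPC_POS_INF = 0x08, FPC_NEG_INF = 0x10, FPC_DENORMAL = 0x20,
    FPC_NEG_FINITE = 0x40, FPC_SNAN = 0x80
};

/* Returns 1 if x falls into any category selected by imm, else 0. */
int model_fpclass_sd(double x, int imm)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits); /* inspect the IEEE 754 encoding */
    int neg = (int)(bits >> 63);
    int exp_ones = ((bits >> 52) & 0x7FF) == 0x7FF;
    int exp_zero = ((bits >> 52) & 0x7FF) == 0;
    int mant_zero = (bits & 0x000FFFFFFFFFFFFFull) == 0;
    int quiet = (int)((bits >> 51) & 1);
    int cls = 0;
    if (exp_ones && !mant_zero) cls |= quiet ? FPC_QNAN : FPC_SNAN;
    if (exp_zero && mant_zero)  cls |= neg ? FPC_NEG_ZERO : FPC_POS_ZERO;
    if (exp_ones && mant_zero)  cls |= neg ? FPC_NEG_INF : FPC_POS_INF;
    if (exp_zero && !mant_zero) cls |= FPC_DENORMAL;
    if (neg && !exp_ones && !(exp_zero && mant_zero)) cls |= FPC_NEG_FINITE;
    return (cls & imm & 0xFF) != 0;
}

/* Hypothetical model of _mm_mask_fpclass_pd_mask: result bits whose k1 bit
   is clear are forced to 0. */
uint8_t model_mask_fpclass_pd_mask(uint8_t k1, const double a[2], int imm)
{
    uint8_t r = 0;
    for (int i = 0; i < 2; i++)
        if (((k1 >> i) & 1) && model_fpclass_sd(a[i], imm))
            r |= (uint8_t)(1u << i);
    return r;
}
```

Passing k1 with all relevant bits set gives the behavior of the unmasked form. Several category bits can be combined in imm, in which case an element matches if it belongs to any of them.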
_mm_fpclass_ps_mask
__mmask8 _mm_fpclass_ps_mask(__m128 a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vfpclassps
Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.
_mm_mask_fpclass_ps_mask
__mmask8 _mm_mask_fpclass_ps_mask(__mmask8 k1, __m128 a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vfpclassps
Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (result bits are zeroed out when the corresponding mask bit is not set).
_mm256_fpclass_ps_mask
__mmask8 _mm256_fpclass_ps_mask(__m256 a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vfpclassps
Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.
_mm256_mask_fpclass_ps_mask
__mmask8 _mm256_mask_fpclass_ps_mask(__mmask8 k1, __m256 a, int imm)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vfpclassps
Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (result bits are zeroed out when the corresponding mask bit is not set).
_mm512_fpclass_ps_mask
__mmask16 _mm512_fpclass_ps_mask(__m512 a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vfpclassps
Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value.
_mm512_mask_fpclass_ps_mask
__mmask16 _mm512_mask_fpclass_ps_mask(__mmask16 k1, __m512 a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vfpclassps
Test packed single-precision (32-bit) floating-point elements in a for special categories specified by imm, and put each result in the corresponding bit of the returned mask value using zeromask k1 (result bits are zeroed out when the corresponding mask bit is not set).
_mm_fpclass_sd_mask
__mmask8 _mm_fpclass_sd_mask(__m128d a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vfpclasssd
Test the lower double-precision (64-bit) floating-point element in a for special categories specified by imm, and put the result in the returned mask value.
_mm_mask_fpclass_sd_mask
__mmask8 _mm_mask_fpclass_sd_mask(__mmask8 k1, __m128d a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vfpclasssd
Test the lower double-precision (64-bit) floating-point element in a for special categories specified by imm, and put the result in the returned mask value using zeromask k1 (the result bit is zeroed out when mask bit 0 is not set).
_mm_fpclass_ss_mask
__mmask8 _mm_fpclass_ss_mask(__m128 a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vfpclassss
Test the lower single-precision (32-bit) floating-point element in a for special categories specified by imm, and put the result in the returned mask value.
_mm_mask_fpclass_ss_mask
__mmask8 _mm_mask_fpclass_ss_mask(__mmask8 k1, __m128 a, int imm)
CPUID Flags: AVX512DQ
Instruction(s): vfpclassss
Test the lower single-precision (32-bit) floating-point element in a for special categories specified by imm, and put the result in the returned mask value using zeromask k1 (the result bit is zeroed out when mask bit 0 is not set).
_mm_movepi8_mask
__mmask16 _mm_movepi8_mask(__m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmovb2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 8-bit integer in a.
_mm256_movepi8_mask
__mmask32 _mm256_movepi8_mask(__m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmovb2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 8-bit integer in a.
_mm512_movepi8_mask
__mmask64 _mm512_movepi8_mask(__m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpmovb2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 8-bit integer in a.
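The sign-bit extraction described for the movepi8_mask intrinsics can be sketched in scalar C as follows. This is a portable model for illustration only, not the intrinsic; the function name movepi8_mask_model is hypothetical, and n stands for the number of 8-bit elements (16, 32, or 64 for the three vector widths).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model: bit i of the result is the most significant (sign) bit
 * of element a[i]. Supports up to 64 elements, matching __mmask64. */
static uint64_t movepi8_mask_model(const int8_t *a, size_t n)
{
    assert(n <= 64);
    uint64_t k = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t msb = ((uint8_t)a[i] >> 7) & 1u;  /* 1 when a[i] is negative */
        k |= msb << i;
    }
    return k;
}
```

Under this model, a set bit in the result marks a negative element when the bytes are interpreted as signed integers. The 16-bit, 32-bit, and 64-bit variants follow the same pattern with a wider element type.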
_mm_movepi32_mask
__mmask8 _mm_movepi32_mask(__m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmovd2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 32-bit integer in a.
_mm256_movepi32_mask
__mmask8 _mm256_movepi32_mask(__m256i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmovd2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 32-bit integer in a.
_mm512_movepi32_mask
__mmask16 _mm512_movepi32_mask(__m512i a)
CPUID Flags: AVX512DQ
Instruction(s): vpmovd2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 32-bit integer in a.
_mm_movepi64_mask
__mmask8 _mm_movepi64_mask(__m128i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmovq2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 64-bit integer in a.
_mm256_movepi64_mask
__mmask8 _mm256_movepi64_mask(__m256i a)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmovq2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 64-bit integer in a.
_mm512_movepi64_mask
__mmask8 _mm512_movepi64_mask(__m512i a)
CPUID Flags: AVX512DQ
Instruction(s): vpmovq2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 64-bit integer in a.
_mm_movepi16_mask
__mmask8 _mm_movepi16_mask(__m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmovw2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 16-bit integer in a.
_mm256_movepi16_mask
__mmask16 _mm256_movepi16_mask(__m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmovw2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 16-bit integer in a.
_mm512_movepi16_mask
__mmask32 _mm512_movepi16_mask(__m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpmovw2m
Set each bit of the returned mask value based on the most significant bit of the corresponding packed 16-bit integer in a.
_mm_permutexvar_epi8
__m128i _mm_permutexvar_epi8(__m128i idx, __m128i a)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermb
Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result.
_mm_mask_permutexvar_epi8
__m128i _mm_mask_permutexvar_epi8(__m128i src, __mmask16 k, __m128i idx, __m128i a)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermb
Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_permutexvar_epi8
__m128i _mm_maskz_permutexvar_epi8(__mmask16 k, __m128i idx, __m128i a)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermb
Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutexvar_epi8
__m256i _mm256_permutexvar_epi8(__m256i idx, __m256i a)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermb
Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result.
_mm256_mask_permutexvar_epi8
__m256i _mm256_mask_permutexvar_epi8(__m256i src, __mmask32 k, __m256i idx, __m256i a)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermb
Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_permutexvar_epi8
__m256i _mm256_maskz_permutexvar_epi8(__mmask32 k, __m256i idx, __m256i a)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermb
Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_permutexvar_epi8
__m512i _mm512_permutexvar_epi8(__m512i idx, __m512i a)
CPUID Flags: AVX512VBMI
Instruction(s): vpermb
Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result.
_mm512_mask_permutexvar_epi8
__m512i _mm512_mask_permutexvar_epi8(__m512i src, __mmask64 k, __m512i idx, __m512i a)
CPUID Flags: AVX512VBMI
Instruction(s): vpermb
Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_permutexvar_epi8
__m512i _mm512_maskz_permutexvar_epi8(__mmask64 k, __m512i idx, __m512i a)
CPUID Flags: AVX512VBMI
Instruction(s): vpermb
Shuffle 8-bit integers in a across lanes using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
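The single-source byte shuffle and its writemask and zeromask forms can be modeled in scalar C as shown below. This sketch assumes that only the low log2(n) bits of each index byte are used (4, 5, or 6 bits for the 128-, 256-, and 512-bit forms), which matches the documented behavior of vpermb as far as we know. The function permutexvar_epi8_model and the convention of passing src as NULL to select zero-masking are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of permutexvar_epi8 for n = 16, 32, or 64 byte elements.
 * dst must not overlap a or idx.
 *   src != NULL : writemask form, dst[i] = src[i] where bit i of k is clear
 *   src == NULL : zeromask form,  dst[i] = 0      where bit i of k is clear
 * Pass k with all n low bits set to model the unmasked form. */
static void permutexvar_epi8_model(uint8_t *dst, const uint8_t *src, uint64_t k,
                                   const uint8_t *idx, const uint8_t *a, size_t n)
{
    assert(n == 16 || n == 32 || n == 64);
    for (size_t i = 0; i < n; i++) {
        size_t sel = idx[i] & (n - 1);      /* upper index bits are ignored */
        if ((k >> i) & 1)
            dst[i] = a[sel];
        else
            dst[i] = src ? src[i] : 0;
    }
}
```

For example, setting idx[i] = n - 1 - i reverses the bytes of a, and any element of a may be selected more than once.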
_mm_permutex2var_epi8
__m128i _mm_permutex2var_epi8(__m128i a, __m128i idx, __m128i b)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermi2b
Shuffle 8-bit integers in a and b using the corresponding index in idx, and return the result.
_mm_mask_permutex2var_epi8
__m128i _mm_mask_permutex2var_epi8(__m128i a, __mmask16 k, __m128i idx, __m128i b)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermt2b
Shuffle 8-bit integers in a and b using the corresponding index in idx, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask2_permutex2var_epi8
__m128i _mm_mask2_permutex2var_epi8(__m128i a, __m128i idx, __mmask16 k, __m128i b)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermi2b
Shuffle 8-bit integers in a and b using the corresponding index in idx, and return the result using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm_maskz_permutex2var_epi8
__m128i _mm_maskz_permutex2var_epi8(__mmask16 k, __m128i a, __m128i idx, __m128i b)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermi2b, vpermt2b
Shuffle 8-bit integers in a and b using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_permutex2var_epi8
__m256i _mm256_permutex2var_epi8(__m256i a, __m256i idx, __m256i b)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermi2b
Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result.
_mm256_mask_permutex2var_epi8
__m256i _mm256_mask_permutex2var_epi8(__m256i a, __mmask32 k, __m256i idx, __m256i b)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermt2b
Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask2_permutex2var_epi8
__m256i _mm256_mask2_permutex2var_epi8(__m256i a, __m256i idx, __mmask32 k, __m256i b)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermi2b
Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm256_maskz_permutex2var_epi8
__m256i _mm256_maskz_permutex2var_epi8(__mmask32 k, __m256i a, __m256i idx, __m256i b)
CPUID Flags: AVX512VBMI, AVX512VL
Instruction(s): vpermi2b, vpermt2b
Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_permutex2var_epi8
__m512i _mm512_permutex2var_epi8(__m512i a, __m512i idx, __m512i b)
CPUID Flags: AVX512VBMI
Instruction(s): vpermi2b
Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result.
_mm512_mask_permutex2var_epi8
__m512i _mm512_mask_permutex2var_epi8(__m512i a, __mmask64 k, __m512i idx, __m512i b)
CPUID Flags: AVX512VBMI
Instruction(s): vpermt2b
Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask2_permutex2var_epi8
__m512i _mm512_mask2_permutex2var_epi8(__m512i a, __m512i idx, __mmask64 k, __m512i b)
CPUID Flags: AVX512VBMI
Instruction(s): vpermi2b
Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using writemask k (elements are copied from idx when the corresponding mask bit is not set).
_mm512_maskz_permutex2var_epi8
__m512i _mm512_maskz_permutex2var_epi8(__mmask64 k, __m512i a, __m512i idx, __m512i b)
CPUID Flags: AVX512VBMI
Instruction(s): vpermi2b, vpermt2b
Shuffle 8-bit integers in a and b across lanes using the corresponding index in idx, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
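The two-source shuffle differs from the single-source form in that each index byte also selects which source vector to read. The scalar sketch below assumes the documented vpermi2b/vpermt2b index layout as we understand it: the low log2(n) bits give the element offset and the next bit selects a (0) or b (1). The function permutex2var_epi8_model and its fallback parameter are hypothetical; passing a, idx, or NULL as fallback models the mask, mask2, and maskz forms respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of permutex2var_epi8 for n = 16, 32, or 64 byte elements.
 * dst must not overlap a, idx, or b.
 *   fallback == a    : models the mask form  (copy from a when bit i of k is clear)
 *   fallback == idx  : models the mask2 form (copy from idx when bit i of k is clear)
 *   fallback == NULL : models the maskz form (zero when bit i of k is clear) */
static void permutex2var_epi8_model(uint8_t *dst, const uint8_t *fallback, uint64_t k,
                                    const uint8_t *a, const uint8_t *idx,
                                    const uint8_t *b, size_t n)
{
    assert(n == 16 || n == 32 || n == 64);
    for (size_t i = 0; i < n; i++) {
        size_t off = idx[i] & (n - 1);      /* element offset within the source */
        int from_b = (idx[i] & n) != 0;     /* source-select bit just above the offset */
        uint8_t v  = from_b ? b[off] : a[off];
        if ((k >> i) & 1)
            dst[i] = v;
        else
            dst[i] = fallback ? fallback[i] : 0;
    }
}
```

As a usage illustration, with n = 16 the index pattern idx[i] = (i / 2) | ((i & 1) ? 16 : 0) interleaves the low eight bytes of a and b.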