Intrinsics for Load Operations
The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
variable | definition |
---|---|
src | source element to use based on writemask result |
k | writemask used as a selector |
mem_addr | pointer to base address in memory |
base_addr | pointer to base address in memory to begin load or store operation |
_mm_mask_expandloadu_pd
__m128d _mm_mask_expandloadu_pd(__m128d src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandpd
Load as many contiguous double-precision (64-bit) floating-point elements from unaligned memory at mem_addr as there are ones in the low 2 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_expandloadu_pd
__m128d _mm_maskz_expandloadu_pd(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandpd
Load as many contiguous double-precision (64-bit) floating-point elements from unaligned memory at mem_addr as there are ones in the low 2 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_expandloadu_pd
__m256d _mm256_mask_expandloadu_pd(__m256d src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandpd
Load as many contiguous double-precision (64-bit) floating-point elements from unaligned memory at mem_addr as there are ones in the low 4 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_expandloadu_pd
__m256d _mm256_maskz_expandloadu_pd(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandpd
Load as many contiguous double-precision (64-bit) floating-point elements from unaligned memory at mem_addr as there are ones in the low 4 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are zeroed out when the corresponding mask bit is not set).
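The expand-load behavior described above can be illustrated with a portable scalar model. This is only a sketch of the documented semantics, not the intrinsic itself: the helper name expandload_pd_model and its lanes and zero_masked parameters are assumptions introduced for illustration (lanes is 2 for the 128-bit forms and 4 for the 256-bit forms).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the expand-load semantics (not an Intel API).
 * dst, src: arrays of `lanes` doubles; mem: contiguous source in memory.
 * zero_masked != 0 models the maskz form, otherwise the mask (merge) form. */
static void expandload_pd_model(double *dst, const double *src, uint8_t k,
                                const double *mem, size_t lanes,
                                int zero_masked)
{
    size_t next = 0; /* index of the next contiguous element to consume */

    assert(lanes <= 8); /* an 8-bit mask covers at most 8 elements */
    for (size_t i = 0; i < lanes; i++) {
        if ((k >> i) & 1u) {
            /* Mask bit set: take the next contiguous element from memory. */
            dst[i] = mem[next++];
        } else {
            /* Mask bit clear: zero the lane or keep the value from src. */
            dst[i] = zero_masked ? 0.0 : src[i];
        }
    }
}
```

With k = 0x0A (bits 1 and 3 set) and four lanes, the model reads only mem[0] and mem[1] and places them in result positions 1 and 3; positions 0 and 2 come from src or are zeroed.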
_mm_mask_expandloadu_ps
__m128 _mm_mask_expandloadu_ps(__m128 src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandps
Load as many contiguous single-precision (32-bit) floating-point elements from unaligned memory at mem_addr as there are ones in the low 4 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_expandloadu_ps
__m128 _mm_maskz_expandloadu_ps(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandps
Load as many contiguous single-precision (32-bit) floating-point elements from unaligned memory at mem_addr as there are ones in the low 4 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_expandloadu_ps
__m256 _mm256_mask_expandloadu_ps(__m256 src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandps
Load as many contiguous single-precision (32-bit) floating-point elements from unaligned memory at mem_addr as there are ones in the low 8 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_expandloadu_ps
__m256 _mm256_maskz_expandloadu_ps(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vexpandps
Load as many contiguous single-precision (32-bit) floating-point elements from unaligned memory at mem_addr as there are ones in the low 8 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are zeroed out when the corresponding mask bit is not set).
_mm_mmask_i32gather_pd
__m128d _mm_mmask_i32gather_pd(__m128d src, __mmask8 k, __m128i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgatherdpd
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
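The address calculation in the gather description can be sketched in scalar C. The helper below is a hypothetical model, not an Intel API: it assumes each element is read from the byte address base_addr + index * scale, with the 32-bit index treated as signed, and it takes an extra lanes parameter (2 for the 128-bit form, 4 for the 256-bit form).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked gather of doubles with 32-bit
 * indices (not an Intel API). Lanes whose mask bit is clear keep src[i]
 * and do not read memory in this model. */
static void mask_i32gather_pd_model(double *dst, const double *src, uint8_t k,
                                    const int32_t *vindex,
                                    const void *base_addr, int scale,
                                    size_t lanes)
{
    const unsigned char *base = (const unsigned char *)base_addr;

    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    assert(lanes <= 8);
    for (size_t i = 0; i < lanes; i++) {
        if ((k >> i) & 1u) {
            /* Byte offset = index * scale; memcpy avoids alignment issues. */
            ptrdiff_t offset = (ptrdiff_t)vindex[i] * scale;
            memcpy(&dst[i], base + offset, sizeof(double));
        } else {
            dst[i] = src[i];
        }
    }
}
```

When the indices count array elements of type double, a scale of 8 (the element size in bytes) makes index i address element i of the array.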
_mm256_mmask_i32gather_pd
__m256d _mm256_mmask_i32gather_pd(__m256d src, __mmask8 k, __m128i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgatherdpd
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_mmask_i32gather_ps
__m128 _mm_mmask_i32gather_ps(__m128 src, __mmask8 k, __m128i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgatherdps
Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_mmask_i32gather_ps
__m256 _mm256_mmask_i32gather_ps(__m256 src, __mmask8 k, __m256i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgatherdps
Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_mmask_i64gather_pd
__m128d _mm_mmask_i64gather_pd(__m128d src, __mmask8 k, __m128i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgatherqpd
Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_mmask_i64gather_pd
__m256d _mm256_mmask_i64gather_pd(__m256d src, __mmask8 k, __m256i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgatherqpd
Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_mmask_i64gather_ps
__m128 _mm_mmask_i64gather_ps(__m128 src, __mmask8 k, __m128i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgatherqps
Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_mmask_i64gather_ps
__m128 _mm256_mmask_i64gather_ps(__m128 src, __mmask8 k, __m256i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vgatherqps
Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_mask_load_pd
__m128d _mm_mask_load_pd(__m128d src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovapd
Load packed double-precision (64-bit) floating-point elements from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm_maskz_load_pd
__m128d _mm_maskz_load_pd(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovapd
Load packed double-precision (64-bit) floating-point elements from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm256_mask_load_pd
__m256d _mm256_mask_load_pd(__m256d src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovapd
Load packed double-precision (64-bit) floating-point elements from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_maskz_load_pd
__m256d _mm256_maskz_load_pd(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovapd
Load packed double-precision (64-bit) floating-point elements from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
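The difference between the writemask and zeromask forms, together with the alignment requirement of the aligned loads, can be sketched as follows. Both helpers are hypothetical and written in portable C rather than with the intrinsics; the alignment check assumes that converting a pointer to uintptr_t yields its numeric address, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is a multiple of `alignment` bytes. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical scalar model of the aligned masked loads (not an Intel API).
 * lanes is 2 (128-bit form, 16-byte alignment) or 4 (256-bit form,
 * 32-byte alignment). zero_masked != 0 models the maskz form. */
static void masked_load_pd_model(double *dst, const double *src, uint8_t k,
                                 const double *mem_addr, size_t lanes,
                                 int zero_masked)
{
    assert(lanes == 2 || lanes == 4);
    /* The model rejects misaligned addresses instead of raising #GP. */
    assert(is_aligned(mem_addr, lanes * sizeof(double)));
    for (size_t i = 0; i < lanes; i++) {
        if ((k >> i) & 1u)
            dst[i] = mem_addr[i];
        else
            dst[i] = zero_masked ? 0.0 : src[i];
    }
}
```

A caller that cannot guarantee the required alignment would typically use the loadu variants listed later in this section, which accept any address.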
_mm_mask_load_ps
__m128 _mm_mask_load_ps(__m128 src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovaps
Load packed single-precision (32-bit) floating-point elements from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm_maskz_load_ps
__m128 _mm_maskz_load_ps(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovaps
Load packed single-precision (32-bit) floating-point elements from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm256_mask_load_ps
__m256 _mm256_mask_load_ps(__m256 src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovaps
Load packed single-precision (32-bit) floating-point elements from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_maskz_load_ps
__m256 _mm256_maskz_load_ps(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovaps
Load packed single-precision (32-bit) floating-point elements from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm_mask_loadu_pd
__m128d _mm_mask_loadu_pd(__m128d src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovupd
Load packed double-precision (64-bit) floating-point elements from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_maskz_loadu_pd
__m128d _mm_maskz_loadu_pd(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovupd
Load packed double-precision (64-bit) floating-point elements from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_loadu_pd
__m256d _mm256_mask_loadu_pd(__m256d src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovupd
Load packed double-precision (64-bit) floating-point elements from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_maskz_loadu_pd
__m256d _mm256_maskz_loadu_pd(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovupd
Load packed double-precision (64-bit) floating-point elements from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_mask_loadu_ps
__m128 _mm_mask_loadu_ps(__m128 src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovups
Load packed single-precision (32-bit) floating-point elements from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_maskz_loadu_ps
__m128 _mm_maskz_loadu_ps(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovups
Load packed single-precision (32-bit) floating-point elements from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_loadu_ps
__m256 _mm256_mask_loadu_ps(__m256 src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovups
Load packed single-precision (32-bit) floating-point elements from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_maskz_loadu_ps
__m256 _mm256_maskz_loadu_ps(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovups
Load packed single-precision (32-bit) floating-point elements from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_mask_load_epi32
__m128i _mm_mask_load_epi32(__m128i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa32
Load packed 32-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm_maskz_load_epi32
__m128i _mm_maskz_load_epi32(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa32
Load packed 32-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm256_mask_load_epi32
__m256i _mm256_mask_load_epi32(__m256i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa32
Load packed 32-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_maskz_load_epi32
__m256i _mm256_maskz_load_epi32(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa32
Load packed 32-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm_mask_load_epi64
__m128i _mm_mask_load_epi64(__m128i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa64
Load packed 64-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm_maskz_load_epi64
__m128i _mm_maskz_load_epi64(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa64
Load packed 64-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm256_mask_load_epi64
__m256i _mm256_mask_load_epi64(__m256i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa64
Load packed 64-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_maskz_load_epi64
__m256i _mm256_maskz_load_epi64(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa64
Load packed 64-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm_mask_loadu_epi16
__m128i _mm_mask_loadu_epi16(__m128i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu16
Load packed 16-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_maskz_loadu_epi16
__m128i _mm_maskz_loadu_epi16(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu16
Load packed 16-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_loadu_epi16
__m256i _mm256_mask_loadu_epi16(__m256i src, __mmask16 k, void const* mem_addr)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu16
Load packed 16-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_maskz_loadu_epi16
__m256i _mm256_maskz_loadu_epi16(__mmask16 k, void const* mem_addr)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu16
Load packed 16-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm512_mask_loadu_epi16
__m512i _mm512_mask_loadu_epi16(__m512i src, __mmask32 k, void const* mem_addr)
CPUID Flags: AVX512BW
Instruction(s): vmovdqu16
Load packed 16-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm512_maskz_loadu_epi16
__m512i _mm512_maskz_loadu_epi16(__mmask32 k, void const* mem_addr)
CPUID Flags: AVX512BW
Instruction(s): vmovdqu16
Load packed 16-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_mask_loadu_epi32
__m128i _mm_mask_loadu_epi32(__m128i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu32
Load packed 32-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_maskz_loadu_epi32
__m128i _mm_maskz_loadu_epi32(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu32
Load packed 32-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_loadu_epi32
__m256i _mm256_mask_loadu_epi32(__m256i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu32
Load packed 32-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_maskz_loadu_epi32
__m256i _mm256_maskz_loadu_epi32(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu32
Load packed 32-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_mask_loadu_epi64
__m128i _mm_mask_loadu_epi64(__m128i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu64
Load packed 64-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_maskz_loadu_epi64
__m128i _mm_maskz_loadu_epi64(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu64
Load packed 64-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_loadu_epi64
__m256i _mm256_mask_loadu_epi64(__m256i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu64
Load packed 64-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_maskz_loadu_epi64
__m256i _mm256_maskz_loadu_epi64(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu64
Load packed 64-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_mask_loadu_epi8
__m128i _mm_mask_loadu_epi8(__m128i src, __mmask16 k, void const* mem_addr)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu8
Load packed 8-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm_maskz_loadu_epi8
__m128i _mm_maskz_loadu_epi8(__mmask16 k, void const* mem_addr)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu8
Load packed 8-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_loadu_epi8
__m256i _mm256_mask_loadu_epi8(__m256i src, __mmask32 k, void const* mem_addr)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu8
Load packed 8-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm256_maskz_loadu_epi8
__m256i _mm256_maskz_loadu_epi8(__mmask32 k, void const* mem_addr)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu8
Load packed 8-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm512_mask_loadu_epi8
__m512i _mm512_mask_loadu_epi8(__m512i src, __mmask64 k, void const* mem_addr)
CPUID Flags: AVX512BW
Instruction(s): vmovdqu8
Load packed 8-bit integers from memory into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
_mm512_maskz_loadu_epi8
__m512i _mm512_maskz_loadu_epi8(__mmask64 k, void const* mem_addr)
CPUID Flags: AVX512BW
Instruction(s): vmovdqu8
Load packed 8-bit integers from memory into the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
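The mask types in the prototypes above range from __mmask8 to __mmask64 because a mask carries one bit per element: the number of meaningful bits equals the vector width divided by the element width. The sketch below shows that arithmetic and one way to build a mask that selects only the first few elements, for example when fewer elements remain at the end of a buffer than a full vector holds. Both helper names are hypothetical and not part of any Intel header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of elements (and mask bits) in a vector. */
static unsigned lanes_for(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}

/* Hypothetical helper: mask with the low `count` bits set (count <= 64).
 * The count == 64 case is handled separately because shifting a 64-bit
 * value by 64 is undefined behavior in C. */
static uint64_t low_bits_mask(unsigned count)
{
    assert(count <= 64);
    return count == 64 ? UINT64_MAX : ((UINT64_C(1) << count) - 1);
}
```

For instance, a 512-bit vector of 8-bit integers has 64 lanes, which is why _mm512_mask_loadu_epi8 takes an __mmask64, while a 128-bit vector of 16-bit integers has 8 lanes and takes an __mmask8.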
_mm_mask_expandloadu_epi32
__m128i _mm_mask_expandloadu_epi32(__m128i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandd
Load as many contiguous 32-bit integers from unaligned memory at mem_addr as there are ones in the low 4 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_expandloadu_epi32
__m128i _mm_maskz_expandloadu_epi32(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandd
Load as many contiguous 32-bit integers from unaligned memory at mem_addr as there are ones in the low 4 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_expandloadu_epi32
__m256i _mm256_mask_expandloadu_epi32(__m256i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandd
Load as many contiguous 32-bit integers from unaligned memory at mem_addr as there are ones in the low 8 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_expandloadu_epi32
__m256i _mm256_maskz_expandloadu_epi32(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandd
Load as many contiguous 32-bit integers from unaligned memory at mem_addr as there are ones in the low 8 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_expandloadu_epi64
__m128i _mm_mask_expandloadu_epi64(__m128i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandq
Load as many contiguous 64-bit integers from unaligned memory at mem_addr as there are ones in the low 2 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_expandloadu_epi64
__m128i _mm_maskz_expandloadu_epi64(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandq
Load as many contiguous 64-bit integers from unaligned memory at mem_addr as there are ones in the low 2 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_expandloadu_epi64
__m256i _mm256_mask_expandloadu_epi64(__m256i src, __mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandq
Load as many contiguous 64-bit integers from unaligned memory at mem_addr as there are ones in the low 4 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_expandloadu_epi64
__m256i _mm256_maskz_expandloadu_epi64(__mmask8 k, void const* mem_addr)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpexpandq
Load as many contiguous 64-bit integers from unaligned memory at mem_addr as there are ones in the low 4 bits of mask k, and place them in the result element positions corresponding to the positions of the ones in the mask (elements are zeroed out when the corresponding mask bit is not set).
_mm_mmask_i32gather_epi32
__m128i _mm_mmask_i32gather_epi32(__m128i src, __mmask8 k, __m128i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpgatherdd
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_mmask_i32gather_epi32
__m256i _mm256_mmask_i32gather_epi32(__m256i src, __mmask8 k, __m256i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpgatherdd
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_mmask_i32gather_epi64
__m128i _mm_mmask_i32gather_epi64(__m128i src, __mmask8 k, __m128i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpgatherdq
Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_mmask_i32gather_epi64
__m256i _mm256_mmask_i32gather_epi64(__m256i src, __mmask8 k, __m128i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpgatherdq
Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_mmask_i64gather_epi32
__m128i _mm_mmask_i64gather_epi32(__m128i src, __mmask8 k, __m128i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpgatherqd
Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_mmask_i64gather_epi32
__m128i _mm256_mmask_i64gather_epi32(__m128i src, __mmask8 k, __m256i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpgatherqd
Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
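The 256-bit form above returns a 128-bit value because a 256-bit vindex holds only four 64-bit indices, so at most four 32-bit elements are gathered. A scalar sketch of that case follows; the helper name is hypothetical, and the model assumes the byte address of each element is base_addr + index * scale with signed 64-bit indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked gather of 32-bit integers with
 * four 64-bit indices (not an Intel API). Four 32-bit results occupy
 * 128 bits, matching the __m128i return type of the 256-bit form. */
static void mmask_i64gather_epi32_model(int32_t dst[4], const int32_t src[4],
                                        uint8_t k, const int64_t vindex[4],
                                        const void *base_addr, int scale)
{
    const unsigned char *base = (const unsigned char *)base_addr;

    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (size_t i = 0; i < 4; i++) {
        if ((k >> i) & 1u)
            memcpy(&dst[i], base + vindex[i] * scale, sizeof(int32_t));
        else
            dst[i] = src[i]; /* mask bit clear: keep the value from src */
    }
}
```

With a scale of 4 the indices count int32_t elements; only the low four bits of k are consulted in this model.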
_mm_mmask_i64gather_epi64
__m128i _mm_mmask_i64gather_epi64(__m128i src, __mmask8 k, __m128i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpgatherqq
Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_mmask_i64gather_epi64
__m256i _mm256_mmask_i64gather_epi64(__m256i src, __mmask8 k, __m256i vindex, void const* base_addr, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpgatherqq
Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into the return value using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.