Intrinsics for Store Operations
The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
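variable | definition |
---|---|
base_addr | pointer to the base address in memory at which the store or scatter operation begins |
mem_addr | pointer to the destination address in memory |
k | writemask used as a selector |
vindex | vector of 32-bit or 64-bit indices, each scaled by scale and added to base_addr |
scale | constant scale factor applied to each index; should be 1, 2, 4 or 8 |
a | source vector whose elements are stored |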
_mm_mask_compressstoreu_pd
void _mm_mask_compressstoreu_pd(void* base_addr, __mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompresspd
Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
_mm256_mask_compressstoreu_pd
void _mm256_mask_compressstoreu_pd(void* base_addr, __mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompresspd
Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
_mm_mask_compressstoreu_ps
void _mm_mask_compressstoreu_ps(void* base_addr, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompressps
Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
_mm256_mask_compressstoreu_ps
void _mm256_mask_compressstoreu_ps(void* base_addr, __mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vcompressps
Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
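As an illustration of the compress-store intrinsics above, the following minimal sketch packs the positive elements of four doubles to the front of an output buffer. The helper names `keep_positive4` and `keep_positive4_avx512` are hypothetical, and the `target` attribute, `__builtin_cpu_supports`, and `__builtin_popcount` are GCC/Clang extensions assumed here so that the file builds without extra compiler flags and falls back to scalar code on processors without AVX512F and AVX512VL.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: AVX-512 path, compiled for AVX512F + AVX512VL. */
__attribute__((target("avx512f,avx512vl")))
static size_t keep_positive4_avx512(const double *src, double *dst)
{
    __m256d v = _mm256_loadu_pd(src);
    /* Bit i of k is set when src[i] > 0.0. */
    __mmask8 k = _mm256_cmp_pd_mask(v, _mm256_setzero_pd(), _CMP_GT_OQ);
    /* Active elements are written back to back, starting at dst. */
    _mm256_mask_compressstoreu_pd(dst, k, v);
    return (size_t)__builtin_popcount((unsigned)k);
}

/* Hypothetical wrapper: returns the number of elements written to dst. */
size_t keep_positive4(const double *src, double *dst)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        return keep_positive4_avx512(src, dst);

    /* Scalar fallback with the same result. */
    size_t n = 0;
    for (int i = 0; i < 4; ++i)
        if (src[i] > 0.0)
            dst[n++] = src[i];
    return n;
}
```

Only as many elements as there are set mask bits are written, so the return value tells the caller how far to advance the output pointer.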
_mm_mask_store_pd
void _mm_mask_store_pd(void* mem_addr, __mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovapd
Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm256_mask_store_pd
void _mm256_mask_store_pd(void* mem_addr, __mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovapd
Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm_mask_store_ps
void _mm_mask_store_ps(void* mem_addr, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovaps
Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm256_mask_store_ps
void _mm256_mask_store_ps(void* mem_addr, __mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovaps
Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
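The aligned masked stores above can be sketched as follows: only the even-numbered lanes are written, and the destination must be 32-byte aligned. The function names are hypothetical, the mask value 0x55 is chosen purely for illustration, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions assumed here to provide a scalar fallback.

```c
#include <immintrin.h>

/* Hypothetical helper: dst32 must be aligned on a 32-byte boundary. */
__attribute__((target("avx512f,avx512vl")))
static void store_even_lanes_avx512(float *dst32, const float *src)
{
    __m256 v = _mm256_loadu_ps(src);
    /* 0x55 = 01010101b selects lanes 0, 2, 4 and 6. */
    _mm256_mask_store_ps(dst32, (__mmask8)0x55, v);
}

/* Hypothetical wrapper: copies src[i] to dst32[i] for even i only. */
void store_even_lanes(float *dst32, const float *src)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl")) {
        store_even_lanes_avx512(dst32, src);
        return;
    }
    /* Scalar fallback: apply the same 0x55 mask lane by lane. */
    for (int i = 0; i < 8; ++i)
        if ((0x55 >> i) & 1)
            dst32[i] = src[i];
}
```

Lanes whose mask bit is clear keep whatever value was already in memory.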
_mm_mask_storeu_pd
void _mm_mask_storeu_pd(void* mem_addr, __mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovupd
Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_storeu_pd
void _mm256_mask_storeu_pd(void* mem_addr, __mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovupd
Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm_mask_storeu_ps
void _mm_mask_storeu_ps(void* mem_addr, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovups
Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_storeu_ps
void _mm256_mask_storeu_ps(void* mem_addr, __mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovups
Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
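A common use of the unaligned masked stores above is handling the last few elements of an array. The sketch below copies the first n floats (n between 0 and 8) by building a mask with the low n bits set. The names `copy_tail` and `copy_tail_avx512` are hypothetical, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions assumed here.

```c
#include <immintrin.h>

/* Hypothetical helper: copies n floats, where n is in the range 0..8. */
__attribute__((target("avx512f,avx512vl")))
static void copy_tail_avx512(float *dst, const float *src, unsigned n)
{
    /* Low n bits set: n = 3 gives 00000111b. */
    __mmask8 k = (__mmask8)((1u << n) - 1u);
    __m256 v = _mm256_maskz_loadu_ps(k, src);
    _mm256_mask_storeu_ps(dst, k, v);
}

/* Hypothetical wrapper with a scalar fallback. */
void copy_tail(float *dst, const float *src, unsigned n)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl")) {
        copy_tail_avx512(dst, src, n);
        return;
    }
    for (unsigned i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

Elements of dst at index n and above are not modified.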
_mm_i32scatter_pd
void _mm_i32scatter_pd(void* base_addr, __m128i vindex, __m128d a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterdpd
Scatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm_mask_i32scatter_pd
void _mm_mask_i32scatter_pd(void* base_addr, __mmask8 k, __m128i vindex, __m128d a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterdpd
Scatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_i32scatter_pd
void _mm256_i32scatter_pd(void* base_addr, __m128i vindex, __m256d a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterdpd
Scatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm256_mask_i32scatter_pd
void _mm256_mask_i32scatter_pd(void* base_addr, __mmask8 k, __m128i vindex, __m256d a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterdpd
Scatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
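The 32-bit-index scatters above can be sketched as an indexed table update, `table[idx[i]] = vals[i]`. Four doubles fill a `__m256d`, while their four 32-bit indices fit in a `__m128i`. With element indices, scale is 8, the size of a double in bytes. The function names are hypothetical, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions assumed here.

```c
#include <immintrin.h>

/* Hypothetical helper: table[idx[i]] = vals[i] for i = 0..3. */
__attribute__((target("avx512f,avx512vl")))
static void scatter4_avx512(double *table, const int *idx, const double *vals)
{
    __m128i vindex = _mm_loadu_si128((const __m128i *)idx); /* four 32-bit indices */
    __m256d a = _mm256_loadu_pd(vals);                      /* four doubles */
    /* scale = 8 converts element indices into byte offsets. */
    _mm256_i32scatter_pd(table, vindex, a, 8);
}

/* Hypothetical wrapper with a scalar fallback. */
void scatter4(double *table, const int *idx, const double *vals)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl")) {
        scatter4_avx512(table, idx, vals);
        return;
    }
    for (int i = 0; i < 4; ++i)
        table[idx[i]] = vals[i];
}
```

The example uses distinct indices; it makes no assumption about the result when two lanes target the same address.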
_mm_i32scatter_ps
void _mm_i32scatter_ps(void* base_addr, __m128i vindex, __m128 a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterdps
Scatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm_mask_i32scatter_ps
void _mm_mask_i32scatter_ps(void* base_addr, __mmask8 k, __m128i vindex, __m128 a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterdps
Scatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_i32scatter_ps
void _mm256_i32scatter_ps(void* base_addr, __m256i vindex, __m256 a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterdps
Scatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm256_mask_i32scatter_ps
void _mm256_mask_i32scatter_ps(void* base_addr, __mmask8 k, __m256i vindex, __m256 a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterdps
Scatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_i64scatter_pd
void _mm_i64scatter_pd(void* base_addr, __m128i vindex, __m128d a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterqpd
Scatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm_mask_i64scatter_pd
void _mm_mask_i64scatter_pd(void* base_addr, __mmask8 k, __m128i vindex, __m128d a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterqpd
Scatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_i64scatter_pd
void _mm256_i64scatter_pd(void* base_addr, __m256i vindex, __m256d a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterqpd
Scatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm256_mask_i64scatter_pd
void _mm256_mask_i64scatter_pd(void* base_addr, __mmask8 k, __m256i vindex, __m256d a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterqpd
Scatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_i64scatter_ps
void _mm_i64scatter_ps(void* base_addr, __m128i vindex, __m128 a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterqps
Scatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm_mask_i64scatter_ps
void _mm_mask_i64scatter_ps(void* base_addr, __mmask8 k, __m128i vindex, __m128 a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterqps
Scatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_i64scatter_ps
void _mm256_i64scatter_ps(void* base_addr, __m256i vindex, __m128 a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterqps
Scatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm256_mask_i64scatter_ps
void _mm256_mask_i64scatter_ps(void* base_addr, __mmask8 k, __m256i vindex, __m128 a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vscatterqps
Scatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
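When 64-bit indices address 32-bit data, the index vector is twice as wide as the data vector: the 256-bit forms above take a `__m256i` of four indices and a `__m128` of four floats. The sketch below shows a masked scatter under that layout. The function names and the `lanes` parameter are hypothetical, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions assumed here.

```c
#include <immintrin.h>

/* Hypothetical helper: table[idx[i]] = vals[i] where bit i of lanes is set. */
__attribute__((target("avx512f,avx512vl")))
static void scatter4f_avx512(float *table, const long long *idx,
                             const float *vals, unsigned char lanes)
{
    __m256i vindex = _mm256_loadu_si256((const __m256i *)idx); /* four 64-bit indices */
    __m128 a = _mm_loadu_ps(vals);                             /* four floats */
    /* scale = 4 is the size of a float in bytes. */
    _mm256_mask_i64scatter_ps(table, (__mmask8)lanes, vindex, a, 4);
}

/* Hypothetical wrapper with a scalar fallback; only bits 0..3 of lanes are used. */
void scatter4f(float *table, const long long *idx,
               const float *vals, unsigned char lanes)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl")) {
        scatter4f_avx512(table, idx, vals, lanes);
        return;
    }
    for (int i = 0; i < 4; ++i)
        if ((lanes >> i) & 1)
            table[idx[i]] = vals[i];
}
```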
_mm_mask_store_epi32
void _mm_mask_store_epi32(void* mem_addr, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa32
Store packed 32-bit integers from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm256_mask_store_epi32
void _mm256_mask_store_epi32(void* mem_addr, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa32
Store packed 32-bit integers from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm_mask_store_epi64
void _mm_mask_store_epi64(void* mem_addr, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa64
Store packed 64-bit integers from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
_mm256_mask_store_epi64
void _mm256_mask_store_epi64(void* mem_addr, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqa64
Store packed 64-bit integers from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm_mask_storeu_epi16
void _mm_mask_storeu_epi16(void* mem_addr, __mmask8 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu16
Store packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_storeu_epi16
void _mm256_mask_storeu_epi16(void* mem_addr, __mmask16 k, __m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu16
Store packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm512_mask_storeu_epi16
void _mm512_mask_storeu_epi16(void* mem_addr, __mmask32 k, __m512i a)
CPUID Flags: AVX512BW
Instruction(s): vmovdqu16
Store packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm_mask_storeu_epi32
void _mm_mask_storeu_epi32(void* mem_addr, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu32
Store packed 32-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_storeu_epi32
void _mm256_mask_storeu_epi32(void* mem_addr, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu32
Store packed 32-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm_mask_storeu_epi64
void _mm_mask_storeu_epi64(void* mem_addr, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu64
Store packed 64-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_storeu_epi64
void _mm256_mask_storeu_epi64(void* mem_addr, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmovdqu64
Store packed 64-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm_mask_storeu_epi8
void _mm_mask_storeu_epi8(void* mem_addr, __mmask16 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu8
Store packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm256_mask_storeu_epi8
void _mm256_mask_storeu_epi8(void* mem_addr, __mmask32 k, __m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vmovdqu8
Store packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
_mm512_mask_storeu_epi8
void _mm512_mask_storeu_epi8(void* mem_addr, __mmask64 k, __m512i a)
CPUID Flags: AVX512BW
Instruction(s): vmovdqu8
Store packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
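With byte elements, a 512-bit vector has 64 lanes, so the writemask is a `__mmask64`. The following sketch copies up to 64 bytes with one masked load and one masked store. The names `copy_bytes64` and `copy_bytes64_avx512` are hypothetical, the caller is assumed to pass n no larger than 64, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions assumed here.

```c
#include <immintrin.h>

/* Hypothetical helper: copies n bytes, where n is in the range 0..64. */
__attribute__((target("avx512f,avx512bw")))
static void copy_bytes64_avx512(void *dst, const void *src, unsigned n)
{
    /* Low n bits set; shifting a 64-bit value by 64 is undefined, so handle it apart. */
    __mmask64 k = (n >= 64) ? ~0ULL : ((1ULL << n) - 1ULL);
    __m512i v = _mm512_maskz_loadu_epi8(k, src);
    _mm512_mask_storeu_epi8(dst, k, v);
}

/* Hypothetical wrapper with a scalar fallback. */
void copy_bytes64(void *dst, const void *src, unsigned n)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512bw")) {
        copy_bytes64_avx512(dst, src, n);
        return;
    }
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (unsigned i = 0; i < n && i < 64; ++i)
        d[i] = s[i];
}
```

The 512-bit byte form needs only AVX512BW, whereas the 128-bit and 256-bit forms also require AVX512VL.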
_mm_mask_compressstoreu_epi32
void _mm_mask_compressstoreu_epi32(void* base_addr, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressd
Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
_mm256_mask_compressstoreu_epi32
void _mm256_mask_compressstoreu_epi32(void* base_addr, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressd
Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
_mm_mask_compressstoreu_epi64
void _mm_mask_compressstoreu_epi64(void* base_addr, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressq
Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
_mm256_mask_compressstoreu_epi64
void _mm256_mask_compressstoreu_epi64(void* base_addr, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpcompressq
Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
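The integer compress-stores above lend themselves to filtering an array in blocks. The sketch below keeps the 32-bit integers greater than a threshold, advancing the output position by the number of set mask bits after each block of eight. The function names are hypothetical, n is assumed to be a multiple of 8, dst is assumed to have room for n elements, and the `target` attribute, `__builtin_cpu_supports`, and `__builtin_popcount` are GCC/Clang extensions assumed here.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: n is assumed to be a multiple of 8. */
__attribute__((target("avx512f,avx512vl")))
static size_t filter_gt_avx512(const int *src, size_t n, int threshold, int *dst)
{
    size_t out = 0;
    __m256i t = _mm256_set1_epi32(threshold);
    for (size_t i = 0; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(src + i));
        /* Bit j of k is set when src[i + j] > threshold (signed compare). */
        __mmask8 k = _mm256_cmpgt_epi32_mask(v, t);
        _mm256_mask_compressstoreu_epi32(dst + out, k, v);
        out += (size_t)__builtin_popcount((unsigned)k);
    }
    return out;
}

/* Hypothetical wrapper: returns how many elements were written to dst. */
size_t filter_gt(const int *src, size_t n, int threshold, int *dst)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        return filter_gt_avx512(src, n, threshold, dst);

    /* Scalar fallback with the same result for n a multiple of 8. */
    size_t out = 0;
    for (size_t i = 0; i < n; ++i)
        if (src[i] > threshold)
            dst[out++] = src[i];
    return out;
}
```

The relative order of the kept elements is preserved, because each block is stored contiguously in lane order.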
_mm_i32scatter_epi32
void _mm_i32scatter_epi32(void* base_addr, __m128i vindex, __m128i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterdd
Scatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm_mask_i32scatter_epi32
void _mm_mask_i32scatter_epi32(void* base_addr, __mmask8 k, __m128i vindex, __m128i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterdd
Scatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_i32scatter_epi32
void _mm256_i32scatter_epi32(void* base_addr, __m256i vindex, __m256i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterdd
Scatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm256_mask_i32scatter_epi32
void _mm256_mask_i32scatter_epi32(void* base_addr, __mmask8 k, __m256i vindex, __m256i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterdd
Scatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_i32scatter_epi64
void _mm_i32scatter_epi64(void* base_addr, __m128i vindex, __m128i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterdq
Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm_mask_i32scatter_epi64
void _mm_mask_i32scatter_epi64(void* base_addr, __mmask8 k, __m128i vindex, __m128i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterdq
Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_i32scatter_epi64
void _mm256_i32scatter_epi64(void* base_addr, __m128i vindex, __m256i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterdq
Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm256_mask_i32scatter_epi64
void _mm256_mask_i32scatter_epi64(void* base_addr, __mmask8 k, __m128i vindex, __m256i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterdq
Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_i64scatter_epi32
void _mm_i64scatter_epi32(void* base_addr, __m128i vindex, __m128i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterqd
Scatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm_mask_i64scatter_epi32
void _mm_mask_i64scatter_epi32(void* base_addr, __mmask8 k, __m128i vindex, __m128i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterqd
Scatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_i64scatter_epi32
void _mm256_i64scatter_epi32(void* base_addr, __m256i vindex, __m128i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterqd
Scatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm256_mask_i64scatter_epi32
void _mm256_mask_i64scatter_epi32(void* base_addr, __mmask8 k, __m256i vindex, __m128i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterqd
Scatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm_i64scatter_epi64
void _mm_i64scatter_epi64(void* base_addr, __m128i vindex, __m128i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterqq
Scatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm_mask_i64scatter_epi64
void _mm_mask_i64scatter_epi64(void* base_addr, __mmask8 k, __m128i vindex, __m128i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterqq
Scatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
_mm256_i64scatter_epi64
void _mm256_i64scatter_epi64(void* base_addr, __m256i vindex, __m256i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterqq
Scatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
_mm256_mask_i64scatter_epi64
void _mm256_mask_i64scatter_epi64(void* base_addr, __mmask8 k, __m256i vindex, __m256i a, const int scale)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpscatterqq
Scatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
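Because scale is limited to 1, 2, 4 or 8, scattering into a field of a larger structure takes some care. One option, sketched below, is to precompute byte offsets and pass a scale of 1. The `record` type and the function names are hypothetical, the structure is assumed to be 16 bytes (checked at compile time), and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions assumed here.

```c
#include <immintrin.h>

/* Hypothetical 16-byte record; only the count field is updated. */
typedef struct { long long id; long long count; } record;
_Static_assert(sizeof(record) == 16, "example assumes a 16-byte record");

/* Hypothetical helper: recs[which[i]].count = counts[i] for i = 0..3. */
__attribute__((target("avx512f,avx512vl")))
static void set_counts_avx512(record *recs, const long long *which,
                              const long long *counts)
{
    __m256i idx = _mm256_loadu_si256((const __m256i *)which);
    /* Byte offset of each record: index * 16, computed as a left shift by 4. */
    __m256i byte_off = _mm256_slli_epi64(idx, 4);
    __m256i a = _mm256_loadu_si256((const __m256i *)counts);
    /* The base points at the first count field; offsets are in bytes, so scale = 1. */
    _mm256_i64scatter_epi64(&recs[0].count, byte_off, a, 1);
}

/* Hypothetical wrapper with a scalar fallback. */
void set_counts(record *recs, const long long *which, const long long *counts)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl")) {
        set_counts_avx512(recs, which, counts);
        return;
    }
    for (int i = 0; i < 4; ++i)
        recs[which[i]].count = counts[i];
}
```

An equivalent choice for this layout would be to double each index and keep scale at 8; the byte-offset form is shown because it extends to record sizes that are not a multiple of 8.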