Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 3/31/2023
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

Intrinsics for Arithmetic Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


variable definition
src

source element to use based on writemask result

k

writemask used as a selector

a

first source vector element

b

second source vector element

c

third source vector element


_mm_mask_add_pd

__m128d _mm_mask_add_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vaddpd

Add packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_add_pd

__m128d _mm_maskz_add_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vaddpd

Add packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_add_pd

__m256d _mm256_mask_add_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vaddpd

Add packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_add_pd

__m256d _mm256_maskz_add_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vaddpd

Add packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_add_ps

__m128 _mm_mask_add_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vaddps

Add packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_add_ps

__m128 _mm_maskz_add_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vaddps

Add packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_add_ps

__m256 _mm256_mask_add_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vaddps

Add packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_add_ps

__m256 _mm256_maskz_add_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vaddps

Add packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_div_pd

__m128d _mm_mask_div_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vdivpd

Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_div_pd

__m128d _mm_maskz_div_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vdivpd

Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_div_pd

__m256d _mm256_mask_div_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vdivpd

Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_div_pd

__m256d _mm256_maskz_div_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vdivpd

Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_div_ps

__m128 _mm_mask_div_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vdivps

Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_div_ps

__m128 _mm_maskz_div_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vdivps

Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_div_ps

__m256 _mm256_mask_div_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vdivps

Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_div_ps

__m256 _mm256_maskz_div_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vdivps

Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fmadd_pd

__m128d _mm_mask_fmadd_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fmadd_pd

__m128d _mm_mask3_fmadd_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fmadd_pd

__m128d _mm_maskz_fmadd_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fmadd_pd

__m256d _mm256_mask_fmadd_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fmadd_pd

__m256d _mm256_mask3_fmadd_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fmadd_pd

__m256d _mm256_maskz_fmadd_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fmadd_ps

__m128 _mm_mask_fmadd_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fmadd_ps

__m128 _mm_mask3_fmadd_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fmadd_ps

__m128 _mm_maskz_fmadd_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fmadd_ps

__m256 _mm256_mask_fmadd_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fmadd_ps

__m256 _mm256_mask3_fmadd_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fmadd_ps

__m256 _mm256_maskz_fmadd_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fmaddsub_pd

__m128d _mm_mask_fmaddsub_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fmaddsub_pd

__m128d _mm_mask3_fmaddsub_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fmaddsub_pd

__m128d _mm_maskz_fmaddsub_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fmaddsub_pd

__m256d _mm256_mask_fmaddsub_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fmaddsub_pd

__m256d _mm256_mask3_fmaddsub_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fmaddsub_pd

__m256d _mm256_maskz_fmaddsub_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fmaddsub_ps

__m128 _mm_mask_fmaddsub_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fmaddsub_ps

__m128 _mm_mask3_fmaddsub_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fmaddsub_ps

__m128 _mm_maskz_fmaddsub_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fmaddsub_ps

__m256 _mm256_mask_fmaddsub_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fmaddsub_ps

__m256 _mm256_mask3_fmaddsub_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fmaddsub_ps

__m256 _mm256_maskz_fmaddsub_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fmsub_pd

__m128d _mm_mask_fmsub_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fmsub_pd

__m128d _mm_mask3_fmsub_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fmsub_pd

__m128d _mm_maskz_fmsub_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fmsub_pd

__m256d _mm256_mask_fmsub_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fmsub_pd

__m256d _mm256_mask3_fmsub_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fmsub_pd

__m256d _mm256_maskz_fmsub_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fmsub_ps

__m128 _mm_mask_fmsub_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fmsub_ps

__m128 _mm_mask3_fmsub_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fmsub_ps

__m128 _mm_maskz_fmsub_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fmsub_ps

__m256 _mm256_mask_fmsub_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fmsub_ps

__m256 _mm256_mask3_fmsub_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fmsub_ps

__m256 _mm256_maskz_fmsub_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fmsubadd_pd

__m128d _mm_mask_fmsubadd_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fmsubadd_pd

__m128d _mm_mask3_fmsubadd_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fmsubadd_pd

__m128d _mm_maskz_fmsubadd_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fmsubadd_pd

__m256d _mm256_mask_fmsubadd_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fmsubadd_pd

__m256d _mm256_mask3_fmsubadd_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fmsubadd_pd

__m256d _mm256_maskz_fmsubadd_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fmsubadd_ps

__m128 _mm_mask_fmsubadd_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fmsubadd_ps

__m128 _mm_mask3_fmsubadd_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fmsubadd_ps

__m128 _mm_maskz_fmsubadd_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fmsubadd_ps

__m256 _mm256_mask_fmsubadd_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fmsubadd_ps

__m256 _mm256_mask3_fmsubadd_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fmsubadd_ps

__m256 _mm256_maskz_fmsubadd_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fnmadd_pd

__m128d _mm_mask_fnmadd_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fnmadd_pd

__m128d _mm_mask3_fnmadd_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fnmadd_pd

__m128d _mm_maskz_fnmadd_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fnmadd_pd

__m256d _mm256_mask_fnmadd_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fnmadd_pd

__m256d _mm256_mask3_fnmadd_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fnmadd_pd

__m256d _mm256_maskz_fnmadd_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fnmadd_ps

__m128 _mm_mask_fnmadd_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fnmadd_ps

__m128 _mm_mask3_fnmadd_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fnmadd_ps

__m128 _mm_maskz_fnmadd_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fnmadd_ps

__m256 _mm256_mask_fnmadd_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fnmadd_ps

__m256 _mm256_mask3_fnmadd_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fnmadd_ps

__m256 _mm256_maskz_fnmadd_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fnmsub_pd

__m128d _mm_mask_fnmsub_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fnmsub_pd

__m128d _mm_mask3_fnmsub_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fnmsub_pd

__m128d _mm_maskz_fnmsub_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fnmsub_pd

__m256d _mm256_mask_fnmsub_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fnmsub_pd

__m256d _mm256_mask3_fnmsub_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fnmsub_pd

__m256d _mm256_maskz_fnmsub_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd

Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_fnmsub_ps

__m128 _mm_mask_fnmsub_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask3_fnmsub_ps

__m128 _mm_mask3_fnmsub_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm_maskz_fnmsub_ps

__m128 _mm_maskz_fnmsub_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_fnmsub_ps

__m256 _mm256_mask_fnmsub_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask3_fnmsub_ps

__m256 _mm256_mask3_fnmsub_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).



_mm256_maskz_fnmsub_ps

__m256 _mm256_maskz_fnmsub_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps

Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_max_pd

__m128d _mm_mask_max_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmaxpd

Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_max_pd

__m128d _mm_maskz_max_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmaxpd

Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_max_pd

__m256d _mm256_mask_max_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmaxpd

Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_max_pd

__m256d _mm256_maskz_max_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmaxpd

Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_max_ps

__m128 _mm_mask_max_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmaxps

Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_max_ps

__m128 _mm_maskz_max_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmaxps

Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_max_ps

__m256 _mm256_mask_max_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmaxps

Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_max_ps

__m256 _mm256_maskz_max_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmaxps

Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_min_pd

__m128d _mm_mask_min_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vminpd

Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_min_pd

__m128d _mm_maskz_min_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vminpd

Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_min_pd

__m256d _mm256_mask_min_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vminpd

Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_min_pd

__m256d _mm256_maskz_min_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vminpd

Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_min_ps

__m128 _mm_mask_min_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vminps

Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_min_ps

__m128 _mm_maskz_min_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vminps

Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_min_ps

__m256 _mm256_mask_min_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vminps

Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_min_ps

__m256 _mm256_maskz_min_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vminps

Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_mul_pd

__m128d _mm_mask_mul_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmulpd

Multiply packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). RM.



_mm_maskz_mul_pd

__m128d _mm_maskz_mul_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmulpd

Multiply packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_mul_pd

__m256d _mm256_mask_mul_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmulpd

Multiply packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_mul_pd

__m256d _mm256_maskz_mul_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmulpd

Multiply packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_mul_ps

__m128 _mm_mask_mul_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmulps

Multiply packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). RM.



_mm_maskz_mul_ps

__m128 _mm_maskz_mul_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmulps

Multiply packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_mul_ps

__m256 _mm256_mask_mul_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmulps

Multiply packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). RM.



_mm256_maskz_mul_ps

__m256 _mm256_maskz_mul_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmulps

Multiply packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_rcp14_pd

__m128d _mm_mask_rcp14_pd(__m128d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14pd

Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm_maskz_rcp14_pd

__m128d _mm_maskz_rcp14_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14pd

Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm_rcp14_pd

__m128d _mm_rcp14_pd(__m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14pd

Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results. The maximum relative error for this approximation is less than 2^-14.



_mm256_mask_rcp14_pd

__m256d _mm256_mask_rcp14_pd(__m256d src, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14pd

Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm256_maskz_rcp14_pd

__m256d _mm256_maskz_rcp14_pd(__mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14pd

Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm256_rcp14_pd

__m256d _mm256_rcp14_pd(__m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14pd

Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results. The maximum relative error for this approximation is less than 2^-14.



_mm_mask_rcp14_ps

__m128 _mm_mask_rcp14_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14ps

Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm_maskz_rcp14_ps

__m128 _mm_maskz_rcp14_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14ps

Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm_rcp14_ps

__m128 _mm_rcp14_ps(__m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14ps

Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results. The maximum relative error for this approximation is less than 2^-14.



_mm256_mask_rcp14_ps

__m256 _mm256_mask_rcp14_ps(__m256 src, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14ps

Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm256_maskz_rcp14_ps

__m256 _mm256_maskz_rcp14_ps(__mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14ps

Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm256_rcp14_ps

__m256 _mm256_rcp14_ps(__m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrcp14ps

Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results. The maximum relative error for this approximation is less than 2^-14.



_mm_mask_rsqrt14_pd

__m128d _mm_mask_rsqrt14_pd(__m128d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrsqrt14pd

Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm_maskz_rsqrt14_pd

__m128d _mm_maskz_rsqrt14_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrsqrt14pd

Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm256_mask_rsqrt14_pd

__m256d _mm256_mask_rsqrt14_pd(__m256d src, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrsqrt14pd

Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm256_maskz_rsqrt14_pd

__m256d _mm256_maskz_rsqrt14_pd(__mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrsqrt14pd

Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm_mask_rsqrt14_ps

__m128 _mm_mask_rsqrt14_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrsqrt14ps

Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm_maskz_rsqrt14_ps

__m128 _mm_maskz_rsqrt14_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrsqrt14ps

Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm256_mask_rsqrt14_ps

__m256 _mm256_mask_rsqrt14_ps(__m256 src, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrsqrt14ps

Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm256_maskz_rsqrt14_ps

__m256 _mm256_maskz_rsqrt14_ps(__mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrsqrt14ps

Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.



_mm_mask_sqrt_pd

__m128d _mm_mask_sqrt_pd(__m128d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsqrtpd

Compute the square root of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_sqrt_pd

__m128d _mm_maskz_sqrt_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsqrtpd

Compute the square root of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_sqrt_pd

__m256d _mm256_mask_sqrt_pd(__m256d src, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsqrtpd

Compute the square root of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_sqrt_pd

__m256d _mm256_maskz_sqrt_pd(__mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsqrtpd

Compute the square root of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_sqrt_ps

__m128 _mm_mask_sqrt_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsqrtps

Compute the square root of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_sqrt_ps

__m128 _mm_maskz_sqrt_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsqrtps

Compute the square root of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_sqrt_ps

__m256 _mm256_mask_sqrt_ps(__m256 src, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsqrtps

Compute the square root of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_sqrt_ps

__m256 _mm256_maskz_sqrt_ps(__mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsqrtps

Compute the square root of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_sub_pd

__m128d _mm_mask_sub_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsubpd

Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_sub_pd

__m128d _mm_maskz_sub_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsubpd

Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_sub_pd

__m256d _mm256_mask_sub_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsubpd

Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_sub_pd

__m256d _mm256_maskz_sub_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsubpd

Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_sub_ps

__m128 _mm_mask_sub_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsubps

Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_sub_ps

__m128 _mm_maskz_sub_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsubps

Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_sub_ps

__m256 _mm256_mask_sub_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsubps

Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_sub_ps

__m256 _mm256_maskz_sub_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vsubps

Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_abs_epi8

__m128i _mm_mask_abs_epi8(__m128i src, __mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpabsb

Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_abs_epi8

__m128i _mm_maskz_abs_epi8(__mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpabsb

Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_abs_epi8

__m256i _mm256_mask_abs_epi8(__m256i src, __mmask32 k, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpabsb

Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_abs_epi8

__m256i _mm256_maskz_abs_epi8(__mmask32 k, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpabsb

Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_abs_epi8

__m512i _mm512_abs_epi8(__m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpabsb

Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value.



_mm512_mask_abs_epi8

__m512i _mm512_mask_abs_epi8(__m512i src, __mmask64 k, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpabsb

Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_abs_epi8

__m512i _mm512_maskz_abs_epi8(__mmask64 k, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpabsb

Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_abs_epi32

__m128i _mm_mask_abs_epi32(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpabsd

Compute the absolute value of packed 32-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_abs_epi32

__m128i _mm_maskz_abs_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpabsd

Compute the absolute value of packed 32-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_abs_epi32

__m256i _mm256_mask_abs_epi32(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpabsd

Compute the absolute value of packed 32-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_abs_epi32

__m256i _mm256_maskz_abs_epi32(__mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpabsd

Compute the absolute value of packed 32-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_abs_epi64

__m128i _mm_abs_epi64(__m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpabsq

Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value.



_mm_mask_abs_epi64

__m128i _mm_mask_abs_epi64(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpabsq

Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_abs_epi64

__m128i _mm_maskz_abs_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpabsq

Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_abs_epi64

__m256i _mm256_abs_epi64(__m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpabsq

Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value.



_mm256_mask_abs_epi64

__m256i _mm256_mask_abs_epi64(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpabsq

Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_abs_epi64

__m256i _mm256_maskz_abs_epi64(__mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpabsq

Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_abs_epi16

__m128i _mm_mask_abs_epi16(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpabsw

Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_abs_epi16

__m128i _mm_maskz_abs_epi16(__mmask8 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpabsw

Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_abs_epi16

__m256i _mm256_mask_abs_epi16(__m256i src, __mmask16 k, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpabsw

Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_abs_epi16

__m256i _mm256_maskz_abs_epi16(__mmask16 k, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpabsw

Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_abs_epi16

__m512i _mm512_abs_epi16(__m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpabsw

Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value.



_mm512_mask_abs_epi16

__m512i _mm512_mask_abs_epi16(__m512i src, __mmask32 k, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpabsw

Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_abs_epi16

__m512i _mm512_maskz_abs_epi16(__mmask32 k, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpabsw

Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_add_epi8

__m128i _mm_mask_add_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddb

Add packed 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_add_epi8

__m128i _mm_maskz_add_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddb

Add packed 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_add_epi8

__m256i _mm256_mask_add_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddb

Add packed 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_add_epi8

__m256i _mm256_maskz_add_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddb

Add packed 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_add_epi8

__m512i _mm512_add_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddb

Add packed 8-bit integers in a and b, and return the results.



_mm512_mask_add_epi8

__m512i _mm512_mask_add_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddb

Add packed 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_add_epi8

__m512i _mm512_maskz_add_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddb

Add packed 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_add_epi32

__m128i _mm_mask_add_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpaddd

Add packed 32-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_add_epi32

__m128i _mm_maskz_add_epi32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpaddd

Add packed 32-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_add_epi32

__m256i _mm256_mask_add_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpaddd

Add packed 32-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_add_epi32

__m256i _mm256_maskz_add_epi32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpaddd

Add packed 32-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_add_epi64

__m128i _mm_mask_add_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpaddq

Add packed 64-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_add_epi64

__m128i _mm_maskz_add_epi64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpaddq

Add packed 64-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_add_epi64

__m256i _mm256_mask_add_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpaddq

Add packed 64-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_add_epi64

__m256i _mm256_maskz_add_epi64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpaddq

Add packed 64-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_adds_epi8

__m128i _mm_mask_adds_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddsb

Add packed 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_adds_epi8

__m128i _mm_maskz_adds_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddsb

Add packed 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_adds_epi8

__m256i _mm256_mask_adds_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddsb

Add packed 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_adds_epi8

__m256i _mm256_maskz_adds_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddsb

Add packed 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_adds_epi8

__m512i _mm512_adds_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddsb

Add packed 8-bit integers in a and b using saturation, and return the results.



_mm512_mask_adds_epi8

__m512i _mm512_mask_adds_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddsb

Add packed 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_adds_epi8

__m512i _mm512_maskz_adds_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddsb

Add packed 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_adds_epi16

__m128i _mm_mask_adds_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddsw

Add packed 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_adds_epi16

__m128i _mm_maskz_adds_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddsw

Add packed 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_adds_epi16

__m256i _mm256_mask_adds_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddsw

Add packed 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_adds_epi16

__m256i _mm256_maskz_adds_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddsw

Add packed 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_adds_epi16

__m512i _mm512_adds_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddsw

Add packed 16-bit integers in a and b using saturation, and return the results.



_mm512_mask_adds_epi16

__m512i _mm512_mask_adds_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddsw

Add packed 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_adds_epi16

__m512i _mm512_maskz_adds_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddsw

Add packed 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_adds_epu8

__m128i _mm_mask_adds_epu8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddusb

Add packed unsigned 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_adds_epu8

__m128i _mm_maskz_adds_epu8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddusb

Add packed unsigned 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_adds_epu8

__m256i _mm256_mask_adds_epu8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddusb

Add packed unsigned 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_adds_epu8

__m256i _mm256_maskz_adds_epu8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddusb

Add packed unsigned 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_adds_epu8

__m512i _mm512_adds_epu8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddusb

Add packed unsigned 8-bit integers in a and b using saturation, and return the results.



_mm512_mask_adds_epu8

__m512i _mm512_mask_adds_epu8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddusb

Add packed unsigned 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_adds_epu8

__m512i _mm512_maskz_adds_epu8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddusb

Add packed unsigned 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_adds_epu16

__m128i _mm_mask_adds_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddusw

Add packed unsigned 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_adds_epu16

__m128i _mm_maskz_adds_epu16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddusw

Add packed unsigned 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_adds_epu16

__m256i _mm256_mask_adds_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddusw

Add packed unsigned 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_adds_epu16

__m256i _mm256_maskz_adds_epu16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddusw

Add packed unsigned 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_adds_epu16

__m512i _mm512_adds_epu16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddusw

Add packed unsigned 16-bit integers in a and b using saturation, and return the results.



_mm512_mask_adds_epu16

__m512i _mm512_mask_adds_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddusw

Add packed unsigned 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_adds_epu16

__m512i _mm512_maskz_adds_epu16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddusw

Add packed unsigned 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_add_epi16

__m128i _mm_mask_add_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddw

Add packed 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_add_epi16

__m128i _mm_maskz_add_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddw

Add packed 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_add_epi16

__m256i _mm256_mask_add_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddw

Add packed 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_add_epi16

__m256i _mm256_maskz_add_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpaddw

Add packed 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_add_epi16

__m512i _mm512_add_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddw

Add packed 16-bit integers in a and b, and return the results.



_mm512_mask_add_epi16

__m512i _mm512_mask_add_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddw

Add packed 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_add_epi16

__m512i _mm512_maskz_add_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpaddw

Add packed 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_avg_epu8

__m128i _mm_mask_avg_epu8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpavgb

Average packed unsigned 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_avg_epu8

__m128i _mm_maskz_avg_epu8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpavgb

Average packed unsigned 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_avg_epu8

__m256i _mm256_mask_avg_epu8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpavgb

Average packed unsigned 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_avg_epu8

__m256i _mm256_maskz_avg_epu8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpavgb

Average packed unsigned 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_avg_epu8

__m512i _mm512_avg_epu8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpavgb

Average packed unsigned 8-bit integers in a and b, and return the results.



_mm512_mask_avg_epu8

__m512i _mm512_mask_avg_epu8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpavgb

Average packed unsigned 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_avg_epu8

__m512i _mm512_maskz_avg_epu8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpavgb

Average packed unsigned 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_avg_epu16

__m128i _mm_mask_avg_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpavgw

Average packed unsigned 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_avg_epu16

__m128i _mm_maskz_avg_epu16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpavgw

Average packed unsigned 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_avg_epu16

__m256i _mm256_mask_avg_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpavgw

Average packed unsigned 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_avg_epu16

__m256i _mm256_maskz_avg_epu16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpavgw

Average packed unsigned 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_avg_epu16

__m512i _mm512_avg_epu16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpavgw

Average packed unsigned 16-bit integers in a and b, and return the results.



_mm512_mask_avg_epu16

__m512i _mm512_mask_avg_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpavgw

Average packed unsigned 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_avg_epu16

__m512i _mm512_maskz_avg_epu16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpavgw

Average packed unsigned 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_maddubs_epi16

__m128i _mm_mask_maddubs_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaddubsw

Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_maddubs_epi16

__m128i _mm_maskz_maddubs_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaddubsw

Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_maddubs_epi16

__m256i _mm256_mask_maddubs_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaddubsw

Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_maddubs_epi16

__m256i _mm256_maskz_maddubs_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaddubsw

Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_maddubs_epi16

__m512i _mm512_maddubs_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaddubsw

Vertically multiply each unsigned 8-bit integer from a with the corresponding signed 8-bit integer from b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value.



_mm512_mask_maddubs_epi16

__m512i _mm512_mask_maddubs_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaddubsw

Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_maddubs_epi16

__m512i _mm512_maskz_maddubs_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaddubsw

Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_madd_epi16

__m128i _mm_mask_madd_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaddwd

Multiply packed 16-bit integers in a and b, producing intermediate 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the saturated results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_madd_epi16

__m128i _mm_maskz_madd_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaddwd

Multiply packed 16-bit integers in a and b, producing intermediate 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the saturated results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_madd_epi16

__m256i _mm256_mask_madd_epi16(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaddwd

Multiply packed 16-bit integers in a and b, producing intermediate 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the saturated results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_madd_epi16

__m256i _mm256_maskz_madd_epi16(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaddwd

Multiply packed 16-bit integers in a and b, producing intermediate 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the saturated results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_madd_epi16

__m512i _mm512_madd_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaddwd

Multiply packed 16-bit integers in a and b, producing intermediate 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the saturated results in the return value.



_mm512_mask_madd_epi16

__m512i _mm512_mask_madd_epi16(__m512i src, __mmask16 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaddwd

Multiply packed 16-bit integers in a and b, producing intermediate 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the saturated results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_madd_epi16

__m512i _mm512_maskz_madd_epi16(__mmask16 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaddwd

Multiply packed 16-bit integers in a and b, producing intermediate 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the saturated results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_max_epi8

__m128i _mm_mask_max_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxsb

Compare packed 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_max_epi8

__m128i _mm_maskz_max_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxsb

Compare packed 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_max_epi8

__m256i _mm256_mask_max_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxsb

Compare packed 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_max_epi8

__m256i _mm256_maskz_max_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxsb

Compare packed 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_max_epi8

__m512i _mm512_mask_max_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxsb

Compare packed 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_max_epi8

__m512i _mm512_maskz_max_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxsb

Compare packed 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_max_epi8

__m512i _mm512_max_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxsb

Compare packed 8-bit integers in a and b, and store packed maximum values in the return value.



_mm_mask_max_epi32

__m128i _mm_mask_max_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxsd

Compare packed 32-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_max_epi32

__m128i _mm_maskz_max_epi32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxsd

Compare packed 32-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_max_epi32

__m256i _mm256_mask_max_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxsd

Compare packed 32-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_max_epi32

__m256i _mm256_maskz_max_epi32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxsd

Compare packed 32-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_max_epi64

__m128i _mm_mask_max_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxsq

Compare packed 64-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_max_epi64

__m128i _mm_maskz_max_epi64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxsq

Compare packed 64-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_max_epi64

__m128i _mm_max_epi64(__m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxsq

Compare packed 64-bit integers in a and b, and store packed maximum values in the return value.



_mm256_mask_max_epi64

__m256i _mm256_mask_max_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxsq

Compare packed 64-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_max_epi64

__m256i _mm256_maskz_max_epi64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxsq

Compare packed 64-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_max_epi64

__m256i _mm256_max_epi64(__m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxsq

Compare packed 64-bit integers in a and b, and store packed maximum values in the return value.



_mm_mask_max_epi16

__m128i _mm_mask_max_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxsw

Compare packed 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_max_epi16

__m128i _mm_maskz_max_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxsw

Compare packed 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_max_epi16

__m256i _mm256_mask_max_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxsw

Compare packed 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_max_epi16

__m256i _mm256_maskz_max_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxsw

Compare packed 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_max_epi16

__m512i _mm512_mask_max_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxsw

Compare packed 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_max_epi16

__m512i _mm512_maskz_max_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxsw

Compare packed 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_max_epi16

__m512i _mm512_max_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxsw

Compare packed 16-bit integers in a and b, and store packed maximum values in the return value.



_mm_mask_max_epu8

__m128i _mm_mask_max_epu8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxub

Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_max_epu8

__m128i _mm_maskz_max_epu8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxub

Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_max_epu8

__m256i _mm256_mask_max_epu8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxub

Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_max_epu8

__m256i _mm256_maskz_max_epu8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxub

Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_max_epu8

__m512i _mm512_mask_max_epu8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxub

Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_max_epu8

__m512i _mm512_maskz_max_epu8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxub

Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_max_epu8

__m512i _mm512_max_epu8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxub

Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value.



_mm_mask_max_epu32

__m128i _mm_mask_max_epu32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxud

Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_max_epu32

__m128i _mm_maskz_max_epu32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxud

Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_max_epu32

__m256i _mm256_mask_max_epu32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxud

Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_max_epu32

__m256i _mm256_maskz_max_epu32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxud

Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_max_epu64

__m128i _mm_mask_max_epu64(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxuq

Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_max_epu64

__m128i _mm_maskz_max_epu64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxuq

Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_max_epu64

__m128i _mm_max_epu64(__m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxuq

Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value.



_mm256_mask_max_epu64

__m256i _mm256_mask_max_epu64(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxuq

Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_max_epu64

__m256i _mm256_maskz_max_epu64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxuq

Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_max_epu64

__m256i _mm256_max_epu64(__m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmaxuq

Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value.



_mm_mask_max_epu16

__m128i _mm_mask_max_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxuw

Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_max_epu16

__m128i _mm_maskz_max_epu16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxuw

Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_max_epu16

__m256i _mm256_mask_max_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxuw

Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_max_epu16

__m256i _mm256_maskz_max_epu16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmaxuw

Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_max_epu16

__m512i _mm512_mask_max_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxuw

Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_max_epu16

__m512i _mm512_maskz_max_epu16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxuw

Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_max_epu16

__m512i _mm512_max_epu16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmaxuw

Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value.



_mm_mask_min_epi8

__m128i _mm_mask_min_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminsb

Compare packed 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_min_epi8

__m128i _mm_maskz_min_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminsb

Compare packed 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_min_epi8

__m256i _mm256_mask_min_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminsb

Compare packed 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_min_epi8

__m256i _mm256_maskz_min_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminsb

Compare packed 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_min_epi8

__m512i _mm512_mask_min_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminsb

Compare packed 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_min_epi8

__m512i _mm512_maskz_min_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminsb

Compare packed 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_min_epi8

__m512i _mm512_min_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminsb

Compare packed 8-bit integers in a and b, and store packed minimum values in the return value.



_mm_mask_min_epi32

__m128i _mm_mask_min_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminsd

Compare packed 32-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_min_epi32

__m128i _mm_maskz_min_epi32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminsd

Compare packed 32-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_min_epi32

__m256i _mm256_mask_min_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminsd

Compare packed 32-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_min_epi32

__m256i _mm256_maskz_min_epi32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminsd

Compare packed 32-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_min_epi64

__m128i _mm_mask_min_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminsq

Compare packed 64-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_min_epi64

__m128i _mm_maskz_min_epi64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminsq

Compare packed 64-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_min_epi64

__m128i _mm_min_epi64(__m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminsq

Compare packed 64-bit integers in a and b, and store packed minimum values in the return value.



_mm256_mask_min_epi64

__m256i _mm256_mask_min_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminsq

Compare packed 64-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_min_epi64

__m256i _mm256_maskz_min_epi64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminsq

Compare packed 64-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_min_epi64

__m256i _mm256_min_epi64(__m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminsq

Compare packed 64-bit integers in a and b, and store packed minimum values in the return value.



_mm_mask_min_epi16

__m128i _mm_mask_min_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminsw

Compare packed 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_min_epi16

__m128i _mm_maskz_min_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminsw

Compare packed 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_min_epi16

__m256i _mm256_mask_min_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminsw

Compare packed 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_min_epi16

__m256i _mm256_maskz_min_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminsw

Compare packed 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_min_epi16

__m512i _mm512_mask_min_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminsw

Compare packed 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_min_epi16

__m512i _mm512_maskz_min_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminsw

Compare packed 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_min_epi16

__m512i _mm512_min_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminsw

Compare packed 16-bit integers in a and b, and store packed minimum values in the return value.



_mm_mask_min_epu8

__m128i _mm_mask_min_epu8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminub

Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_min_epu8

__m128i _mm_maskz_min_epu8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminub

Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_min_epu8

__m256i _mm256_mask_min_epu8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminub

Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_min_epu8

__m256i _mm256_maskz_min_epu8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminub

Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_min_epu8

__m512i _mm512_mask_min_epu8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminub

Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_min_epu8

__m512i _mm512_maskz_min_epu8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminub

Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_min_epu8

__m512i _mm512_min_epu8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminub

Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value.



_mm_mask_min_epu32

__m128i _mm_mask_min_epu32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminud

Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_min_epu32

__m128i _mm_maskz_min_epu32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminud

Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_min_epu32

__m256i _mm256_mask_min_epu32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminud

Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_min_epu32

__m256i _mm256_maskz_min_epu32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminud

Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_min_epu64

__m128i _mm_mask_min_epu64(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminuq

Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_min_epu64

__m128i _mm_maskz_min_epu64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminuq

Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_min_epu64

__m128i _mm_min_epu64(__m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminuq

Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value.



_mm256_mask_min_epu64

__m256i _mm256_mask_min_epu64(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminuq

Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_min_epu64

__m256i _mm256_maskz_min_epu64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminuq

Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_min_epu64

__m256i _mm256_min_epu64(__m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpminuq

Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value.



_mm_mask_min_epu16

__m128i _mm_mask_min_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminuw

Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_min_epu16

__m128i _mm_maskz_min_epu16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminuw

Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_min_epu16

__m256i _mm256_mask_min_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminuw

Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_min_epu16

__m256i _mm256_maskz_min_epu16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpminuw

Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_min_epu16

__m512i _mm512_mask_min_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminuw

Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_min_epu16

__m512i _mm512_maskz_min_epu16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminuw

Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_min_epu16

__m512i _mm512_min_epu16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpminuw

Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value.



_mm_mask_mul_epi32

__m128i _mm_mask_mul_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmuldq

Multiply the low 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_mul_epi32

__m128i _mm_maskz_mul_epi32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmuldq

Multiply the low 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_mul_epi32

__m256i _mm256_mask_mul_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmuldq

Multiply the low 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_mul_epi32

__m256i _mm256_maskz_mul_epi32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmuldq

Multiply the low 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_mulhrs_epi16

__m128i _mm_mask_mulhrs_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhrsw

Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_mulhrs_epi16

__m128i _mm_maskz_mulhrs_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhrsw

Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_mulhrs_epi16

__m256i _mm256_mask_mulhrs_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhrsw

Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_mulhrs_epi16

__m256i _mm256_maskz_mulhrs_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhrsw

Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_mulhrs_epi16

__m512i _mm512_mask_mulhrs_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmulhrsw

Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_mulhrs_epi16

__m512i _mm512_maskz_mulhrs_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmulhrsw

Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mulhrs_epi16

__m512i _mm512_mulhrs_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmulhrsw

Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value.



_mm_mask_mulhi_epu16

__m128i _mm_mask_mulhi_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhuw

Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_mulhi_epu16

__m128i _mm_maskz_mulhi_epu16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhuw

Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_mulhi_epu16

__m256i _mm256_mask_mulhi_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhuw

Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_mulhi_epu16

__m256i _mm256_maskz_mulhi_epu16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhuw

Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_mulhi_epu16

__m512i _mm512_mask_mulhi_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmulhuw

Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_mulhi_epu16

__m512i _mm512_maskz_mulhi_epu16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmulhuw

Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mulhi_epu16

__m512i _mm512_mulhi_epu16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmulhuw

Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value.



_mm_mask_mulhi_epi16

__m128i _mm_mask_mulhi_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_mulhi_epi16

__m128i _mm_maskz_mulhi_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_mulhi_epi16

__m256i _mm256_mask_mulhi_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_mulhi_epi16

__m256i _mm256_maskz_mulhi_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmulhw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_mulhi_epi16

__m512i _mm512_mask_mulhi_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmulhw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_mulhi_epi16

__m512i _mm512_maskz_mulhi_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmulhw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mulhi_epi16

__m512i _mm512_mulhi_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmulhw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value.



_mm_mask_mullo_epi32

__m128i _mm_mask_mullo_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmulld

Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_mullo_epi32

__m128i _mm_maskz_mullo_epi32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmulld

Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_mullo_epi32

__m256i _mm256_mask_mullo_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmulld

Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_mullo_epi32

__m256i _mm256_maskz_mullo_epi32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmulld

Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_mullo_epi64

__m128i _mm_mask_mullo_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmullq

Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_mullo_epi64

__m128i _mm_maskz_mullo_epi64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmullq

Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mullo_epi64

__m128i _mm_mullo_epi64(__m128i a, __m128i b)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmullq

Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value.



_mm256_mask_mullo_epi64

__m256i _mm256_mask_mullo_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmullq

Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_mullo_epi64

__m256i _mm256_maskz_mullo_epi64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmullq

Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mullo_epi64

__m256i _mm256_mullo_epi64(__m256i a, __m256i b)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmullq

Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value.



_mm512_mask_mullo_epi64

__m512i _mm512_mask_mullo_epi64(__m512i src, __mmask8 k, __m512i a, __m512i b)

CPUID Flags: AVX512DQ

Instruction(s): vpmullq

Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_mullo_epi64

__m512i _mm512_maskz_mullo_epi64(__mmask8 k, __m512i a, __m512i b)

CPUID Flags: AVX512DQ

Instruction(s): vpmullq

Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mullo_epi64

__m512i _mm512_mullo_epi64(__m512i a, __m512i b)

CPUID Flags: AVX512DQ

Instruction(s): vpmullq

Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value.



_mm_mask_mullo_epi16

__m128i _mm_mask_mullo_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmullw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_mullo_epi16

__m128i _mm_maskz_mullo_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmullw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_mullo_epi16

__m256i _mm256_mask_mullo_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmullw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_mullo_epi16

__m256i _mm256_maskz_mullo_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmullw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_mullo_epi16

__m512i _mm512_mask_mullo_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmullw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_mullo_epi16

__m512i _mm512_maskz_mullo_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmullw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mullo_epi16

__m512i _mm512_mullo_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpmullw

Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value.



_mm_mask_mul_epu32

__m128i _mm_mask_mul_epu32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmuludq

Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_mul_epu32

__m128i _mm_maskz_mul_epu32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmuludq

Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_mul_epu32

__m256i _mm256_mask_mul_epu32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmuludq

Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_mul_epu32

__m256i _mm256_maskz_mul_epu32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpmuludq

Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_sub_epi8

__m128i _mm_mask_sub_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubb

Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_sub_epi8

__m128i _mm_maskz_sub_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubb

Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_sub_epi8

__m256i _mm256_mask_sub_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubb

Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_sub_epi8

__m256i _mm256_maskz_sub_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubb

Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_sub_epi8

__m512i _mm512_mask_sub_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubb

Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_sub_epi8

__m512i _mm512_maskz_sub_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubb

Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_sub_epi8

__m512i _mm512_sub_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubb

Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results.



_mm_mask_sub_epi32

__m128i _mm_mask_sub_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpsubd

Subtract packed 32-bit integers in b from packed 32-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_sub_epi32

__m128i _mm_maskz_sub_epi32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpsubd

Subtract packed 32-bit integers in b from packed 32-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_sub_epi32

__m256i _mm256_mask_sub_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpsubd

Subtract packed 32-bit integers in b from packed 32-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_sub_epi32

__m256i _mm256_maskz_sub_epi32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpsubd

Subtract packed 32-bit integers in b from packed 32-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_sub_epi64

__m128i _mm_mask_sub_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpsubq

Subtract packed 64-bit integers in b from packed 64-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_sub_epi64

__m128i _mm_maskz_sub_epi64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpsubq

Subtract packed 64-bit integers in b from packed 64-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_sub_epi64

__m256i _mm256_mask_sub_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpsubq

Subtract packed 64-bit integers in b from packed 64-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_sub_epi64

__m256i _mm256_maskz_sub_epi64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpsubq

Subtract packed 64-bit integers in b from packed 64-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_subs_epi8

__m128i _mm_mask_subs_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubsb

Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_subs_epi8

__m128i _mm_maskz_subs_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubsb

Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_subs_epi8

__m256i _mm256_mask_subs_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubsb

Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_subs_epi8

__m256i _mm256_maskz_subs_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubsb

Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_subs_epi8

__m512i _mm512_mask_subs_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubsb

Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_subs_epi8

__m512i _mm512_maskz_subs_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubsb

Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_subs_epi8

__m512i _mm512_subs_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubsb

Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results.



_mm_mask_subs_epi16

__m128i _mm_mask_subs_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubsw

Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_subs_epi16

__m128i _mm_maskz_subs_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubsw

Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_subs_epi16

__m256i _mm256_mask_subs_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubsw

Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_subs_epi16

__m256i _mm256_maskz_subs_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubsw

Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_subs_epi16

__m512i _mm512_mask_subs_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubsw

Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_subs_epi16

__m512i _mm512_maskz_subs_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubsw

Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_subs_epi16

__m512i _mm512_subs_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubsw

Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results.



_mm_mask_subs_epu8

__m128i _mm_mask_subs_epu8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubusb

Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_subs_epu8

__m128i _mm_maskz_subs_epu8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubusb

Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_subs_epu8

__m256i _mm256_mask_subs_epu8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubusb

Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_subs_epu8

__m256i _mm256_maskz_subs_epu8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubusb

Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_subs_epu8

__m512i _mm512_mask_subs_epu8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubusb

Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_subs_epu8

__m512i _mm512_maskz_subs_epu8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubusb

Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_subs_epu8

__m512i _mm512_subs_epu8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubusb

Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results.



_mm_mask_subs_epu16

__m128i _mm_mask_subs_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubusw

Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_subs_epu16

__m128i _mm_maskz_subs_epu16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubusw

Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_subs_epu16

__m256i _mm256_mask_subs_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubusw

Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_subs_epu16

__m256i _mm256_maskz_subs_epu16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubusw

Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_subs_epu16

__m512i _mm512_mask_subs_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubusw

Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_subs_epu16

__m512i _mm512_maskz_subs_epu16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubusw

Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_subs_epu16

__m512i _mm512_subs_epu16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubusw

Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results.



_mm_mask_sub_epi16

__m128i _mm_mask_sub_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubw

Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_sub_epi16

__m128i _mm_maskz_sub_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubw

Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_sub_epi16

__m256i _mm256_mask_sub_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubw

Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_sub_epi16

__m256i _mm256_maskz_sub_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpsubw

Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_sub_epi16

__m512i _mm512_mask_sub_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubw

Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_sub_epi16

__m512i _mm512_maskz_sub_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubw

Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_sub_epi16

__m512i _mm512_sub_epi16(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsubw

Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results.



_mm_madd52hi_epu64

__m128i _mm_madd52hi_epu64(__m128i a, __m128i b, __m128i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52huq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result.



_mm_mask_madd52hi_epu64

__m128i _mm_mask_madd52hi_epu64(__m128i a, __mmask8 k, __m128i b, __m128i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52huq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_maskz_madd52hi_epu64

__m128i _mm_maskz_madd52hi_epu64(__mmask8 k, __m128i a, __m128i b, __m128i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52huq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_madd52hi_epu64

__m256i _mm256_madd52hi_epu64(__m256i a, __m256i b, __m256i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52huq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result.



_mm256_mask_madd52hi_epu64

__m256i _mm256_mask_madd52hi_epu64(__m256i a, __mmask8 k, __m256i b, __m256i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52huq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_maskz_madd52hi_epu64

__m256i _mm256_maskz_madd52hi_epu64(__mmask8 k, __m256i a, __m256i b, __m256i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52huq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_madd52hi_epu64

__m512i _mm512_madd52hi_epu64(__m512i a, __m512i b, __m512i c);

CPUID Flags: AVX512IFMA52

Instruction(s): vpmadd52huq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result.



_mm512_mask_madd52hi_epu64

__m512i _mm512_mask_madd52hi_epu64(__m512i a, __mmask8 k, __m512i b, __m512i c);

CPUID Flags: AVX512IFMA52

Instruction(s): vpmadd52huq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm512_maskz_madd52hi_epu64

__m512i _mm512_maskz_madd52hi_epu64(__mmask8 k, __m512i a, __m512i b, __m512i c);

CPUID Flags: AVX512IFMA52

Instruction(s): vpmadd52huq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_madd52lo_epu64

__m128i _mm_madd52lo_epu64(__m128i a, __m128i b, __m128i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52luq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result.



_mm_mask_madd52lo_epu64

__m128i _mm_mask_madd52lo_epu64(__m128i a, __mmask8 k, __m128i b, __m128i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52luq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_maskz_madd52lo_epu64

__m128i _mm_maskz_madd52lo_epu64(__mmask8 k, __m128i a, __m128i b, __m128i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52luq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_madd52lo_epu64

__m256i _mm256_madd52lo_epu64(__m256i a, __m256i b, __m256i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52luq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result.



_mm256_mask_madd52lo_epu64

__m256i _mm256_mask_madd52lo_epu64(__m256i a, __mmask8 k, __m256i b, __m256i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52luq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_maskz_madd52lo_epu64

__m256i _mm256_maskz_madd52lo_epu64(__mmask8 k, __m256i a, __m256i b, __m256i c);

CPUID Flags: AVX512IFMA52, AVX512VL

Instruction(s): vpmadd52luq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_madd52lo_epu64

__m512i _mm512_madd52lo_epu64(__m512i a, __m512i b, __m512i c);

CPUID Flags: AVX512IFMA52

Instruction(s): vpmadd52luq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result.



_mm512_mask_madd52lo_epu64

__m512i _mm512_mask_madd52lo_epu64(__m512i a, __mmask8 k, __m512i b, __m512i c);

CPUID Flags: AVX512IFMA52

Instruction(s): vpmadd52luq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm512_maskz_madd52lo_epu64

__m512i _mm512_maskz_madd52lo_epu64(__mmask8 k, __m512i a, __m512i b, __m512i c);

CPUID Flags: AVX512IFMA52

Instruction(s): vpmadd52luq

Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).