Intrinsics for FP Fused Multiply-Add (FMA) Operations

Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249

Date 12/16/2022

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

Intrinsic Name	Operation	Corresponding Intel® AVX-512 Instruction
`_mm512_fmadd_pd`, `_mm512_mask3_fmadd_pd`, `_mm512_mask_fmadd_pd`, `_mm512_maskz_fmadd_pd`, `_mm512_fmadd_round_pd`, `_mm512_mask3_fmadd_round_pd`, `_mm512_mask_fmadd_round_pd`, `_mm512_maskz_fmadd_round_pd`	Multiplies packed float64 elements, then adds the intermediate result to packed float64 elements.	`VFMADD132PD`
`_mm512_fmadd_ps`, `_mm512_mask3_fmadd_ps`, `_mm512_mask_fmadd_ps`, `_mm512_maskz_fmadd_ps`, `_mm512_fmadd_round_ps`, `_mm512_mask3_fmadd_round_ps`, `_mm512_mask_fmadd_round_ps`, `_mm512_maskz_fmadd_round_ps`	Multiplies packed float32 elements, then adds the intermediate result to packed float32 elements.	`VFMADD132PS`
`_mm_mask3_fmadd_sd`, `_mm_mask_fmadd_sd`, `_mm_maskz_fmadd_sd`, `_mm_mask3_fmadd_round_sd`, `_mm_mask_fmadd_round_sd`, `_mm_maskz_fmadd_round_sd`	Multiplies the lower (scalar) float64 elements, then adds the intermediate result to the lower float64 element.	`VFMADD132SD`
`_mm_mask3_fmadd_ss`, `_mm_mask_fmadd_ss`, `_mm_maskz_fmadd_ss`, `_mm_mask3_fmadd_round_ss`, `_mm_mask_fmadd_round_ss`, `_mm_maskz_fmadd_round_ss`	Multiplies the lower (scalar) float32 elements, then adds the intermediate result to the lower float32 element.	`VFMADD132SS`
`_mm512_fmaddsub_pd`, `_mm512_mask3_fmaddsub_pd`, `_mm512_mask_fmaddsub_pd`, `_mm512_maskz_fmaddsub_pd`, `_mm512_fmaddsub_round_pd`, `_mm512_mask3_fmaddsub_round_pd`, `_mm512_mask_fmaddsub_round_pd`, `_mm512_maskz_fmaddsub_round_pd`	Multiplies packed float64 elements, then alternately adds and subtracts packed float64 elements to/from the intermediate result.	`VFMADDSUB132PD`
`_mm512_fmaddsub_ps`, `_mm512_mask3_fmaddsub_ps`, `_mm512_mask_fmaddsub_ps`, `_mm512_maskz_fmaddsub_ps`, `_mm512_fmaddsub_round_ps`, `_mm512_mask3_fmaddsub_round_ps`, `_mm512_mask_fmaddsub_round_ps`, `_mm512_maskz_fmaddsub_round_ps`	Multiplies packed float32 elements, then alternately adds and subtracts packed float32 elements to/from the intermediate result.	`VFMADDSUB132PS`
`_mm512_fmsub_pd`, `_mm512_mask3_fmsub_pd`, `_mm512_mask_fmsub_pd`, `_mm512_maskz_fmsub_pd`, `_mm512_fmsub_round_pd`, `_mm512_mask3_fmsub_round_pd`, `_mm512_mask_fmsub_round_pd`, `_mm512_maskz_fmsub_round_pd`	Multiplies packed float64 elements, then subtracts packed float64 elements from the intermediate result.	`VFMSUB132PD`
`_mm512_fmsub_ps`, `_mm512_mask3_fmsub_ps`, `_mm512_mask_fmsub_ps`, `_mm512_maskz_fmsub_ps`, `_mm512_fmsub_round_ps`, `_mm512_mask3_fmsub_round_ps`, `_mm512_mask_fmsub_round_ps`, `_mm512_maskz_fmsub_round_ps`	Multiplies packed float32 elements, then subtracts packed float32 elements from the intermediate result.	`VFMSUB132PS`
`_mm_mask3_fmsub_sd`, `_mm_mask_fmsub_sd`, `_mm_maskz_fmsub_sd`, `_mm_mask3_fmsub_round_sd`, `_mm_mask_fmsub_round_sd`, `_mm_maskz_fmsub_round_sd`	Multiplies the lower (scalar) float64 elements, then subtracts the lower float64 element from the intermediate result.	`VFMSUB132SD`
`_mm_mask3_fmsub_ss`, `_mm_mask_fmsub_ss`, `_mm_maskz_fmsub_ss`, `_mm_mask3_fmsub_round_ss`, `_mm_mask_fmsub_round_ss`, `_mm_maskz_fmsub_round_ss`	Multiplies the lower (scalar) float32 elements, then subtracts the lower float32 element from the intermediate result.	`VFMSUB132SS`
`_mm512_fmsubadd_pd`, `_mm512_mask3_fmsubadd_pd`, `_mm512_mask_fmsubadd_pd`, `_mm512_maskz_fmsubadd_pd`, `_mm512_fmsubadd_round_pd`, `_mm512_mask3_fmsubadd_round_pd`, `_mm512_mask_fmsubadd_round_pd`, `_mm512_maskz_fmsubadd_round_pd`	Multiplies packed float64 elements, then alternately subtracts and adds packed float64 elements from/to the intermediate result.	`VFMSUBADD132PD`
`_mm512_fmsubadd_ps`, `_mm512_mask3_fmsubadd_ps`, `_mm512_mask_fmsubadd_ps`, `_mm512_maskz_fmsubadd_ps`, `_mm512_fmsubadd_round_ps`, `_mm512_mask3_fmsubadd_round_ps`, `_mm512_mask_fmsubadd_round_ps`, `_mm512_maskz_fmsubadd_round_ps`	Multiplies packed float32 elements, then alternately subtracts and adds packed float32 elements from/to the intermediate result.	`VFMSUBADD132PS`
`_mm512_fnmadd_pd`, `_mm512_mask3_fnmadd_pd`, `_mm512_mask_fnmadd_pd`, `_mm512_maskz_fnmadd_pd`, `_mm512_fnmadd_round_pd`, `_mm512_mask3_fnmadd_round_pd`, `_mm512_mask_fnmadd_round_pd`, `_mm512_maskz_fnmadd_round_pd`	Multiplies packed float64 elements, then adds the negated intermediate result to packed float64 elements.	`VFNMADD132PD`
`_mm512_fnmadd_ps`, `_mm512_mask3_fnmadd_ps`, `_mm512_maskz_fnmadd_ps`, `_mm512_mask_fnmadd_ps`, `_mm512_fnmadd_round_ps`, `_mm512_mask3_fnmadd_round_ps`, `_mm512_mask_fnmadd_round_ps`, `_mm512_maskz_fnmadd_round_ps`	Multiplies packed float32 elements, then adds the negated intermediate result to packed float32 elements.	`VFNMADD132PS`
`_mm_mask3_fnmadd_round_sd`, `_mm_mask_fnmadd_round_sd`, `_mm_maskz_fnmadd_round_sd`, `_mm_maskz_fnmadd_sd`, `_mm_mask_fnmadd_sd`, `_mm_mask3_fnmadd_sd`	Multiplies the lower (scalar) float64 elements, then adds the negated intermediate result to the lower float64 element.	`VFNMADD132SD`
`_mm_mask3_fnmadd_ss`, `_mm_mask_fnmadd_ss`, `_mm_maskz_fnmadd_ss`, `_mm_mask3_fnmadd_round_ss`, `_mm_mask_fnmadd_round_ss`, `_mm_maskz_fnmadd_round_ss`	Multiplies the lower (scalar) float32 elements, then adds the negated intermediate result to the lower float32 element.	`VFNMADD132SS`
`_mm512_fnmsub_pd`, `_mm512_mask3_fnmsub_pd`, `_mm512_mask_fnmsub_pd`, `_mm512_maskz_fnmsub_pd`, `_mm512_fnmsub_round_pd`, `_mm512_mask3_fnmsub_round_pd`, `_mm512_mask_fnmsub_round_pd`, `_mm512_maskz_fnmsub_round_pd`	Multiplies packed float64 elements, then subtracts packed float64 elements from the negated intermediate result.	`VFNMSUB132PD`
`_mm512_fnmsub_ps`, `_mm512_mask3_fnmsub_ps`, `_mm512_maskz_fnmsub_ps`, `_mm512_mask_fnmsub_ps`, `_mm512_fnmsub_round_ps`, `_mm512_mask3_fnmsub_round_ps`, `_mm512_maskz_fnmsub_round_ps`, `_mm512_mask_fnmsub_round_ps`	Multiplies packed float32 elements, then subtracts packed float32 elements from the negated intermediate result.	`VFNMSUB132PS`
`_mm_maskz_fnmsub_round_sd`, `_mm_mask_fnmsub_round_sd`, `_mm_mask3_fnmsub_round_sd`, `_mm_mask_fnmsub_sd`, `_mm_mask3_fnmsub_sd`, `_mm_maskz_fnmsub_sd`	Multiplies the lower (scalar) float64 elements, then subtracts the lower float64 element from the negated intermediate result.	`VFNMSUB132SD`
`_mm_maskz_fnmsub_round_ss`, `_mm_mask_fnmsub_round_ss`, `_mm_mask3_fnmsub_round_ss`, `_mm_mask_fnmsub_ss`, `_mm_maskz_fnmsub_ss`, `_mm_mask3_fnmsub_ss`	Multiplies the lower (scalar) float32 elements, then subtracts the lower float32 element from the negated intermediate result.	`VFNMSUB132SS`

variable	definition
`k`	writemask used as a selector
`a`	first source vector element
`b`	second source vector element
`c`	third source vector element (the operand that is added or subtracted)
`src`	source element to use based on writemask result
`round`	Rounding control value, which can be one of the following (along with the `sae` suppress-all-exceptions flag): `_MM_FROUND_TO_NEAREST_INT` (rounds to nearest even); `_MM_FROUND_TO_NEG_INF` (rounds to negative infinity); `_MM_FROUND_TO_POS_INF` (rounds to positive infinity); `_MM_FROUND_TO_ZERO` (rounds to zero); `_MM_FROUND_CUR_DIRECTION` (rounds using default from MXCSR register)

_mm512_fmadd_pd

extern __m512d __cdecl _mm512_fmadd_pd(__m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result.

_mm512_mask_fmadd_pd

extern __m512d __cdecl _mm512_mask_fmadd_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fmadd_pd

extern __m512d __cdecl _mm512_mask3_fmadd_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);

Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).

_mm512_maskz_fmadd_pd

extern __m512d __cdecl _mm512_maskz_fmadd_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
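The four variants above differ only in what an unselected lane receives. The following is a minimal scalar sketch of that behavior, not the compiler's implementation: the names `fmadd_variant` and `fmadd_pd_model` are hypothetical, plain arrays stand in for `__m512d`, and the separate multiply and add do not reproduce the single rounding of a true fused operation.

```c
#include <stdint.h>

#define LANES 8  /* a __m512d holds eight float64 elements */

/* Hypothetical tags for the four intrinsic flavors described above. */
typedef enum { FMADD_PLAIN, FMADD_MASK, FMADD_MASK3, FMADD_MASKZ } fmadd_variant;

/* Scalar reference model (an assumption for illustration only).
   Bit i of k selects lane i; the plain variant ignores k. */
static void fmadd_pd_model(fmadd_variant v, uint8_t k,
                           const double a[LANES], const double b[LANES],
                           const double c[LANES], double dst[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        int selected = (v == FMADD_PLAIN) || ((k >> i) & 1);
        if (selected)
            dst[i] = a[i] * b[i] + c[i];  /* computed lane */
        else if (v == FMADD_MASK)
            dst[i] = a[i];                /* writemask: copy from a */
        else if (v == FMADD_MASK3)
            dst[i] = c[i];                /* mask3: copy from c */
        else
            dst[i] = 0.0;                 /* zeromask: zero the lane */
    }
}
```

With `k = 0x01`, only lane 0 is computed; the remaining lanes come from `a`, from `c`, or are zeroed, depending on the variant.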

_mm512_fmadd_round_pd

extern __m512d __cdecl _mm512_fmadd_round_pd(__m512d a, __m512d b, __m512d c, int round);

Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result.

_mm512_mask_fmadd_round_pd

extern __m512d __cdecl _mm512_mask_fmadd_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);

_mm512_mask3_fmadd_round_pd

extern __m512d __cdecl _mm512_mask3_fmadd_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);

_mm512_maskz_fmadd_round_pd

extern __m512d __cdecl _mm512_maskz_fmadd_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);

_mm512_fmadd_round_ps

extern __m512 __cdecl _mm512_fmadd_round_ps(__m512 a, __m512 b, __m512 c, int round);

Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result.

_mm512_mask_fmadd_round_ps

extern __m512 __cdecl _mm512_mask_fmadd_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);

Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fmadd_round_ps

extern __m512 __cdecl _mm512_mask3_fmadd_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);

Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).

_mm512_maskz_fmadd_round_ps

extern __m512 __cdecl _mm512_maskz_fmadd_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, const int round);

Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fmadd_ps

extern __m512 __cdecl _mm512_fmadd_ps(__m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result.

_mm512_mask_fmadd_ps

extern __m512 __cdecl _mm512_mask_fmadd_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);

_mm512_mask3_fmadd_ps

extern __m512 __cdecl _mm512_mask3_fmadd_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);

_mm512_maskz_fmadd_ps

extern __m512 __cdecl _mm512_maskz_fmadd_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm_mask_fmadd_sd

extern __m128d __cdecl _mm_mask_fmadd_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);

Multiplies lower float64 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.

_mm_mask3_fmadd_sd

extern __m128d __cdecl _mm_mask3_fmadd_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);

Multiplies lower float64 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.

_mm_maskz_fmadd_sd

extern __m128d __cdecl _mm_maskz_fmadd_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);

Multiplies lower float64 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.
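For the scalar forms, only bit 0 of the mask matters and the upper element passes through unchanged. A rough scalar sketch of the `mask` and `maskz` behavior described above follows; the `vec2d` struct and both function names are hypothetical stand-ins for `__m128d` and the intrinsics, and the arithmetic is an unfused multiply and add.

```c
/* Hypothetical stand-in for __m128d: lo is element 0, hi is element 1. */
typedef struct { double lo, hi; } vec2d;

/* Model of the writemask form: lower element falls back to a.lo. */
static vec2d mask_fmadd_sd_model(vec2d a, unsigned k, vec2d b, vec2d c)
{
    vec2d dst;
    dst.lo = (k & 1u) ? a.lo * b.lo + c.lo : a.lo;  /* only mask bit 0 is tested */
    dst.hi = a.hi;                                  /* upper element copied from a */
    return dst;
}

/* Model of the zeromask form: lower element falls back to zero. */
static vec2d maskz_fmadd_sd_model(unsigned k, vec2d a, vec2d b, vec2d c)
{
    vec2d dst;
    dst.lo = (k & 1u) ? a.lo * b.lo + c.lo : 0.0;
    dst.hi = a.hi;
    return dst;
}
```

Note that a mask such as `0x02` leaves bit 0 clear, so the lower element is not computed even though the mask is nonzero.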

_mm_mask_fmadd_round_sd

extern __m128d __cdecl _mm_mask_fmadd_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int round);

_mm_mask3_fmadd_round_sd

extern __m128d __cdecl _mm_mask3_fmadd_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int round);

Multiplies lower float64 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.

_mm_maskz_fmadd_round_sd

extern __m128d __cdecl _mm_maskz_fmadd_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int round);

_mm_mask_fmadd_ss

extern __m128 __cdecl _mm_mask_fmadd_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);

Multiplies lower float32 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.

_mm_mask3_fmadd_ss

extern __m128 __cdecl _mm_mask3_fmadd_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);

Multiplies lower float32 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.

_mm_maskz_fmadd_ss

extern __m128 __cdecl _mm_maskz_fmadd_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);

Multiplies lower float32 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.

_mm_mask_fmadd_round_ss

extern __m128 __cdecl _mm_mask_fmadd_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int round);

_mm_mask3_fmadd_round_ss

extern __m128 __cdecl _mm_mask3_fmadd_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int round);

Multiplies lower float32 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.

_mm_maskz_fmadd_round_ss

extern __m128 __cdecl _mm_maskz_fmadd_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int round);

_mm512_fmaddsub_pd

extern __m512d __cdecl _mm512_fmaddsub_pd(__m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result.

_mm512_mask_fmaddsub_pd

extern __m512d __cdecl _mm512_mask_fmaddsub_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fmaddsub_pd

extern __m512d __cdecl _mm512_mask3_fmaddsub_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);

_mm512_maskz_fmaddsub_pd

extern __m512d __cdecl _mm512_maskz_fmaddsub_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
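The descriptions do not say which lanes add and which subtract. For the underlying `VFMADDSUB132PD` instruction, c is subtracted in even-indexed lanes and added in odd-indexed lanes. A small scalar sketch of the unmasked form, using a hypothetical `fmaddsub_pd_model` helper with plain arrays in place of `__m512d` and unfused arithmetic, could look like this:

```c
#define LANES 8  /* a __m512d holds eight float64 elements */

/* Hypothetical scalar model of the unmasked fmaddsub form.
   Even lanes: a*b - c.  Odd lanes: a*b + c. */
static void fmaddsub_pd_model(const double a[LANES], const double b[LANES],
                              const double c[LANES], double dst[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        double prod = a[i] * b[i];
        dst[i] = (i & 1) ? prod + c[i] : prod - c[i];
    }
}
```

This lane pattern is the one commonly used for interleaved complex multiplication, where real and imaginary parts need opposite signs.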

_mm512_fmaddsub_round_pd

extern __m512d __cdecl _mm512_fmaddsub_round_pd(__m512d a, __m512d b, __m512d c, int round);

Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result.

_mm512_mask_fmaddsub_round_pd

extern __m512d __cdecl _mm512_mask_fmaddsub_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);

_mm512_mask3_fmaddsub_round_pd

extern __m512d __cdecl _mm512_mask3_fmaddsub_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);

_mm512_maskz_fmaddsub_round_pd

extern __m512d __cdecl _mm512_maskz_fmaddsub_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);

_mm512_fmaddsub_ps

extern __m512 __cdecl _mm512_fmaddsub_ps(__m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result.

_mm512_mask_fmaddsub_ps

extern __m512 __cdecl _mm512_mask_fmaddsub_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fmaddsub_ps

extern __m512 __cdecl _mm512_mask3_fmaddsub_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);

_mm512_maskz_fmaddsub_ps

extern __m512 __cdecl _mm512_maskz_fmaddsub_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fmaddsub_round_ps

extern __m512 __cdecl _mm512_fmaddsub_round_ps(__m512 a, __m512 b, __m512 c, int round);

Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result.

_mm512_mask_fmaddsub_round_ps

extern __m512 __cdecl _mm512_mask_fmaddsub_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);

_mm512_mask3_fmaddsub_round_ps

extern __m512 __cdecl _mm512_mask3_fmaddsub_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);

_mm512_maskz_fmaddsub_round_ps

extern __m512 __cdecl _mm512_maskz_fmaddsub_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);

_mm512_fmsub_pd

extern __m512d __cdecl _mm512_fmsub_pd(__m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result.
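The operand order matters here: c is subtracted from the product, not the other way around. As a hedged scalar illustration (the helper name `fmsub_pd_model` is hypothetical, arrays stand in for `__m512d`, and the arithmetic is unfused):

```c
#define LANES 8  /* a __m512d holds eight float64 elements */

/* Hypothetical scalar model of the unmasked fmsub form:
   dst[i] = a[i] * b[i] - c[i] for every lane. */
static void fmsub_pd_model(const double a[LANES], const double b[LANES],
                           const double c[LANES], double dst[LANES])
{
    for (int i = 0; i < LANES; ++i)
        dst[i] = a[i] * b[i] - c[i];
}
```

The masked variants then apply the same lane selection rules as the corresponding `fmadd` variants.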

_mm512_mask_fmsub_pd

extern __m512d __cdecl _mm512_mask_fmsub_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fmsub_pd

extern __m512d __cdecl _mm512_mask3_fmsub_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);

_mm512_maskz_fmsub_pd

extern __m512d __cdecl _mm512_maskz_fmsub_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fmsub_round_pd

extern __m512d __cdecl _mm512_fmsub_round_pd(__m512d a, __m512d b, __m512d c, int round);

Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result.

_mm512_mask_fmsub_round_pd

extern __m512d __cdecl _mm512_mask_fmsub_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);

_mm512_mask3_fmsub_round_pd

extern __m512d __cdecl _mm512_mask3_fmsub_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);

_mm512_maskz_fmsub_round_pd

extern __m512d __cdecl _mm512_maskz_fmsub_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);

_mm512_fmsub_ps

extern __m512 __cdecl _mm512_fmsub_ps(__m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result.

_mm512_mask_fmsub_ps

extern __m512 __cdecl _mm512_mask_fmsub_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fmsub_ps

extern __m512 __cdecl _mm512_mask3_fmsub_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);

_mm512_maskz_fmsub_ps

extern __m512 __cdecl _mm512_maskz_fmsub_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fmsub_round_ps

extern __m512 __cdecl _mm512_fmsub_round_ps(__m512 a, __m512 b, __m512 c, int round);

Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result.

_mm512_mask_fmsub_round_ps

extern __m512 __cdecl _mm512_mask_fmsub_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);

_mm512_mask3_fmsub_round_ps

extern __m512 __cdecl _mm512_mask3_fmsub_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);

_mm512_maskz_fmsub_round_ps

extern __m512 __cdecl _mm512_maskz_fmsub_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);

_mm_mask_fmsub_sd

extern __m128d __cdecl _mm_mask_fmsub_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);

Multiplies lower float64 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.

_mm_mask3_fmsub_sd

extern __m128d __cdecl _mm_mask3_fmsub_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);

Multiplies lower float64 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.

_mm_maskz_fmsub_sd

extern __m128d __cdecl _mm_maskz_fmsub_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);

Multiplies lower float64 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.

_mm_mask_fmsub_round_sd

extern __m128d __cdecl _mm_mask_fmsub_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int round);

_mm_mask3_fmsub_round_sd

extern __m128d __cdecl _mm_mask3_fmsub_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int round);

Multiplies lower float64 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.

_mm_maskz_fmsub_round_sd

extern __m128d __cdecl _mm_maskz_fmsub_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int round);

_mm_mask_fmsub_ss

extern __m128 __cdecl _mm_mask_fmsub_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);

Multiplies lower float32 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.

_mm_mask3_fmsub_ss

extern __m128 __cdecl _mm_mask3_fmsub_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);

Multiplies lower float32 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.

_mm_maskz_fmsub_ss

extern __m128 __cdecl _mm_maskz_fmsub_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);

Multiplies lower float32 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.

_mm_mask_fmsub_round_ss

extern __m128 __cdecl _mm_mask_fmsub_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int round);

_mm_mask3_fmsub_round_ss

extern __m128 __cdecl _mm_mask3_fmsub_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int round);

Multiplies lower float32 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.

_mm_maskz_fmsub_round_ss

extern __m128 __cdecl _mm_maskz_fmsub_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int round);

_mm512_fmsubadd_pd

extern __m512d __cdecl _mm512_fmsubadd_pd(__m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result.

_mm512_mask_fmsubadd_pd

extern __m512d __cdecl _mm512_mask_fmsubadd_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
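This is the mirror image of the `fmaddsub` form. For the underlying `VFMSUBADD132PD` instruction, c is added in even-indexed lanes and subtracted in odd-indexed lanes. A brief scalar sketch, assuming a hypothetical `fmsubadd_pd_model` helper with arrays in place of `__m512d` and unfused arithmetic:

```c
#define LANES 8  /* a __m512d holds eight float64 elements */

/* Hypothetical scalar model of the unmasked fmsubadd form.
   Even lanes: a*b + c.  Odd lanes: a*b - c. */
static void fmsubadd_pd_model(const double a[LANES], const double b[LANES],
                              const double c[LANES], double dst[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        double prod = a[i] * b[i];
        dst[i] = (i & 1) ? prod - c[i] : prod + c[i];
    }
}
```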

_mm512_mask3_fmsubadd_pd

extern __m512d __cdecl _mm512_mask3_fmsubadd_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);

_mm512_maskz_fmsubadd_pd

extern __m512d __cdecl _mm512_maskz_fmsubadd_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fmsubadd_round_pd

extern __m512d __cdecl _mm512_fmsubadd_round_pd(__m512d a, __m512d b, __m512d c, int round);

Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result.

_mm512_mask_fmsubadd_round_pd

extern __m512d __cdecl _mm512_mask_fmsubadd_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);

_mm512_mask3_fmsubadd_round_pd

extern __m512d __cdecl _mm512_mask3_fmsubadd_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);

_mm512_maskz_fmsubadd_round_pd

extern __m512d __cdecl _mm512_maskz_fmsubadd_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);

Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fmsubadd_ps

extern __m512 __cdecl _mm512_fmsubadd_ps(__m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result.

_mm512_mask_fmsubadd_ps

extern __m512 __cdecl _mm512_mask_fmsubadd_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fmsubadd_ps

extern __m512 __cdecl _mm512_mask3_fmsubadd_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);

_mm512_maskz_fmsubadd_ps

extern __m512 __cdecl _mm512_maskz_fmsubadd_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fmsubadd_round_ps

extern __m512 __cdecl _mm512_fmsubadd_round_ps(__m512 a, __m512 b, __m512 c, int round);

Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result.

_mm512_mask_fmsubadd_round_ps

extern __m512 __cdecl _mm512_mask_fmsubadd_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);

_mm512_mask3_fmsubadd_round_ps

extern __m512 __cdecl _mm512_mask3_fmsubadd_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);

_mm512_maskz_fmsubadd_round_ps

extern __m512 __cdecl _mm512_maskz_fmsubadd_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);

_mm512_fnmadd_pd

extern __m512d __cdecl _mm512_fnmadd_pd(__m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result.
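Put differently, each lane computes c minus the product. A short scalar sketch of that reading follows; `fnmadd_pd_model` is a hypothetical helper, arrays stand in for `__m512d`, and the arithmetic is unfused, so it only illustrates the sign convention.

```c
#define LANES 8  /* a __m512d holds eight float64 elements */

/* Hypothetical scalar model of the unmasked fnmadd form:
   dst[i] = -(a[i] * b[i]) + c[i], which equals c[i] - a[i] * b[i]. */
static void fnmadd_pd_model(const double a[LANES], const double b[LANES],
                            const double c[LANES], double dst[LANES])
{
    for (int i = 0; i < LANES; ++i)
        dst[i] = -(a[i] * b[i]) + c[i];
}
```

This shape fits residual-style updates such as `r = y - a * x`, where y plays the role of c.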

_mm512_mask_fnmadd_pd

extern __m512d __cdecl _mm512_mask_fnmadd_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fnmadd_pd

extern __m512d __cdecl _mm512_mask3_fnmadd_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);

_mm512_maskz_fnmadd_pd

extern __m512d __cdecl _mm512_maskz_fnmadd_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fnmadd_round_pd

extern __m512d __cdecl _mm512_fnmadd_round_pd(__m512d a, __m512d b, __m512d c, int round);

Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result.

_mm512_mask_fnmadd_round_pd

extern __m512d __cdecl _mm512_mask_fnmadd_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);

_mm512_mask3_fnmadd_round_pd

extern __m512d __cdecl _mm512_mask3_fnmadd_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);

_mm512_maskz_fnmadd_round_pd

extern __m512d __cdecl _mm512_maskz_fnmadd_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);

_mm512_fnmadd_ps

extern __m512 __cdecl _mm512_fnmadd_ps(__m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result.

_mm512_mask_fnmadd_ps

extern __m512 __cdecl _mm512_mask_fnmadd_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fnmadd_ps

extern __m512 __cdecl _mm512_mask3_fnmadd_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);

Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).

_mm512_maskz_fnmadd_ps

extern __m512 __cdecl _mm512_maskz_fnmadd_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fnmadd_round_ps

extern __m512 __cdecl _mm512_fnmadd_round_ps(__m512 a, __m512 b, __m512 c, int round);

Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result.

_mm512_mask_fnmadd_round_ps

extern __m512 __cdecl _mm512_mask_fnmadd_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);

_mm512_mask3_fnmadd_round_ps

extern __m512 __cdecl _mm512_mask3_fnmadd_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);

_mm512_maskz_fnmadd_round_ps

extern __m512 __cdecl _mm512_maskz_fnmadd_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);

_mm_mask_fnmadd_sd

extern __m128d __cdecl _mm_mask_fnmadd_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);

Multiplies lower float64 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.

_mm_mask3_fnmadd_sd

extern __m128d __cdecl _mm_mask3_fnmadd_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);

Multiplies lower float64 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.

_mm_maskz_fnmadd_sd

extern __m128d __cdecl _mm_maskz_fnmadd_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);

Multiplies lower float64 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.
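
For the scalar forms, only mask bit 0 matters and the upper element passes through. A minimal sketch of the `_mm_mask_fnmadd_sd` and `_mm_maskz_fnmadd_sd` behavior described above, using hypothetical helpers `fnmadd_sd_mask` and `fnmadd_sd_maskz` that take two-element arrays (index 0 is the lower element) instead of `__m128d` values:

```c
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: writemask form. Lower element is computed if bit 0
 * of k is set, otherwise taken from a; upper element always comes from a. */
void fnmadd_sd_mask(const double a[2], uint8_t k, const double b[2],
                    const double c[2], double out[2])
{
#if defined(__AVX512F__)
    __m128d r = _mm_mask_fnmadd_sd(_mm_loadu_pd(a), (__mmask8)k,
                                   _mm_loadu_pd(b), _mm_loadu_pd(c));
    _mm_storeu_pd(out, r);
#else
    out[0] = (k & 1) ? -(a[0] * b[0]) + c[0] : a[0];
    out[1] = a[1];
#endif
}

/* Hypothetical helper: zeromask form. Lower element is computed if bit 0
 * of k is set, otherwise 0.0; upper element always comes from a. */
void fnmadd_sd_maskz(uint8_t k, const double a[2], const double b[2],
                     const double c[2], double out[2])
{
#if defined(__AVX512F__)
    __m128d r = _mm_maskz_fnmadd_sd((__mmask8)k, _mm_loadu_pd(a),
                                    _mm_loadu_pd(b), _mm_loadu_pd(c));
    _mm_storeu_pd(out, r);
#else
    out[0] = (k & 1) ? -(a[0] * b[0]) + c[0] : 0.0;
    out[1] = a[1];
#endif
}
```

Higher mask bits are ignored, so `k = 2` behaves the same as `k = 0` in both helpers.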

_mm_mask_fnmadd_round_sd

extern __m128d __cdecl _mm_mask_fnmadd_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int round);

_mm_mask3_fnmadd_round_sd

extern __m128d __cdecl _mm_mask3_fnmadd_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int round);

Multiplies lower float64 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.

_mm_maskz_fnmadd_round_sd

extern __m128d __cdecl _mm_maskz_fnmadd_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int round);

_mm_mask_fnmadd_ss

extern __m128 __cdecl _mm_mask_fnmadd_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);

Multiplies lower float32 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.

_mm_mask3_fnmadd_ss

extern __m128 __cdecl _mm_mask3_fnmadd_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);

Multiplies lower float32 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.

_mm_maskz_fnmadd_ss

extern __m128 __cdecl _mm_maskz_fnmadd_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);

Multiplies lower float32 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.

_mm_mask_fnmadd_round_ss

extern __m128 __cdecl _mm_mask_fnmadd_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int round);

_mm_mask3_fnmadd_round_ss

extern __m128 __cdecl _mm_mask3_fnmadd_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int round);

Multiplies lower float32 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.

_mm_maskz_fnmadd_round_ss

extern __m128 __cdecl _mm_maskz_fnmadd_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int round);

_mm512_fnmsub_pd

extern __m512d __cdecl _mm512_fnmsub_pd(__m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result.
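
Compared with the fnmadd family, fnmsub only flips the sign applied to c: each lane is `-(a*b) - c`. A small sketch under the same assumptions as before: `fnmsub8` is a hypothetical helper, and the scalar branch is a stand-in for builds without AVX-512F that does not reproduce the single fused rounding.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not an Intel API):
 * out[i] = -(a[i] * b[i]) - c[i] for 8 doubles. */
void fnmsub8(const double *a, const double *b, const double *c, double *out)
{
#if defined(__AVX512F__)
    __m512d va = _mm512_loadu_pd(a);
    __m512d vb = _mm512_loadu_pd(b);
    __m512d vc = _mm512_loadu_pd(c);
    _mm512_storeu_pd(out, _mm512_fnmsub_pd(va, vb, vc));
#else
    /* Scalar stand-in for builds without AVX-512F. */
    for (size_t i = 0; i < 8; ++i)
        out[i] = -(a[i] * b[i]) - c[i];
#endif
}
```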

_mm512_mask_fnmsub_pd

extern __m512d __cdecl _mm512_mask_fnmsub_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fnmsub_pd

extern __m512d __cdecl _mm512_mask3_fnmsub_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);

Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).

_mm512_maskz_fnmsub_pd

extern __m512d __cdecl _mm512_maskz_fnmsub_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);

Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fnmsub_round_pd

extern __m512d __cdecl _mm512_fnmsub_round_pd(__m512d a, __m512d b, __m512d c, int round);

Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result.

_mm512_mask_fnmsub_round_pd

extern __m512d __cdecl _mm512_mask_fnmsub_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);

_mm512_mask3_fnmsub_round_pd

extern __m512d __cdecl _mm512_mask3_fnmsub_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);

_mm512_maskz_fnmsub_round_pd

extern __m512d __cdecl _mm512_maskz_fnmsub_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);

_mm512_fnmsub_ps

extern __m512 __cdecl _mm512_fnmsub_ps(__m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result.

_mm512_mask_fnmsub_ps

extern __m512 __cdecl _mm512_mask_fnmsub_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).

_mm512_mask3_fnmsub_ps

extern __m512 __cdecl _mm512_mask3_fnmsub_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);

Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).

_mm512_maskz_fnmsub_ps

extern __m512 __cdecl _mm512_maskz_fnmsub_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);

Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_fnmsub_round_ps

extern __m512 __cdecl _mm512_fnmsub_round_ps(__m512 a, __m512 b, __m512 c, int round);

Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result.

_mm512_mask_fnmsub_round_ps

extern __m512 __cdecl _mm512_mask_fnmsub_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);

_mm512_mask3_fnmsub_round_ps

extern __m512 __cdecl _mm512_mask3_fnmsub_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);

_mm512_maskz_fnmsub_round_ps

extern __m512 __cdecl _mm512_maskz_fnmsub_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);

_mm_mask_fnmsub_sd

extern __m128d __cdecl _mm_mask_fnmsub_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);

Multiplies lower float64 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.

_mm_mask3_fnmsub_sd

extern __m128d __cdecl _mm_mask3_fnmsub_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);

_mm_maskz_fnmsub_sd

extern __m128d __cdecl _mm_maskz_fnmsub_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);

Multiplies lower float64 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.

_mm_mask_fnmsub_ss

extern __m128 __cdecl _mm_mask_fnmsub_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);

Multiplies lower float32 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.

_mm_mask3_fnmsub_ss

extern __m128 __cdecl _mm_mask3_fnmsub_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);

Multiplies lower float32 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.

_mm_maskz_fnmsub_ss

extern __m128 __cdecl _mm_maskz_fnmsub_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);

Multiplies lower float32 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.

_mm_mask_fnmsub_round_ss

extern __m128 __cdecl _mm_mask_fnmsub_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int round);

_mm_mask3_fnmsub_round_ss

extern __m128 __cdecl _mm_mask3_fnmsub_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int round);

Multiplies lower float32 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.

_mm_maskz_fnmsub_round_ss

extern __m128 __cdecl _mm_maskz_fnmsub_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int round);

_mm_mask_fnmsub_round_sd

extern __m128d __cdecl _mm_mask_fnmsub_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int round);

_mm_mask3_fnmsub_round_sd

extern __m128d __cdecl _mm_mask3_fnmsub_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int round);

_mm_maskz_fnmsub_round_sd

extern __m128d __cdecl _mm_maskz_fnmsub_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int round);

Parent topic: Intrinsics for Arithmetic Operations
