2.2.6. Exception Handling for Floating-point Arithmetic

Table 9. Supported Exception Flags
Floating-point Format	Exception Flags	Width	Description
Single precision	Multiplication
	`fp32_mult_overflow`	1	Indicates whether the multiplier result is larger than the maximum representable value. 1: The multiplier result is larger than the maximum representable value and is cast to infinity. 0: The multiplier result is not larger than the maximum representable value. This signal is not available in Adder or Subtract Mode.
	`fp32_mult_underflow`	1	Indicates whether the multiplier result is smaller than the minimum representable value. 1: The multiplier result is smaller than the minimum representable non-zero absolute value and is flushed to zero. 0: The multiplier result is not smaller than the minimum representable value. This signal is not available in Adder or Subtract Mode.
	`fp32_mult_inexact`	1	Indicates whether the multiplier result is not represented exactly. 1: The multiplier result is a rounded value, is smaller than the minimum representable value, or is larger than the maximum representable value. 0: The multiplier result meets none of the criteria above. This signal is not available in Adder or Subtract Mode.
	`fp32_mult_invalid`	1	Indicates whether the multiplier operation is ill-defined and produces an invalid result. 1: The multiplier result is invalid and is cast to qNaN. 0: The multiplier result is not invalid. This signal is not available in Adder or Subtract Mode.
	Addition
	`fp32_adder_overflow`	1	Indicates whether the adder result is larger than the maximum representable value. 1: The adder result is larger than the maximum representable value and is cast to infinity. 0: The adder result is not larger than the maximum representable value. This signal is not available in Multiplication Mode.
	`fp32_adder_underflow`	1	Indicates whether the adder result is smaller than the minimum representable value. 1: The adder result is smaller than the minimum representable non-zero absolute value and is flushed to zero. 0: The adder result is not smaller than the minimum representable value. This signal is not available in Multiplication Mode.
	`fp32_adder_inexact`	1	Indicates whether the adder result is not represented exactly. 1: The adder result is a rounded value, is smaller than the minimum representable value, or is larger than the maximum representable value. 0: The adder result meets none of the criteria above. This signal is not available in Multiplication Mode.
	`fp32_adder_invalid`	1	Indicates whether the adder operation is ill-defined and produces an invalid result. 1: The adder result is invalid and is cast to qNaN. 0: The adder result is not invalid. This signal is not available in Multiplication Mode.
Half precision	Multiplication
	`fp16_mult_top_overflow` `fp16_mult_bot_overflow`	1	These signals indicate whether the top or bottom multiplier result is larger than the maximum representable value. 1: The multiplier result is larger than the maximum representable value and is cast to infinity. 0: The multiplier result is not larger than the maximum representable value. These signals are not available in Adder or Subtract Mode or in Extended format.
	`fp16_mult_top_underflow` `fp16_mult_bot_underflow`	1	These signals indicate whether the top or bottom multiplier result is smaller than the minimum representable value. 1: The multiplier result is smaller than the minimum representable value and is flushed to zero. 0: The multiplier result is not smaller than the minimum representable value. These signals are not available in Adder or Subtract Mode or in Extended format.
	`fp16_mult_top_inexact` `fp16_mult_bot_inexact`	1	These signals indicate whether the top or bottom multiplier result is not represented exactly. 1: The multiplier result is a rounded value, is smaller than the minimum representable value, or is larger than the maximum representable value. 0: The multiplier result meets none of the criteria above. These signals are not available in Adder or Subtract Mode.
	`fp16_mult_top_invalid` `fp16_mult_bot_invalid`	1	These signals indicate whether the top or bottom multiplier operation is ill-defined and produces an invalid result. 1: The multiplier result is invalid and is cast to qNaN. 0: The multiplier result is not invalid. These signals are not available in Adder or Subtract Mode.
	`fp16_mult_top_infinite` `fp16_mult_bot_infinite`	1	These signals indicate whether the top or bottom multiplier result is positive or negative infinity. 1: The result is infinite. 0: The result is not infinite (for example, a normalized floating-point value). These signals are only available for Extended format.
	`fp16_mult_top_zero` `fp16_mult_bot_zero`	1	These signals indicate whether the top or bottom multiplier result is positive or negative zero. 1: The result is zero. 0: The result is not zero. These signals are only available for Extended format.
	Addition
	`fp16_adder_overflow`	1	Indicates whether the adder result is larger than the maximum representable value. 1: The adder result is larger than the maximum representable value and is cast to infinity. 0: The adder result is not larger than the maximum representable value. This signal is not available in Multiplication Mode or in Extended format.
	`fp16_adder_underflow`	1	Indicates whether the adder result is smaller than the minimum representable value. 1: The adder result is smaller than the minimum representable value and is flushed to zero. 0: The adder result is not smaller than the minimum representable value. This signal is not available in Multiplication Mode or in Extended format.
	`fp16_adder_inexact`	1	Indicates whether the adder result is not represented exactly. 1: The adder result is a rounded value, is smaller than the minimum representable value, or is larger than the maximum representable value. 0: The adder result meets none of the criteria above. This signal is not available in Multiplication Mode.
	`fp16_adder_invalid`	1	Indicates whether the adder operation is ill-defined and produces an invalid result. 1: The adder result is invalid and is cast to qNaN. 0: The adder result is not invalid. This signal is not available in Multiplication Mode.
	`fp16_adder_infinite`	1	Indicates whether the adder result is positive or negative infinity. 1: The result is infinite. 0: The result is not infinite (for example, a normalized floating-point value). This signal is only available for Extended format.
	`fp16_adder_zero`	1	Indicates whether the adder result is positive or negative zero. 1: The result is zero. 0: The result is not zero. This signal is only available for Extended format.
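
The overflow, underflow, and inexact definitions above can be explored with a small software model. The sketch below is illustrative only and does not describe the DSP block's internal implementation: the function names are hypothetical, round-to-nearest-even is assumed for the inexact check, and the overflow test follows the literal wording of Table 9 (exact result larger than the maximum representable value) rather than any particular rounding rule.

```python
import struct
from fractions import Fraction

# Largest finite FP32 magnitude: (2 - 2**-23) * 2**127 == (2**24 - 1) * 2**104
FP32_MAX = Fraction((2**24 - 1) * 2**104)
# Smallest normalized FP32 magnitude: 2**-126
FP32_MIN_NORMAL = Fraction(1, 2**126)


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest FP32 value (assumed rounding mode)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp32_mult_flags(a: float, b: float):
    """Model (overflow, underflow, inexact, invalid) for a * b.

    Hypothetical helper: a and b are expected to be normalized FP32 values,
    so the invalid flag is always 0 here.
    """
    exact = Fraction(a) * Fraction(b)   # exact rational product
    mag = abs(exact)
    if mag > FP32_MAX:                  # result would be cast to infinity
        return (1, 0, 1, 0)
    if 0 < mag < FP32_MIN_NORMAL:       # result would be flushed to zero
        return (0, 1, 1, 0)
    rounded = to_fp32(float(exact))     # in-range product is exact as a double
    inexact = int(Fraction(rounded) != exact)
    return (0, 0, inexact, 0)
```

For example, `fp32_mult_flags(2.0**100, 2.0**100)` models the overflow case and `fp32_mult_flags(1 + 2.0**-23, 1 + 2.0**-23)` models a rounded (inexact) result.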

Table 10. Multiplier Exception Handling Possible Results for FP32 Multiplication, FP16 Flushed, and FP16 Bfloat16 Modes
Input A	Input B	Result	⁴ Flags Overflow/Underflow/Inexact/Invalid
Normalized	Normalized	Normalized value	0/0/0/0
		Normalized (rounded) value	0/0/1/0
		Positive/negative infinity value	1/0/1/0
		Subnormal (denormal) value	0/1/1/0
0 or Subnormal (denormal)	Normalized	0 value	0/0/0/0
Positive/negative infinity	Normalized	Positive/negative infinity value	0/0/0/0
Quiet Not A Number (qNaN)	Normalized	qNaN value	0/0/0/0
0 or Subnormal (denormal)	0 or Subnormal (denormal)	0 value	0/0/0/0
Positive/negative infinity	0 or Subnormal (denormal)	qNaN value	0/0/0/1
Quiet Not A Number (qNaN)	0 or Subnormal (denormal)	qNaN value	0/0/0/0
Positive/negative infinity	Positive/negative infinity	Positive/negative infinity value	0/0/0/0
Quiet Not A Number (qNaN)	Positive/negative infinity	qNaN value	0/0/0/0
Quiet Not A Number (qNaN)	Quiet Not A Number (qNaN)	qNaN value	0/0/0/0
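
The special-case rows of Table 10 can be read as a small decision procedure over input classes. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, inputs are IEEE 754 binary32 bit patterns, and the table is assumed to apply regardless of which operand is Input A and which is Input B. Normalized times normalized is value-dependent, so the sketch returns `None` for it.

```python
import struct


def fp32_bits(x: float) -> int:
    """Return the FP32 bit pattern of a Python float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def classify_fp32(bits: int) -> str:
    """Classify an FP32 bit pattern into the input classes used by Table 10."""
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        return "inf" if frac == 0 else "nan"
    if exp == 0:
        return "zero_or_subnormal"   # subnormal inputs are grouped with zero
    return "normal"


def mult_special_case(a_bits: int, b_bits: int):
    """Return (result_class, (overflow, underflow, inexact, invalid)).

    Returns None when both inputs are normalized, because the result and
    flags then depend on the operand values.
    """
    classes = {classify_fp32(a_bits), classify_fp32(b_bits)}
    if "nan" in classes:                        # qNaN in, qNaN out, no flag
        return "qnan", (0, 0, 0, 0)
    if classes == {"inf", "zero_or_subnormal"}:  # infinity times zero
        return "qnan", (0, 0, 0, 1)
    if "inf" in classes:
        return "inf", (0, 0, 0, 0)
    if "zero_or_subnormal" in classes:
        return "zero", (0, 0, 0, 0)
    return None
```

Note that in this table only infinity multiplied by zero (or a subnormal) raises the invalid flag; a qNaN input propagates without setting it.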

Table 11. Adder Exception Handling Possible Results for FP32 Addition/Subtraction, FP16 Flushed, and FP16 Bfloat16 Modes
Input A	Input B	Result	⁴ Flags Overflow/Underflow/Inexact/Invalid
Normalized	Normalized	Normalized value	0/0/0/0
		Normalized (rounded) value	0/0/1/0
		Positive/negative infinity value	1/0/1/0
		0 value (sign bit = 0)	0/0/0/0
		Subnormal (denormal) value; the sign is preserved	0/1/1/0
0 or Subnormal (denormal)	Normalized	Input B	0/0/0/0
Positive/negative infinity	Normalized	Positive/negative infinity value	0/0/0/0
Quiet Not A Number (qNaN)	Normalized	qNaN value	0/0/0/0
0 or Subnormal (denormal)	0 or Subnormal (denormal)	0 value. For (-0) + (-0), sign bit = 1. For any other combination, sign bit = 0.	0/0/0/0
Positive/negative infinity	0 or Subnormal (denormal)	Positive/negative infinity value	0/0/0/0
Quiet Not A Number (qNaN)	0 or Subnormal (denormal)	qNaN value	0/0/0/0
Positive/negative infinity	Positive/negative infinity	qNaN value for invalid cases; positive/negative infinity value for valid cases	0/0/0/1 for invalid cases; 0/0/0/0 for valid cases. Valid cases are: positive infinity + positive infinity; negative infinity + negative infinity; negative infinity - positive infinity; positive infinity - negative infinity.
Quiet Not A Number (qNaN)	Positive/negative infinity	qNaN value	0/0/0/0
Quiet Not A Number (qNaN)	Quiet Not A Number (qNaN)	qNaN value	0/0/0/0
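
The infinity-plus-infinity row and the signed-zero rule in Table 11 are the two cases where operand signs decide the outcome. The sketch below models them with hypothetical helper names; it treats subtraction as addition of the negated second operand, which is an assumption consistent with the valid cases listed in the table.

```python
import math


def add_infinities(a: float, b: float, subtract: bool = False):
    """Return (result, (overflow, underflow, inexact, invalid)) for inf +/- inf."""
    if not (math.isinf(a) and math.isinf(b)):
        raise ValueError("both operands must be infinities")
    if subtract:
        b = -b                           # a - b is modeled as a + (-b)
    if a == b:                           # same effective sign: valid case
        return a, (0, 0, 0, 0)
    return math.nan, (0, 0, 0, 1)        # opposite signs: invalid, qNaN


def zero_sum_sign_bit(sign_a: int, sign_b: int) -> int:
    """Sign bit of a zero result when both addends are zero.

    Only (-0) + (-0) yields sign bit 1; every other combination yields 0.
    """
    return 1 if (sign_a, sign_b) == (1, 1) else 0
```

For instance, `add_infinities(-math.inf, math.inf, subtract=True)` models the valid case "negative infinity - positive infinity" and returns negative infinity with no flags set.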

Table 12. Multiplication Exception Handling Possible Results for FP16 Extended Modes
Input A	Input B	Result	⁴ Flags Infinite/Zero/Inexact/Invalid
Normalized/Subnormalized	Normalized/Subnormalized	Normalized/Subnormalized	0/0/x/0
0 value	Normalized/Subnormalized	0 value	0/1/0/0
Positive/negative infinity	Normalized/Subnormalized	Positive/negative infinity value	1/0/0/0
Quiet Not A Number (qNaN)	Normalized/Subnormalized	qNaN value	0/0/0/1 Mantissa = {100...00}
0 value	0 value	0 value	0/1/0/0
Positive/negative infinity	0 value	qNaN value	0/0/0/1 Mantissa = {100...00}
Quiet Not A Number (qNaN)	0 value	qNaN value	0/0/0/1 Mantissa = {100...00}
Positive/negative infinity	Positive/negative infinity	Positive/negative infinity value	1/0/0/0
Quiet Not A Number (qNaN)	Positive/negative infinity	qNaN value	0/0/0/1 Mantissa = {100...00}
Quiet Not A Number (qNaN)	Quiet Not A Number (qNaN)	qNaN value	0/0/0/1 Mantissa = {100...00}
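
Table 12 differs from Table 10 in two ways: the flags are Infinite/Zero/Inexact/Invalid, and any qNaN input sets the invalid flag. The sketch below encodes the special-case rows. It assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits) and a sign bit of 0 for the generated qNaN; neither is stated in the table, and the names are hypothetical.

```python
# Assumed canonical qNaN: sign 0, exponent all ones, mantissa 1000000000
QNAN_FP16 = 0x7E00


def classify_fp16(bits: int) -> str:
    """Classify an FP16 bit pattern into the input classes used by Table 12."""
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp == 0x1F:
        return "inf" if frac == 0 else "nan"
    if exp == 0 and frac == 0:
        return "zero"
    return "finite"                      # normalized or subnormalized


def fp16_ext_mult_special(a_bits: int, b_bits: int):
    """Return (result_class, (infinite, zero, inexact, invalid)).

    Returns None for finite times finite, where the inexact flag depends
    on the operand values (shown as 0/0/x/0 in the table).
    """
    classes = {classify_fp16(a_bits), classify_fp16(b_bits)}
    if "nan" in classes or classes == {"inf", "zero"}:
        return "qnan", (0, 0, 0, 1)      # result mantissa is 100...00
    if "inf" in classes:
        return "inf", (1, 0, 0, 0)
    if "zero" in classes:
        return "zero", (0, 1, 0, 0)
    return None
```

Unlike the flushed modes, a subnormal input such as `0x0001` is classified as a finite value here rather than being grouped with zero.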

Table 13. Addition Exception Handling Possible Results for FP16 Extended Modes
Input A	Input B	Result	⁴ Flags Infinite/Zero/Inexact/Invalid
Normalized/Subnormalized	Normalized/Subnormalized	Normalized/Subnormalized	0/0/x/0
Normalized/Subnormalized	Normalized/Subnormalized	0 value (sign bit = 0)	0/0/0/0
0 value	Normalized/Subnormalized	Input B	0/0/0/0
Positive/negative infinity	Normalized/Subnormalized	Positive/negative infinity value	1/0/0/0
Quiet Not A Number (qNaN)	Normalized/Subnormalized	qNaN value	0/0/0/1 Mantissa = {100...00}
0 value	0 value	0 value. For (-0) + (-0), sign bit = 1. For any other combination, sign bit = 0.	0/0/0/0
Positive/negative infinity	0 value	Positive/negative infinity value	1/0/0/0
Quiet Not A Number (qNaN)	0 value	qNaN value	0/0/0/1 Mantissa = {100...00}
Positive/negative infinity	Positive/negative infinity	qNaN value for invalid cases; positive/negative infinity value for valid cases	0/0/0/1 for invalid cases, Mantissa = {100...00}; 1/0/0/0 for valid cases. Valid cases are: positive infinity + positive infinity; negative infinity + negative infinity; negative infinity - positive infinity; positive infinity - negative infinity.
Quiet Not A Number (qNaN)	Positive/negative infinity	qNaN value	0/0/0/1 Mantissa = {100...00}
Quiet Not A Number (qNaN)	Quiet Not A Number (qNaN)	qNaN value	0/0/0/1 Mantissa = {100...00}
