Intel Agilex® 7 Variable Precision DSP Blocks User Guide

ID 683037
Date 10/02/2023
Public

2.2.6. Exception Handling for Floating-point Arithmetic

The Intel Agilex® 7 floating-point arithmetic supports exception handling for the multiplier and adder blocks.

Table 9.  Supported Exception Flags
Floating-point Format Exception Flags Width Description
Single precision Multiplication
fp32_mult_overflow 1

This signal indicates if the multiplier result is a larger value than the maximum representable value.

1: If the multiplier result is a larger value than the maximum representable value and the result is cast to infinity.

0: If the multiplier result is not larger than the maximum representable value.

This signal is not available in Adder or Subtract Mode.

fp32_mult_underflow 1

This signal indicates if the multiplier result is a smaller value than the minimum representable value.

1: If the multiplier result is a smaller value than the minimum representable non-zero absolute value and the result is flushed to zero.

0: If the multiplier result is not smaller than the minimum representable value.

This signal is not available in Adder or Subtract Mode.

fp32_mult_inexact 1

This signal indicates if the multiplier result is not accurately represented.

1: If the multiplier result is:
  • a rounded value
  • a smaller value than the minimum representable value or
  • a larger value than the maximum representable value.

0: If the multiplier result does not meet any of the criteria above.

This signal is not available in Adder or Subtract Mode.

fp32_mult_invalid 1

This signal indicates if the multiplier operation is ill-defined and produces an invalid result.

1: If the multiplier result is invalid and cast to qNaN.

0: If the multiplier result is not an invalid number.

This signal is not available in Adder or Subtract Mode.

Addition
fp32_adder_overflow 1

This signal indicates if the adder result is a larger value than the maximum representable value.

1: If the adder result is a larger value than the maximum representable value and the result is cast to infinity.

0: If the adder result is not larger than the maximum representable value.

This signal is not available in Multiplication Mode.

fp32_adder_underflow 1

This signal indicates if the adder result is a smaller value than the minimum representable value.

1: If the adder result is a smaller value than the minimum representable non-zero absolute value and the result is flushed to zero.

0: If the adder result is not smaller than the minimum representable value.

This signal is not available in Multiplication Mode.

fp32_adder_inexact 1

This signal indicates if the adder result is not accurately represented.

1: If the adder result is:
  • a rounded value
  • a smaller value than the minimum representable value or
  • a larger value than the maximum representable value.

0: If the adder result does not meet any of the criteria above.

This signal is not available in Multiplication Mode.

fp32_adder_invalid 1

This signal indicates if the adder operation is ill-defined and produces an invalid result.

1: If the adder result is invalid and cast to qNaN.

0: If the adder result is not an invalid number.

This signal is not available in Multiplication Mode.

Half precision Multiplication

fp16_mult_top_overflow, fp16_mult_bot_overflow 1

This signal indicates if the top or bottom multiplier result is a larger value than the maximum representable value.

1: If the multiplier result is a larger value than the maximum representable value and the result is cast to infinity.

0: If the multiplier result is not larger than the maximum representable value.

This signal is not available in Adder or Subtract Mode or in Extended format.

fp16_mult_top_underflow, fp16_mult_bot_underflow 1

This signal indicates if the top or bottom multiplier result is a smaller value than the minimum representable value.

1: If the multiplier result is a smaller value than the minimum representable value and the result is flushed to zero.

0: If the multiplier result is not smaller than the minimum representable value.

This signal is not available in Adder or Subtract Mode or in Extended format.

fp16_mult_top_inexact, fp16_mult_bot_inexact 1

This signal indicates if the top or bottom multiplier result is not an exact representation.

1: If the multiplier result is:
  • a rounded value
  • a smaller value than the minimum representable value or
  • a larger value than the maximum representable value.

0: If the multiplier result does not meet any of the criteria above.

This signal is not available in Adder or Subtract Mode.

fp16_mult_top_invalid, fp16_mult_bot_invalid 1

This signal indicates if the multiplier operation is ill-defined and produces an invalid result.

1: If the multiplier result is invalid and cast to qNaN.

0: If the multiplier result is not an invalid number.

This signal is not available in Adder or Subtract Mode.

fp16_mult_top_infinite, fp16_mult_bot_infinite 1

This signal indicates if the top or bottom multiplier result is a positive or negative infinity.

1: If the result is infinite

0: If the result is not infinite

This signal is only available for Extended format.

fp16_mult_top_zero, fp16_mult_bot_zero 1

This signal indicates if the top or bottom multiplier result is a positive or negative zero.

1: If the result is zero

0: If the result is not a zero

This signal is only available for Extended format.

Addition
fp16_adder_overflow 1

This signal indicates if the adder result is a larger value than the maximum representable value.

1: If the adder result is a larger value than the maximum representable value and the result is cast to infinity.

0: If the adder result is not larger than the maximum representable value.

This signal is not available in Multiplication Mode or in Extended format.

fp16_adder_underflow 1

This signal indicates if the adder result is a smaller value than the minimum representable value.

1: If the adder result is a smaller value than the minimum representable value and the result is flushed to zero.

0: If the adder result is not smaller than the minimum representable value.

This signal is not available in Multiplication Mode or in Extended format.

fp16_adder_inexact 1

This signal indicates if the adder result is not an exact representation.

1: If the adder result is:
  • a rounded value
  • a smaller value than the minimum representable value or
  • a larger value than the maximum representable value.

0: If the adder result does not meet any of the criteria above.

This signal is not available in Multiplication Mode.

fp16_adder_invalid 1

This signal indicates if the adder operation is ill-defined and produces an invalid result.

1: If the adder result is invalid and cast to qNaN.

0: If the adder result is not an invalid number.

This signal is not available in Multiplication Mode.

fp16_adder_infinite 1

This signal indicates if the adder result is a positive or negative infinity.

1: If the result is infinite

0: If the result is not infinite

This signal is only available for Extended format.

fp16_adder_zero 1

This signal indicates if the adder result is a positive or negative zero.

1: If the result is zero

0: If the result is not a zero

This signal is only available for Extended format.
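The availability notes in Table 9 can be condensed into a small lookup. The following is a minimal sketch, not part of any Intel tool: the function name available_flags and its arguments are hypothetical, and it assumes the reading that the FP16 overflow and underflow flags are absent in Extended format while the infinite and zero flags exist only there.

```python
def available_flags(precision, operation, extended=False):
    """Return the exception flag names that Table 9 lists for one mode.

    Hypothetical helper for illustration only.
    precision: "fp32" or "fp16"; operation: "mult" or "adder".
    extended: True for the FP16 Extended format (meaningless for fp32).
    """
    if operation not in ("mult", "adder"):
        raise ValueError("operation must be 'mult' or 'adder'")
    kinds = ["inexact", "invalid"]          # listed for every format
    if precision == "fp32":
        if extended:
            raise ValueError("Extended format applies to fp16 only")
        prefixes = ["fp32_" + operation]
        kinds += ["overflow", "underflow"]
    elif precision == "fp16":
        if operation == "mult":
            # Half precision has separate top and bottom multipliers.
            prefixes = ["fp16_mult_top", "fp16_mult_bot"]
        else:
            prefixes = ["fp16_adder"]
        # Assumption: Extended format swaps overflow/underflow
        # for infinite/zero, as the availability notes suggest.
        kinds += ["infinite", "zero"] if extended else ["overflow", "underflow"]
    else:
        raise ValueError("precision must be 'fp32' or 'fp16'")
    return {p + "_" + k for p in prefixes for k in kinds}
```

For example, available_flags("fp16", "mult", extended=True) contains fp16_mult_top_zero but not fp16_mult_top_overflow.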

Table 10. Multiplier Exception Handling Possible Results for FP32 Multiplication, FP16 Flushed, and FP16 Bfloat16 Modes
| Input A | Input B | Result | Flags [4] (Overflow/Underflow/Inexact/Invalid) |
|---|---|---|---|
| Normalized | Normalized | Normalized value | 0/0/0/0 |
| Normalized | Normalized | Normalized (rounded) value | 0/0/1/0 |
| Normalized | Normalized | Positive/negative infinity value | 1/0/1/0 |
| Normalized | Normalized | Subnormal (denormal) value | 0/1/1/0 |
| 0 or Subnormal (denormal) | Normalized | 0 value | 0/0/0/0 |
| Positive/negative infinity | Normalized | Positive/negative infinity value | 0/0/0/0 |
| Quiet Not A Number (qNaN) | Normalized | qNaN value | 0/0/0/0 |
| 0 or Subnormal (denormal) | 0 or Subnormal (denormal) | 0 value | 0/0/0/0 |
| Positive/negative infinity | 0 or Subnormal (denormal) | qNaN value | 0/0/0/1 |
| Quiet Not A Number (qNaN) | 0 or Subnormal (denormal) | qNaN value | 0/0/0/0 |
| Positive/negative infinity | Positive/negative infinity | Positive/negative infinity value | 0/0/0/0 |
| Quiet Not A Number (qNaN) | Positive/negative infinity | qNaN value | 0/0/0/0 |
| Quiet Not A Number (qNaN) | Quiet Not A Number (qNaN) | qNaN value | 0/0/0/0 |
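The rows of Table 10 can be exercised with a small behavioral model. This is a sketch under stated assumptions, not the hardware implementation: the name fp32_mult_model is hypothetical, rounding is assumed to be round-to-nearest-even (what Python's struct module performs), underflow is checked after rounding, and the sign given to zero results is an assumption because the table does not state it. Inputs are assumed to be exactly representable in single precision.

```python
import math
import struct

def _f32_round(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def _zero_or_subnormal(x):
    """True when the single-precision exponent field of x is all zeros."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF == 0

def fp32_mult_model(a, b):
    """Return (result, (overflow, underflow, inexact, invalid)).

    Hypothetical model of Table 10 for illustration only.
    """
    if math.isnan(a) or math.isnan(b):
        return math.nan, (0, 0, 0, 0)        # qNaN input: no flag is raised
    za, zb = _zero_or_subnormal(a), _zero_or_subnormal(b)
    if (math.isinf(a) and zb) or (math.isinf(b) and za):
        return math.nan, (0, 0, 0, 1)        # infinity times zero is invalid
    if math.isinf(a) or math.isinf(b):
        return a * b, (0, 0, 0, 0)           # infinity propagates, no flag
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    if za or zb:
        # Zero and subnormal inputs give a 0 result (sign is an assumption).
        return math.copysign(0.0, sign), (0, 0, 0, 0)
    exact = a * b    # exact in double: product of two 24-bit significands
    try:
        rounded = _f32_round(exact)
    except OverflowError:                    # too large for single precision
        return math.copysign(math.inf, sign), (1, 0, 1, 0)
    if _zero_or_subnormal(rounded):
        return math.copysign(0.0, sign), (0, 1, 1, 0)   # flushed to zero
    return rounded, (0, 0, int(rounded != exact), 0)
```

Calling fp32_mult_model(2.0**100, 2.0**100) follows the overflow row, while fp32_mult_model(math.inf, 0.0) follows the invalid row.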
Table 11. Adder Exception Handling Possible Results for FP32 Addition/Subtraction, FP16 Flushed, and FP16 Bfloat16 Modes
| Input A | Input B | Result | Flags [4] (Overflow/Underflow/Inexact/Invalid) |
|---|---|---|---|
| Normalized | Normalized | Normalized value | 0/0/0/0 |
| Normalized | Normalized | Normalized (rounded) value | 0/0/1/0 |
| Normalized | Normalized | Positive/negative infinity value | 1/0/1/0 |
| Normalized | Normalized | 0 value. Sign bit = 0. | 0/0/0/0 |
| Normalized | Normalized | Subnormal (denormal) value. The sign is preserved. | 0/1/1/0 |
| 0 or Subnormal (denormal) | Normalized | Input B | 0/0/0/0 |
| Positive/negative infinity | Normalized | Positive/negative infinity value | 0/0/0/0 |
| Quiet Not A Number (qNaN) | Normalized | qNaN value | 0/0/0/0 |
| 0 or Subnormal (denormal) | 0 or Subnormal (denormal) | 0 value. For (-0 + (-0)) equation, sign bit = 1. For any other equation, sign bit = 0. | 0/0/0/0 |
| Positive/negative infinity | 0 or Subnormal (denormal) | Positive/negative infinity value | 0/0/0/0 |
| Quiet Not A Number (qNaN) | 0 or Subnormal (denormal) | qNaN value | 0/0/0/0 |
| Positive/negative infinity | Positive/negative infinity | qNaN value for invalid cases. Positive/negative infinity value for valid cases. | 0/0/0/1 for invalid cases. 0/0/0/0 for valid cases. |
| Quiet Not A Number (qNaN) | Positive/negative infinity | qNaN value | 0/0/0/0 |
| Quiet Not A Number (qNaN) | Quiet Not A Number (qNaN) | qNaN value | 0/0/0/0 |

For the row where both inputs are infinity, the valid cases are:
  • Positive infinity value + positive infinity value
  • Negative infinity value + negative infinity value
  • Negative infinity value - positive infinity value
  • Positive infinity value - negative infinity value
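The special-operand rows of Table 11 (infinities, qNaN, and zero or subnormal inputs) can be sketched as a classifier. The name fp32_add_special is hypothetical. The sketch assumes that a - b is handled as a + (-b), which matches the four valid infinity cases listed in the table, and that a subnormal input contributes its own sign when both inputs are treated as zero. The rows with two normalized inputs are left out and return None.

```python
import math
import struct

def _zero_or_subnormal(x):
    """True when the single-precision exponent field of x is all zeros."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF == 0

def fp32_add_special(a, b, subtract=False):
    """Return (result, (overflow, underflow, inexact, invalid)) or None.

    Hypothetical model of the special-operand rows of Table 11.
    None means both inputs are normalized, which this sketch does not cover.
    """
    no_flags = (0, 0, 0, 0)
    if subtract:
        b = -b                               # assumption: a - b == a + (-b)
    if math.isnan(a) or math.isnan(b):
        return math.nan, no_flags            # qNaN input: no flag is raised
    if math.isinf(a) and math.isinf(b):
        if a == b:
            return a, no_flags               # same-signed infinities: valid
        return math.nan, (0, 0, 0, 1)        # opposite infinities: invalid
    if math.isinf(a):
        return a, no_flags
    if math.isinf(b):
        return b, no_flags
    za, zb = _zero_or_subnormal(a), _zero_or_subnormal(b)
    if za and zb:
        both_negative = math.copysign(1.0, a) < 0 and math.copysign(1.0, b) < 0
        # Only (-0) + (-0) yields a set sign bit.
        return (-0.0 if both_negative else 0.0), no_flags
    if za:
        return b, no_flags                   # result is the other input
    if zb:
        return a, no_flags
    return None
```

For example, fp32_add_special(-math.inf, math.inf, subtract=True) returns negative infinity with no flags, and fp32_add_special(math.inf, -math.inf) returns a NaN with the invalid flag set.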
Table 12. Multiplication Exception Handling Possible Results for FP16 Extended Modes
| Input A | Input B | Result | Flags [4] (Infinite/Zero/Inexact/Invalid) |
|---|---|---|---|
| Normalized/Subnormalized | Normalized/Subnormalized | Normalized/Subnormalized | 0/0/x/0 |
| 0 value | Normalized/Subnormalized | 0 value | 0/1/0/0 |
| Positive/negative infinity | Normalized/Subnormalized | Positive/negative infinity value | 1/0/0/0 |
| Quiet Not A Number (qNaN) | Normalized/Subnormalized | qNaN value | 0/0/0/1, Mantissa = {100...00} |
| 0 value | 0 value | 0 value | 0/1/0/0 |
| Positive/negative infinity | 0 value | qNaN value | 0/0/0/1, Mantissa = {100...00} |
| Quiet Not A Number (qNaN) | 0 value | qNaN value | 0/0/0/1, Mantissa = {100...00} |
| Positive/negative infinity | Positive/negative infinity | Positive/negative infinity value | 1/0/0/0 |
| Quiet Not A Number (qNaN) | Positive/negative infinity | qNaN value | 0/0/0/1, Mantissa = {100...00} |
| Quiet Not A Number (qNaN) | Quiet Not A Number (qNaN) | qNaN value | 0/0/0/1, Mantissa = {100...00} |
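Unlike the flushed modes, the Extended format raises the invalid flag whenever the result is qNaN, including when a qNaN is already present at an input. The following minimal sketch maps operand categories to the rows of Table 12. The name fp16_ext_mult_flags is hypothetical, ordinary Python floats stand in for FP16 operands, and the inexact flag is reported as None where the table shows x because it depends on rounding.

```python
import math

def fp16_ext_mult_flags(a, b):
    """Return (result_kind, (infinite, zero, inexact, invalid)) per Table 12.

    Hypothetical helper for illustration only. result_kind is one of
    "qnan", "infinity", "zero", or "finite" (normalized or subnormalized).
    """
    if math.isnan(a) or math.isnan(b):
        return "qnan", (0, 0, 0, 1)          # qNaN input also sets invalid
    if (math.isinf(a) and b == 0) or (math.isinf(b) and a == 0):
        return "qnan", (0, 0, 0, 1)          # infinity times zero is invalid
    if math.isinf(a) or math.isinf(b):
        return "infinity", (1, 0, 0, 0)
    if a == 0 or b == 0:
        return "zero", (0, 1, 0, 0)
    # Inexact is 'x' in the table: it depends on whether rounding occurred.
    return "finite", (0, 0, None, 0)
```

For example, fp16_ext_mult_flags(math.nan, 2.0) returns ("qnan", (0, 0, 0, 1)), whereas the same operands in Table 10 leave all four flags at 0.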

Table 13. Addition Exception Handling Possible Results for FP16 Extended Modes
| Input A | Input B | Result | Flags [4] (Infinite/Zero/Inexact/Invalid) |
|---|---|---|---|
| Normalized/Subnormalized | Normalized/Subnormalized | Normalized/Subnormalized | 0/0/x/0 |
| Normalized/Subnormalized | Normalized/Subnormalized | 0 value. Sign bit = 0. | 0/0/0/0 |
| 0 value | Normalized/Subnormalized | Input B | 0/0/0/0 |
| Positive/negative infinity | Normalized/Subnormalized | Positive/negative infinity value | 1/0/0/0 |
| Quiet Not A Number (qNaN) | Normalized/Subnormalized | qNaN value | 0/0/0/1, Mantissa = {100...00} |
| 0 value | 0 value | 0 value. For (-0 + (-0)) equation, sign bit = 1. For any other equation, sign bit = 0. | 0/0/0/0 |
| Positive/negative infinity | 0 value | Positive/negative infinity value | 1/0/0/0 |
| Quiet Not A Number (qNaN) | 0 value | qNaN value | 0/0/0/1, Mantissa = {100...00} |
| Positive/negative infinity | Positive/negative infinity | qNaN value for invalid cases. Positive/negative infinity value for valid cases. | 0/0/0/1 for invalid cases, Mantissa = {100...00}. 1/0/0/0 for valid cases. |
| Quiet Not A Number (qNaN) | Positive/negative infinity | qNaN value | 0/0/0/1, Mantissa = {100...00} |
| Quiet Not A Number (qNaN) | Quiet Not A Number (qNaN) | qNaN value | 0/0/0/1, Mantissa = {100...00} |

For the row where both inputs are infinity, the valid cases are:
  • Positive infinity value + positive infinity value
  • Negative infinity value + negative infinity value
  • Negative infinity value - positive infinity value
  • Positive infinity value - negative infinity value

[4] Output exception flags. These flags do not change when the exceptional value is already present at an input.