Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253
Date 11/07/2023
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

IEEE Floating-Point Operations

Understand the IEEE Standard for Floating-Point Arithmetic, IEEE 754-2008

This version of the compiler uses a close approximation to the IEEE Standard for Floating-point Arithmetic, version IEEE 754-2008, unless otherwise stated. This standard is common to many microcomputer-based systems due to the availability of fast processors that implement the required characteristics.

This section outlines the characteristics of the IEEE 754-2008 standard and its implementation in the compiler. Except as noted, the description refers to both the IEEE 754-2008 standard and the compiler implementation.

Special Values

The following list provides a brief description of the special values that the Intel® oneAPI DPC++/C++ Compiler supports.

  • Signed Zero: The sign of zero is the same as the sign of a nonzero number. Comparisons consider +0 to be equal to -0. A signed zero is useful in certain numerical analysis algorithms, but in most applications the sign of zero is invisible.
  • Denormalized Numbers: Denormalized numbers (denormals) fill the gap between the smallest positive and the smallest negative normalized number, otherwise only (+/-) 0 occurs in the interval. Denormalized numbers extend the range of computable results by allowing for gradual underflow.

    The Underflow status flag is set when a number loses precision and becomes a denormal.

  • Signed Infinity: Infinities are the result of arithmetic in the limiting case of operands with arbitrarily large magnitude. They provide a way to continue when an overflow occurs. The sign of an infinity is simply the sign you obtain for a finite number in the same operation as the finite number approaches an infinite value.

    By retrieving the status flags, you can differentiate between an infinity that results from an overflow and one that results from division by zero. The compiler treats infinity as signed by default. The output value of infinity is +Infinity or -Infinity.

  • Not a Number: Not a Number (NaN) may result from an invalid operation. For example, 0/0 and SQRT(-1) result in NaN. In general, an operation involving a NaN produces another NaN. Because the fraction of a NaN is unspecified, there are many possible NaNs

    The compiler treats all NaNs identically, but there are two classes of NaNs:

    • Signaling NaNs: Have an initial mantissa bit of 0. They usually raise an invalid exception when used in an operation.

    • Quiet NaNs: Have an initial mantissa bit of 1.

    The floating-point hardware usually converts a signaling NaN into a quiet NaN during computational operations. An invalid exception is raised and the resulting Floating-point value is a quiet NaN.

When fp-model fast is used (default), the compiler assumes no signed zeros, no infinite values, no NaN values, and denormal values are flushed to zero.