DSP Builder for Intel® FPGAs (Advanced Blockset): Handbook

ID 683337
Date 4/01/2024
Public

10.3. DSP Builder Round-Off Errors

Every mathematical operation on floating-point data incurs a round-off error.

For the fundamental operations (add, subtract, multiply, divide), this error is determined by the rounding mode:

  • Correct. A typical relative error is half the magnitude of the LSB in the mantissa.
  • Faithful. A typical relative error is equal to the magnitude of the LSB in the mantissa.

The relative error for float16_m10 is approximately 0.1% for faithful rounding, and 0.05% for correct rounding. The rounding mode is a configurable mask parameter.
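The float16_m10 figures can be reproduced from the mantissa width alone. The following is a minimal sketch in Python, not DSP Builder code: the function name typical_relative_error and its arguments are illustrative assumptions, and the LSB is taken as 2^-10 relative to the implicit leading 1 of a 10-bit mantissa.

```python
def typical_relative_error(mantissa_bits: int, rounding: str) -> float:
    """Typical relative round-off error of one fundamental operation.

    Hypothetical helper for illustration; not part of any DSP Builder API.
    """
    # Weight of the mantissa LSB relative to the implicit leading 1.
    lsb = 2.0 ** -mantissa_bits
    if rounding == "faithful":
        return lsb        # about one LSB
    if rounding == "correct":
        return lsb / 2    # about half an LSB
    raise ValueError(f"unknown rounding mode: {rounding!r}")


# float16_m10 has a 10-bit mantissa.
for mode in ("faithful", "correct"):
    err = typical_relative_error(10, mode)
    print(f"float16_m10, {mode}: {err * 100:.3f}%")
```

For a 10-bit mantissa this gives about 0.098% for faithful rounding and 0.049% for correct rounding, consistent with the approximate values quoted above.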

The elementary mathematical functions conform to the error tolerances specified in the OpenCL standard. In practice, the relative error exhibited by the DSP Builder mathematical library lies comfortably within the specified tolerances.

Bit cancellation can occur when subtracting two floating-point numbers that are very close in value, which can introduce very large relative errors. Take the same precautions with floating-point designs as you would with numerical software to prevent bit cancellation.
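The effect can be illustrated in software. The sketch below uses Python's struct module to round values to IEEE 754 binary16, on the assumption that this layout (10-bit mantissa, round to nearest) is a reasonable stand-in for float16_m10 with correct rounding. It does not model DSP Builder hardware, and the helper names to_half and subtraction_relative_error are hypothetical.

```python
import struct


def to_half(x: float) -> float:
    """Round x to IEEE 754 binary16 and convert back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def subtraction_relative_error(a: float, b: float) -> float:
    """Relative error of a - b when both inputs are first rounded to binary16."""
    exact = a - b                       # double-precision reference
    computed = to_half(a) - to_half(b)  # difference of the rounded inputs
    return abs(computed - exact) / abs(exact)


a, b = 1.001, 1.0
input_error = abs(to_half(a) - a) / a
output_error = subtraction_relative_error(a, b)

print(f"relative error of rounded input a: {input_error:.2e}")
print(f"relative error of a - b:           {output_error:.2e}")
```

Rounding 1.001 to binary16 changes it by only a few parts in 100,000, but after the subtraction the leading bits cancel and that same absolute error amounts to roughly 2% of the result. Reordering the computation so that nearly equal quantities are not subtracted directly, as in numerical software, avoids this amplification.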