11.3. DSP Builder Round-Off Errors
For the fundamental operations (add, subtract, multiply, divide), the round-off error is determined by the rounding mode:
- Correct. A typical relative error is half the magnitude of the LSB in the mantissa.
- Faithful. A typical relative error is equal to the magnitude of the LSB in the mantissa.
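The two rounding modes can be emulated in software to see where these error figures come from. The sketch below rounds a Python float onto a mantissa with a chosen number of fraction bits. It assumes round-to-nearest-even for correct rounding and uses truncation toward zero as one valid instance of faithful rounding, because truncation always returns one of the two neighbouring representable values. DSP Builder's hardware may implement faithful rounding differently. The names `round_mantissa` and `max_rel_error` are hypothetical, and exponent range limits (overflow, subnormals) are ignored.

```python
import math
import random

def round_mantissa(x: float, mantissa_bits: int, mode: str) -> float:
    """Round x onto a float with `mantissa_bits` fraction bits plus an implicit leading 1.

    Hypothetical helper. "correct" = round to nearest, ties to even.
    "faithful" = truncate toward zero (one possible faithful rounding).
    Exponent range is not limited in this sketch.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= |m| < 1
    p = mantissa_bits + 1           # significant bits including the leading 1
    scaled = m * 2 ** p             # exact: scaling by a power of two
    if mode == "correct":
        q = round(scaled)           # Python's round() on a float: ties to even
    elif mode == "faithful":
        q = math.trunc(scaled)      # drop the bits below the mantissa LSB
    else:
        raise ValueError(f"unknown rounding mode: {mode!r}")
    return math.ldexp(q, e - p)

def max_rel_error(mode: str, mantissa_bits: int = 10, trials: int = 20_000, seed: int = 0) -> float:
    """Largest relative error observed when rounding random values in [1, 1000)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = rng.uniform(1.0, 1000.0)
        worst = max(worst, abs(round_mantissa(x, mantissa_bits, mode) - x) / x)
    return worst

if __name__ == "__main__":
    for mode in ("correct", "faithful"):
        print(f"{mode}: worst relative error {max_rel_error(mode):.3e}")
```

With 10 fraction bits, the observed worst case stays below 2**-11 for correct rounding and below 2**-10 for truncation, that is, half an LSB versus one LSB of the mantissa.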
For float16_m10, whose 10-bit mantissa has an LSB worth 2^-10 (about 0.1%) relative to the leading bit, the relative error is approximately 0.1% for faithful rounding and 0.05% for correct rounding. The rounding mode is a configurable mask parameter.
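The float16_m10 figures follow directly from the mantissa width. The sketch below assumes that the `m10` suffix denotes a 10-bit fraction field, the same width as IEEE 754 binary16. It uses Python's `struct` `'e'` format (binary16, round to nearest even) only as a stand-in to confirm that the mantissa LSB next to 1.0 is worth 2**-10. `rel_error_bound` and `to_binary16` are hypothetical names.

```python
import struct

def rel_error_bound(mantissa_bits: int, mode: str) -> float:
    """Worst-case relative round-off error of one fundamental operation (hypothetical helper)."""
    lsb = 2.0 ** -mantissa_bits      # weight of the mantissa LSB relative to the leading 1
    if mode == "correct":
        return lsb / 2               # at most half an LSB
    if mode == "faithful":
        return lsb                   # less than one LSB
    raise ValueError(f"unknown rounding mode: {mode!r}")

def to_binary16(x: float) -> float:
    """Round to IEEE 754 binary16 via struct's 'e' format (round to nearest, ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

if __name__ == "__main__":
    for mode in ("faithful", "correct"):
        print(f"float16_m10, {mode} rounding: {rel_error_bound(10, mode):.4%}")
    # In binary16 the next value above 1.0 is 1 + 2**-10, so the mantissa LSB is worth 2**-10.
    print(to_binary16(1.0 + 2.0 ** -10) == 1.0 + 2.0 ** -10)  # exactly representable
    print(to_binary16(1.0 + 2.0 ** -11) == 1.0)               # halfway case rounds to even
```

The two bound lines print 0.0977% and 0.0488%, which round to the 0.1% and 0.05% quoted above.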
The elementary mathematical functions conform to the error tolerances specified in the OpenCL standard. In practice, the relative error exhibited by the DSP Builder mathematical library lies comfortably within the specified tolerances.
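The OpenCL specification states its accuracy requirements as a maximum error in ULPs (units in the last place). The sketch below shows one way to measure that metric for a reduced-precision function. Everything in it is an illustrative assumption: `approx_sin` is a hypothetical degree-5 Taylor approximation rounded to 10 fraction bits, not DSP Builder's implementation; double-precision `math.sin` stands in for the exact result; and `MAX_ULP` is a placeholder, not the OpenCL figure for any particular function.

```python
import math

MANTISSA_BITS = 10   # assumed fraction width of the target format
MAX_ULP = 2.0        # placeholder tolerance; substitute the specified figure for the function under test

def round_to_bits(x: float, mantissa_bits: int) -> float:
    """Round to nearest (ties to even) onto `mantissa_bits` fraction bits; no exponent limits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)
    p = mantissa_bits + 1
    return math.ldexp(round(m * 2 ** p), e - p)

def ulp(x: float, mantissa_bits: int) -> float:
    """Spacing of target-format values in the binade containing x (x must be nonzero)."""
    _, e = math.frexp(x)             # |x| lies in [2**(e-1), 2**e)
    return math.ldexp(1.0, e - 1 - mantissa_bits)

def approx_sin(x: float) -> float:
    """Hypothetical low-precision sin: Taylor polynomial, result rounded to the target format."""
    poly = x - x ** 3 / 6 + x ** 5 / 120
    return round_to_bits(poly, MANTISSA_BITS)

def worst_ulp_error(n: int = 10_000) -> float:
    """Largest error in ULPs over n midpoints of [-pi/4, pi/4] (midpoints avoid x == 0)."""
    worst = 0.0
    for i in range(n):
        x = -math.pi / 4 + (i + 0.5) * (math.pi / 2) / n
        ref = math.sin(x)            # double precision as the reference
        worst = max(worst, abs(approx_sin(x) - ref) / ulp(ref, MANTISSA_BITS))
    return worst

if __name__ == "__main__":
    err = worst_ulp_error()
    print(f"worst error: {err:.3f} ulp, within tolerance: {err <= MAX_ULP}")
```

Measuring in ULPs rather than as a plain relative error keeps the metric independent of the binade, which is why specifications phrase function tolerances this way.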
Bit cancellation can occur when you subtract two floating-point numbers that are very close in value, which can introduce very large relative errors. Take the same precautions against bit cancellation in floating-point designs as you would in numerical software.
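A small example makes the cancellation concrete. This sketch uses IEEE 754 binary16 (via Python's `struct` `'e'` format, rounding after every operation) as a stand-in for a 10-bit-mantissa datapath; it does not model DSP Builder hardware. It evaluates 1 - cos(x) for a small x directly, then with the standard identity 1 - cos(x) = 2 sin^2(x/2), which avoids subtracting two nearly equal values. The helpers `h` and `rel_err` are hypothetical names.

```python
import math
import struct

def h(x: float) -> float:
    """Round to IEEE 754 binary16 (10 fraction bits, round to nearest even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def rel_err(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)

x = h(0.05)
exact = 2.0 * math.sin(x / 2.0) ** 2   # 1 - cos(x) in double precision, no cancellation

# Naive form: cos(x) is rounded to 10 fraction bits just below 1.0, and subtracting it
# from 1.0 cancels the leading bits, so the result is dominated by that rounding error.
naive = h(1.0 - h(math.cos(x)))

# Rewritten form: no subtraction of close values, so each rounding stays near 2**-11.
stable = h(2.0 * h(h(math.sin(h(x / 2.0))) ** 2))

print(f"naive:  {naive:.6g} (relative error {rel_err(naive, exact):.1%})")
print(f"stable: {stable:.6g} (relative error {rel_err(stable, exact):.2%})")
```

With these inputs the naive form is off by roughly 17%, more than a hundred times the per-operation bound, while the rewritten form stays within a few multiples of 2**-11.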