Developer Guide
FPGA Optimization Guide for Intel® oneAPI Toolkits
Conversion Rules for ap_float
You can convert between ap_float data types of different sizes through assignment or by using the convert_to() function. For example:
using namespace ihc;
ap_float<8, 32> myFloat = ...;
ap_float<3, 18> myFloat2 = myFloat; // use rounding rules defined by ap_float type
// use rounding rules defined in convert_to() function call
ap_float<3, 18> myFloat3 = myFloat.convert_to<3, 18, ihc::fp_config::FP_Round::RZERO>();
To convert between native types (for example, float, double) and ap_float data types, assign to or from the types. Type conversion in an assignment occurs according to the rules mentioned in Table 1.
For two ap_float variables in a binary operation, the variable with the larger exponent bit width is considered to be the larger variable. If both variables have the same exponent bit width, the variable with the larger mantissa bit width is considered to be the larger variable. The operands are then unified to the larger type before the binary operation occurs.
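The ordering rule above can be sketched as a small compile-time comparison. This is only an illustration of the rule as stated, not the library's implementation: `FloatFormat`, `is_larger`, and `UnifiedFormat` are hypothetical names that model nothing but the two bit widths.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for an ap_float type: only the exponent and
// mantissa bit widths are modeled, no arithmetic.
template <int E, int M>
struct FloatFormat {
    static constexpr int exponent_bits = E;
    static constexpr int mantissa_bits = M;
};

// True when format A is "larger" than format B under the rule above:
// compare exponent widths first, then mantissa widths on a tie.
template <typename A, typename B>
constexpr bool is_larger() {
    return (A::exponent_bits != B::exponent_bits)
               ? (A::exponent_bits > B::exponent_bits)
               : (A::mantissa_bits > B::mantissa_bits);
}

// The format both operands would be unified to before the operation.
template <typename A, typename B>
using UnifiedFormat = std::conditional_t<is_larger<A, B>(), A, B>;

// The exponent width decides first, even if the other mantissa is wider.
static_assert(std::is_same_v<UnifiedFormat<FloatFormat<8, 23>, FloatFormat<5, 30>>,
                             FloatFormat<8, 23>>);
// Equal exponent widths: the wider mantissa wins.
static_assert(std::is_same_v<UnifiedFormat<FloatFormat<8, 10>, FloatFormat<8, 23>>,
                             FloatFormat<8, 23>>);
```

Under the rule as stated, the exponent width takes priority, so in this model an operand with a wider mantissa but a narrower exponent is unified to a format with fewer mantissa bits.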
Native floating-point data types and ap_float data types are converted to ap_float data types according to the rules in Table 1.
The Intel® oneAPI DPC++/C++ Compiler also provides some operations that leave the precision of input types untouched and provide control over the output precision. For more details, refer to Operations with Explicit Precision Controls.
| Data Type | From ap_float To Data Type | From Data Type To ap_float |
|---|---|---|
| ap_float with higher representable range | Keep exponent equivalent. The mantissa is rounded according to the rounding mode of the target ap_float (with the higher representable range). | ±Inf if the source of the conversion is out of the representable range. Otherwise, keep exponent equivalent. The mantissa is rounded according to the rounding mode of the target ap_float (with the smaller representable range). |
| float | Convert original ap_float to ap_float<8, 23> with the previous ap_float rule, and then bit cast to float. | Bit-cast float to ap_float<8, 23>, and then convert to target ap_float precision using the ap_float to ap_float rules described previously. |
| double | Convert original ap_float to ap_float<11, 52> with earlier ap_float rule, and then bit cast to double. | Bit-cast double to ap_float<11, 52>, and then convert to the target ap_float precision using the ap_float to ap_float rules described earlier. |
| long double (emulation only) (Linux only) | Convert the original ap_float to ap_float<15, 63> with the earlier ap_float rule, and then insert an explicit 1 bit at the MSB of the fraction bits to get an approximate equivalent of the 80-bit representation of a long double. | Drop the explicit 1 bit of the fraction to convert the long double to a 79-bit ap_float<15, 63>. |
| C++ native integer types | Truncate towards zero. Converting an ap_float value that is outside the range of the integer type is undefined behavior. | Round to the nearest, ties break to even. If the integer value is too large, the ap_float value saturates to plus infinity. |
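The float and native integer rows of the table can be illustrated with native C++ types alone. The sketch below does not use ap_float. It assumes a 32-bit IEEE 754 float and the default rounding mode, and the helper names (`FloatFields`, `split_float`, `to_int_truncating`, `to_float_nearest_even`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical view of a 32-bit float as the fields of an <8, 23> format:
// 1 sign bit, 8 exponent bits, 23 mantissa bits.
struct FloatFields {
    std::uint32_t sign;
    std::uint32_t exponent;  // biased exponent
    std::uint32_t mantissa;  // fraction bits without the implicit leading 1
};

inline FloatFields split_float(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // bit cast: the value is not converted
    return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Floating point to integer: native C++ conversion truncates towards zero,
// the same direction the table gives for ap_float to integer.
inline int to_int_truncating(float value) { return static_cast<int>(value); }

// Integer to floating point: on IEEE 754 platforms with the default rounding
// mode this rounds to nearest, ties to even (an assumption of this sketch).
inline float to_float_nearest_even(std::int32_t value) {
    return static_cast<float>(value);
}
```

For example, `to_int_truncating(-2.7f)` yields -2 rather than -3. The integer 16777217 (2^24 + 1) lies exactly halfway between two representable float values, so it is a convenient input for observing the tie-to-even behavior.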
Avoid assigning the result of the convert_to() function to an ap_float variable whose exponent or mantissa width differs from the widths specified in the convert_to() call. If the widths on the left-hand side of the assignment differ from those on the right-hand side, a second conversion can occur.
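To see why a second conversion matters, the following sketch models mantissa rounding on plain integers. It is a simplified model, not the ap_float implementation: `Round` and `drop_bits` are hypothetical names, exponents and mantissa overflow are ignored, and the destination type is assumed to round to nearest, ties to even.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical rounding modes for this model only.
enum class Round { RZERO, RNE };

// Reduce an unsigned mantissa by dropping its `drop` lowest bits
// (0 <= drop < 64) with the given rounding mode. A carry out of the
// kept bits is not renormalized in this simplified model.
inline std::uint64_t drop_bits(std::uint64_t mantissa, int drop, Round mode) {
    if (drop == 0) return mantissa;
    const std::uint64_t kept = mantissa >> drop;
    if (mode == Round::RZERO) return kept;  // truncate towards zero
    const std::uint64_t half = std::uint64_t{1} << (drop - 1);
    const std::uint64_t rest = mantissa & ((std::uint64_t{1} << drop) - 1);
    if (rest > half) return kept + 1;             // above the midpoint: round up
    if (rest == half) return kept + (kept & 1u);  // tie: round to even
    return kept;                                  // below the midpoint: round down
}
```

Take the 8-bit mantissa 0b10110111. Truncating it to 4 bits with `drop_bits(0b10110111, 4, Round::RZERO)` gives 0b1011. If that result is then stored in a 2-bit destination that rounds to nearest, `drop_bits(0b1011, 2, Round::RNE)` gives 0b11, whereas truncating the original value directly to 2 bits gives 0b10. The second conversion silently replaces the rounding mode that was requested in the first one.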