IA-64 Floating-Point Operations and the IEEE Standard for Binary Floating-Point Arithmetic (continued)

IA-64 FLOATING-POINT OPERATIONS

All the floating-point operations mandated or recommended by the IEEE Standard are, or can be, implemented in IA-64 [2]. Note that most IA-64 instructions [2] are predicated by a 1-bit qualifying predicate (qp) taken from the 64-bit predicate register (predicate p0 is fixed and always contains the logical value 1). For example, the fused multiply-add operation is

    (qp) fma.pc.sf f1 = f3, f4, f2
The fma instruction is executed if qp = 1; otherwise, it is skipped. Two instruction completers select the precision control (pc) and the status field (sf) to be used. When the qualifying predicate is not shown, it is either not needed or assumed to be p0. When qp = 1, fma calculates f3 · f4 + f2, rounds the result once, and writes it to f1. Addition and multiplication are implemented as pseudo-ops of the floating-point multiply-add operation. The pseudo-op for addition, fadd.pc.sf f1 = f3, f2, is obtained by replacing f4 with register f1, which contains +1.0. The pseudo-op for multiplication, fmpy.pc.sf f1 = f3, f4, is obtained by replacing f2 with register f0, which contains +0.0.
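The single rounding of fma, and the way fadd and fmpy fall out of it through the constant registers f1 (+1.0) and f0 (+0.0), can be modeled in a few lines. The sketch below is a Python software model, not IA-64 code. It assumes double precision in place of the pc completer, round-to-nearest, and finite operands. The names fma, fadd, fmpy, FR0, FR1 and the qp/dest parameters are illustrative only. The exact value of a · b + c is formed with fractions.Fraction and rounded once when it is converted back to float.

```python
from fractions import Fraction

# Stand-ins for IA-64 registers f0 and f1, hardwired to +0.0 and +1.0.
FR0 = 0.0
FR1 = 1.0


def fma(a: float, b: float, c: float, qp: bool = True, dest: float = 0.0) -> float:
    """Model of (qp) fma f1 = f3, f4, f2 with f3 = a, f4 = b, f2 = c.

    Double precision and round-to-nearest stand in for the pc and sf
    completers; operands must be finite, and the sign of a zero result
    is not modeled.
    """
    if not qp:
        return dest  # qp = 0: the instruction is skipped, destination unchanged
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact a*b + c, no rounding yet
    return float(exact)  # the only rounding step (to nearest-even)


def fadd(a: float, c: float, qp: bool = True, dest: float = 0.0) -> float:
    # fadd f1 = f3, f2 is fma f1 = f3, f1, f2: multiply by +1.0
    return fma(a, FR1, c, qp, dest)


def fmpy(a: float, b: float, qp: bool = True, dest: float = 0.0) -> float:
    # fmpy f1 = f3, f4 is fma f1 = f3, f4, f0: add +0.0
    return fma(a, b, FR0, qp, dest)
```

Because multiplying by exactly 1.0 or adding exactly 0.0 introduces no error, fadd and fmpy in this model give the same results as an ordinary rounded addition or multiplication. Calling fma(..., qp=False, dest=x) simply returns x, which mirrors a skipped predicated instruction.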
The reason for having a fused multiply-add operation is that it allows computation of a · b + c with only one rounding error.
This property enables a whole new category of numerical algorithms that rely on performing the combined operation with a single rounding (see the subsections on divide and square root below).
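A classic consequence of the single rounding is that the rounding error of a product can be recovered exactly. If p is the rounded value of a · b, then an fma that computes a · b - p returns that error unchanged, because the error is itself representable. The Python sketch below is a software model rather than IA-64 code. It assumes double precision, round-to-nearest, finite operands, and no underflow or overflow. The helpers fma and two_product are hypothetical names, and fma is emulated with exact rational arithmetic.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """a*b + c rounded once (finite operands; emulated with exact rationals)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def two_product(a: float, b: float):
    """Return (p, e) with p = round(a*b) and e = a*b - p exactly.

    Assumes no underflow or overflow, so the error term is representable
    and the single rounding inside fma leaves it unchanged.
    """
    p = a * b          # rounded product
    e = fma(a, b, -p)  # exact rounding error of that product
    return p, e


a = 1.0 + 2.0**-30
p, e = two_product(a, a)
# The pair (p, e) carries the full product; p alone does not.
assert Fraction(p) + Fraction(e) == Fraction(a) * Fraction(a)
```

For a = 1 + 2^-30, a separately rounded a*a - 1.0 gives 2^-29, while fma(a, a, -1.0) keeps the low-order term and gives 2^-29 + 2^-60.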
Subtraction (fsub.pc.sf f1 = f3, f2) is implemented as a pseudo-op of the floating-point multiply-subtract, fms.pc.sf f1 = f3, f4, f2 (which calculates f3 · f4 - f2), obtained by replacing f4 with f1. The floating-point negative multiply-add, fnma.pc.sf f1 = f3, f4, f2, calculates -f3 · f4 + f2. One deviation from the IEEE Standard's recommendations is that operands of higher precision are allowed to produce results of lower precision. This is, however, a useful feature when implementing the divide, remainder, and square root operations in software.
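The subtraction pseudo-op and the negative multiply-add can be modeled in the same way. The Python sketch below uses hypothetical helper names and assumes finite operands, double precision, and round-to-nearest. It also computes a quotient residual a - b · q with fnma, which is the kind of quantity that iterative divide sequences rely on. This is offered as an illustration only and is not the paper's divide algorithm. For q = 1.0 / 3.0 the residual is exactly 2^-54. A separately rounded multiply followed by a subtract returns 0.0, because 3.0 * q rounds to 1.0.

```python
from fractions import Fraction

FR1 = 1.0  # stand-in for register f1, hardwired to +1.0


def fms(a: float, b: float, c: float) -> float:
    # fms f1 = f3, f4, f2: f3*f4 - f2, rounded once
    return float(Fraction(a) * Fraction(b) - Fraction(c))


def fnma(a: float, b: float, c: float) -> float:
    # fnma f1 = f3, f4, f2: -(f3*f4) + f2, rounded once
    return float(-(Fraction(a) * Fraction(b)) + Fraction(c))


def fsub(a: float, c: float) -> float:
    # fsub f1 = f3, f2 is fms f1 = f3, f1, f2: multiply by +1.0
    return fms(a, FR1, c)


def quotient_residual(a: float, b: float) -> float:
    """Residual a - b*q for the rounded quotient q = a/b, computed with one rounding."""
    q = a / b
    return fnma(b, q, a)
```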
For parallel computations, counterparts of fma, fms, and fnma are provided. For example, fpma.pc.sf f1 = f3, f4, f2 calculates f3 · f4 + f2 separately for each of the two single precision values packed in the registers. Divide, square root, and remainder operations are not available directly in hardware. Instead, they have to be implemented in software as sequences of instructions corresponding to iterative algorithms (described below).

Rounding a floating-point number to a 64-bit signed integer value represented in floating-point format is achieved by the fcvt.fx.sf f1 = f2 instruction followed by fcvt.xf f2 = f1. For 64-bit unsigned integers, the corresponding instructions are fcvt.fxu.sf f1 = f2 and fcvt.xuf.pc.sf f2 = f1. Two variants of the instructions that convert floating-point numbers to integers use the round-to-zero mode regardless of the rounding-control bits in the FPSR status field (fcvt.fx.trunc.sf f1 = f2 and fcvt.fxu.trunc.sf f1 = f2). They are useful in implementing integer divide and remainder operations with floating-point instructions. For example, the following instructions convert a single precision floating-point number from memory (whose address is in general register r30) to a 64-bit signed integer in r8:
    ldfs f6=[r30];;            // load single precision fp number
    fcvt.fx.trunc.s0 f7=f6;;   // convert to 64-bit signed integer, rounding toward zero
    getf.sig r8=f7;;           // move the significand (the integer) to r8

(Note that stop bits (;;) delimit the instruction groups.) fcvt.fx.trunc.s0 sets the biased exponent of the value in f7 to 0x1003e (unbiased 63) and sets the significand to the signed integer that results from the conversion. (If the conversion is invalid, the significand is set to the value of Integer Indefinite, which is -2^63.) Since fcvt.fx.trunc always rounds toward zero, the status field only determines which status flags are set if an invalid operation, denormal operand, or inexact result exception occurs. (Exceptions and traps are covered later in the paper.) To convert a floating-point number to a 64-bit unsigned integer, replace fcvt.fx.trunc above with fcvt.fxu.trunc. The opposite conversion, from a 64-bit signed integer in r32 to a register-file format floating-point number in f7, is performed by
    setf.sig f6 = r32;;   // sign=0 exp=0x1003e signif.=r32
    fcvt.xf f7 = f6;;     // convert to an integer-valued fp number

where the result is an integer-valued normal floating-point number. To convert further, for example to a single precision floating-point number, one more instruction is needed:
    fma.s.s0 f8=f7,f1,f0;;

where the single precision format is specified statically, and status field s0 is assumed to have wre = 0. For 64-bit unsigned integers, the corresponding conversion is
    setf.sig f6 = r32;;    // sign=0 exp=0x1003e signif.=r32
    fcvt.xuf.s0 f7 = f6;;  // convert to a normalized integer-valued fp number

where fcvt.xuf.pc.sf f7 = f6 is actually a pseudo-op for fma.pc.sf f7 = f6, f1, f0 and a synonym of fnorm.pc.sf f7 = f6 (status field s0 is assumed to have pc = 0x3). The result is thus a normalized integer-valued floating-point number. This matters because floating-point operations on unnormalized numbers lead to Software Assistance faults (as explained later in the paper), which slow down execution unnecessarily. Conversions between the different floating-point formats are achieved using floating-point load, store, or other operations. For example, the following sequence converts a single precision value from memory to double precision format, also in memory (r29 contains the address of the single precision source, and r30 that of the double precision destination):
    ldfs f6 = [r29];;            // load single precision value
    fma.d.s0 f7 = f6, f1, f0;;   // signal possible exceptions; the value is exact in double
    stfd [r30] = f7;;            // store double precision result

This conversion could trigger the invalid exception (for a signaling NaN operand) or the denormal operand exception. These can occur on the fma instruction, but the conversion is numerically correct even without this instruction, because every single precision value can be represented in the double precision format. The opposite conversion is shown below (status field s0 is assumed to have wre = 0):
    ldfd f6=[r29];;          // load double precision value
    fma.s.s0 f7=f6,f1,f0;;   // round to single precision
    stfs [r30]=f7;;          // store single precision result

The role of the fma.s.s0 is to trigger possible invalid, denormal, underflow, overflow, or inexact exceptions on this conversion. Other conversions between floating-point and integer formats can be achieved with short sequences of instructions. For example, the following sequence converts a single precision floating-point value in memory to a 32-bit signed integer (correct only if the result fits in 32 bits):
    ldfs f6 = [r30];;  // load f6 with fp value from memory

The opposite conversion, from a 32-bit integer in memory to a single precision floating-point number in memory, is performed by
    ld4 r29 = [r30];;  // load r29 with 32-bit int from mem

Floating-point compare operations can be performed directly on numbers in floating-point register file format, using the fcmp instruction. Values in other (memory) formats must first be converted to register format before the floating-point compare instruction is applied. Of the 26 functionally distinct relations specified by the IEEE Standard, only the six mandatory ones are implemented (four directly, and two as pseudo-ops):
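    fcmp.eq.sf p1, p2 = f2, f3       (test for '=')
    fcmp.lt.sf p1, p2 = f2, f3       (test for '<')
    fcmp.le.sf p1, p2 = f2, f3       (test for '<=')
    fcmp.gt.sf p1, p2 = f2, f3       (test for '>', pseudo-op)
    fcmp.ge.sf p1, p2 = f2, f3       (test for '>=', pseudo-op)
    fcmp.unord.sf p1, p2 = f2, f3    (test for '?', unordered)

The result of a compare operation is written to two 1-bit predicates in the 64-bit predicate register. Predicate p1 holds the result of the comparison, while p2 holds its complement. The exception is when at least one input value is NaTVal, in which case p1 = p2 = 0. A variant of the fcmp instruction is called 'unconditional' (with respect to the qualifying predicate). The difference is that if qp = 0, the unconditional compare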
    (qp) fcmp.frel.unc.sf p1, p2 = f2, f3

clears both output predicates, while the normal compare

    (qp) fcmp.frel.sf p1, p2 = f2, f3

leaves them unchanged. Six more compare relations are implemented as pseudo-ops of the above to test for the opposite situations (neq, nlt, nle, ngt, nge, and ord). The remaining 14 comparison relations specified by the IEEE Standard can be derived from these. A special type of compare instruction, fclass.fcrel.fctype p1, p2 = f2, fclass9, classifies the contents of f2 according to the class specifier fclass9. The fcrel completer can be 'm' (f2 must match the pattern specified by fclass9) or 'nm' (f2 must not match it). The fctype completer can be none or 'unc' (as for fcmp). fclass9 can specify one of {NaTVal, QNaN, SNaN}, OR none, one, or both of {positive, negative} AND none, one, or several of {zero, unnormal, normal, infinity}. Its nine bits correspond to the nine classes that can be selected, subject to the stated restrictions on the possible combinations.
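The derivation of the pseudo-op relations from the four direct ones (by swapping the operands, the output predicates, or both), and the way the 'unc' variant differs when the qualifying predicate is false, can be sketched as a small Python model. The swap table shows one consistent way to form these pseudo-ops and is not quoted from the assembler's definitions. NaTVal is not modeled. The names fcmp, PSEUDO, and the qp/unc/old parameters are illustrative. Python's ==, < and <= already return False when either operand is a NaN, which matches the IEEE semantics of these relations.

```python
import math


def _relation(rel: str, a: float, b: float) -> bool:
    """The four relations fcmp implements directly."""
    if rel == "eq":
        return a == b
    if rel == "lt":
        return a < b
    if rel == "le":
        return a <= b
    if rel == "unord":
        return math.isnan(a) or math.isnan(b)
    raise ValueError(f"unknown relation: {rel}")


# Pseudo-op -> (direct relation, swap operands?, swap output predicates?)
PSEUDO = {
    "gt": ("lt", True, False),     # a > b   is  b < a
    "ge": ("le", True, False),     # a >= b  is  b <= a
    "neq": ("eq", False, True),    # not (a == b)
    "nlt": ("lt", False, True),    # not (a < b)
    "nle": ("le", False, True),    # not (a <= b)
    "ngt": ("lt", True, True),     # not (b < a)
    "nge": ("le", True, True),     # not (b <= a)
    "ord": ("unord", False, True), # not unordered
}


def fcmp(rel, a, b, qp=True, unc=False, old=(False, False)):
    """Return the new (p1, p2) pair for (qp) fcmp.rel[.unc] p1, p2 = a, b."""
    if not qp:
        # unc clears both predicates; the normal form leaves them unchanged
        return (False, False) if unc else old
    swap_ops = swap_preds = False
    if rel in PSEUDO:
        rel, swap_ops, swap_preds = PSEUDO[rel]
    if swap_ops:
        a, b = b, a
    result = _relation(rel, a, b)
    p1, p2 = result, not result  # p2 is always the complement of p1
    return (p2, p1) if swap_preds else (p1, p2)
```

With a NaN operand, for example, fcmp("neq", nan, 1.0) sets p1 because the operands are unordered and therefore not equal, while fcmp("eq", nan, 1.0) clears it.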