Reference Implementations for Intel® Architecture Approximation...

Marius Cornea

We provide two files, RECIP14.c and RECIP28EXP2.c, which contain reference implementations for the scalar versions of 10 approximation instructions introduced in the Intel® Architecture Instruction Set Extensions Programming Reference document. The files can be downloaded from the links provided above.

RECIP14.c contains emulation routines for the underlying algorithms of:

VRCP14PD - Compute Approximate Reciprocals of Packed Float64 Values with relative error of less than 2^-14
VRCP14SD - Compute Approximate Reciprocal of Scalar Float64 Value with relative error of less than 2^-14
VRCP14PS - Compute Approximate Reciprocals of Packed Float32 Values with relative error of less than 2^-14
VRCP14SS - Compute Approximate Reciprocal of Scalar Float32 Value with relative error of less than 2^-14
VRSQRT14PD - Compute Approximate Reciprocals of Square Roots of Packed Float64 Values with relative error of less than 2^-14
VRSQRT14SD - Compute Approximate Reciprocal of Square Root of Scalar Float64 Value with relative error of less than 2^-14
VRSQRT14PS - Compute Approximate Reciprocals of Square Roots of Packed Float32 Values with relative error of less than 2^-14
VRSQRT14SS - Compute Approximate Reciprocal of Square Root of Scalar Float32 Value with relative error of less than 2^-14

The corresponding emulation routines (only scalar versions) are:

RCP14S - reciprocal approximation for Float32
RCP14D - reciprocal approximation for Float64
RSQRT14S - reciprocal square root approximation for Float32
RSQRT14D - reciprocal square root approximation for Float64

RECIP28EXP2.c contains emulation routines for the underlying algorithms of:

VRCP28PD - Approximation to the Reciprocal of Packed Double Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRCP28SD - Approximation to the Reciprocal of Scalar Double Precision Floating-Point Value with Less Than 2^-28 Relative Error
VRCP28PS - Approximation to the Reciprocal of Packed Single Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRCP28SS - Approximation to the Reciprocal of Scalar Single Precision Floating-Point Value with Less Than 2^-28 Relative Error
VRSQRT28PD - Approximation to the Reciprocal Square Root of Packed Double Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRSQRT28SD - Approximation to the Reciprocal Square Root of Scalar Double Precision Floating-Point Value with Less Than 2^-28 Relative Error
VRSQRT28PS - Approximation to the Reciprocal Square Root of Packed Single Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRSQRT28SS - Approximation to the Reciprocal Square Root of Scalar Single Precision Floating-Point Value with Less Than 2^-28 Relative Error
VEXP2PD - Approximation to the Exponential 2^x of Packed Double Precision Floating-Point Values with Less Than 2^-23 Relative Error
VEXP2PS - Approximation to the Exponential 2^x of Packed Single Precision Floating-Point Values with Less Than 2^-23 Relative Error

The corresponding emulation routines (only scalar versions) are:

RCP28S - reciprocal approximation for Float32
RCP28D - reciprocal approximation for Float64
RSQRT28S - reciprocal square root approximation for Float32
RSQRT28D - reciprocal square root approximation for Float64
EXP2S - Base-2 exponential approximation for Float32
EXP2D - Base-2 exponential approximation for Float64
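
The error bounds quoted in the instruction lists above (2^-14, 2^-28, 2^-23) are relative errors: |approx - exact| / |exact|. The following is a minimal sketch of how such a bound could be checked for a single value. The helper names relative_error and within_pow2_bound are hypothetical and are not part of RECIP14.c or RECIP28EXP2.c; the sketch assumes a nonzero, finite exact value and does not call the reference routines themselves.

```c
/* Hypothetical helpers for illustration only; they are not part of
 * RECIP14.c or RECIP28EXP2.c. */

/* Absolute value without depending on the math library. */
static double abs_d(double v)
{
    return v < 0.0 ? -v : v;
}

/* Relative error |approx - exact| / |exact|.
 * Assumes exact is nonzero and finite. */
double relative_error(double approx, double exact)
{
    return abs_d(approx - exact) / abs_d(exact);
}

/* Returns 1 if approx is within a relative error of 2^-k of exact,
 * and 0 otherwise. For example, k = 14 matches the bound stated for
 * the *14 instructions and k = 28 the bound for the *28 instructions. */
int within_pow2_bound(double approx, double exact, int k)
{
    double bound = 1.0;
    while (k-- > 0) {
        bound *= 0.5;   /* exact halving, so bound == 2^-k */
    }
    return relative_error(approx, exact) < bound;
}
```

For instance, 0.33333 as an approximation of 1/3 has a relative error of about 1e-5, which is inside 2^-14 (about 6.1e-5) but far outside 2^-28 (about 3.7e-9).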

The reference functions must be compiled with the DAZ (denormals-are-zero) and FTZ (flush-to-zero) modes turned off (e.g., with the Intel compiler for Linux, using the -no-ftz option). They must be run with the rounding mode set to round-to-nearest and with floating-point exceptions masked.
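
As an illustration of these run-time requirements, a caller could set the floating-point environment explicitly before invoking the reference functions. The sketch below is one possible approach, not something taken from the reference files: prepare_fp_environment and fp_environment_ok are hypothetical helper names. On x86 targets it assumes the code is compiled to use SSE arithmetic, so that the MXCSR register governs float and double operations; x87 code would additionally need the x87 control word set. On other targets it falls back to the standard fesetround, which only covers the rounding mode.

```c
#if defined(__x86_64__) || defined(_M_X64) || defined(__SSE2__)
#include <xmmintrin.h>  /* rounding, flush-to-zero and exception-mask macros */
#include <pmmintrin.h>  /* denormals-are-zero macros */
#define USE_MXCSR 1
#else
#include <fenv.h>       /* portable fallback: rounding mode only */
#endif

/* Hypothetical helper (not part of RECIP14.c or RECIP28EXP2.c).
 * Sets round-to-nearest, turns FTZ and DAZ off, and masks all
 * floating-point exceptions. Returns 0 on success, -1 on failure. */
int prepare_fp_environment(void)
{
#ifdef USE_MXCSR
    _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
    _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);  /* mask every exception */
    return 0;
#else
    return fesetround(FE_TONEAREST) == 0 ? 0 : -1;
#endif
}

/* Hypothetical helper: returns 1 if the current environment matches
 * the settings applied by prepare_fp_environment, 0 otherwise. */
int fp_environment_ok(void)
{
#ifdef USE_MXCSR
    return _MM_GET_ROUNDING_MODE() == _MM_ROUND_NEAREST
        && _MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_OFF
        && _MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_OFF
        && _MM_GET_EXCEPTION_MASK() == _MM_MASK_MASK;
#else
    return fegetround() == FE_TONEAREST;
#endif
}
```

Setting the modes at run time does not replace the compiler option: a compiler may still insert start-up code that enables FTZ or DAZ unless told otherwise, so a call such as prepare_fp_environment() would typically be made at the start of main, before the first call to a reference function.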

Usage example for RCP14S and RCP14D