Intrinsics for Converting Half Floats

Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249

Date 3/31/2023

The half-float, or 16-bit float, is a popular type in some application domains. It is regarded as a storage type: data is often stored as half-floats, but computation is not performed on values of this type. Instead, values are converted to regular 32-bit floats before any computation.

Support for the half-float type is restricted to conversions to and from 32-bit floats. The main benefits of using the half-float type are:

reduced storage requirements
less consumption of memory bandwidth and cache
accuracy and precision adequate for many applications
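
The storage benefit can be made concrete with a minimal sketch. The helper names below are hypothetical and are not part of the compiler's API; the sketch assumes half-float values are held as raw 16-bit patterns in uint16_t and that float is a 32-bit type, as it is on IA-32 and Intel® 64 architectures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to store n values as
   16-bit half-float bit patterns. */
static size_t half_storage_bytes(size_t n)
{
    return n * sizeof(uint16_t);
}

/* Hypothetical helper: bytes needed to store the same n values
   as 32-bit floats. */
static size_t float_storage_bytes(size_t n)
{
    return n * sizeof(float);
}
```

Under these assumptions a buffer of half-floats occupies half the bytes of the equivalent float buffer, so the same cache line or memory transfer carries twice as many values.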

Half Float Intrinsics

The half-float intrinsics are provided to convert half-float values to 32-bit floats for computation purposes and, conversely, 32-bit float values to half-float values for data storage purposes.

The intrinsics are translated into library calls that do the actual conversions.
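
To illustrate the kind of work such a conversion involves, the following is a minimal portable sketch of the half-float to 32-bit float direction. It is not the library routine the compiler calls; the function name half_bits_to_float is hypothetical, and the sketch assumes the half-float uses the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits) and that float is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not an Intel API: widen a binary16 bit
   pattern to a 32-bit float. Every half-float value is exactly
   representable as a float, so no rounding is needed. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit biased exponent */
    uint32_t man  = h & 0x3FFu;                    /* 10-bit fraction */
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, fraction carried over. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half-float: shift the fraction left until the
           implicit leading 1 appears, adjusting the exponent. */
        uint32_t e = 113u;
        do {
            man <<= 1;
            e--;
        } while ((man & 0x400u) == 0);
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }

    memcpy(&f, &bits, sizeof f); /* reinterpret the bits as a float */
    return f;
}
```

For example, under these assumptions the bit pattern 0x3C00 widens to 1.0f and 0x7BFF widens to 65504.0f, the largest finite half-float value. The opposite direction is where a rounding mode matters, because most 32-bit float values have no exact half-float representation.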

The half-float intrinsics are available on IA-32 and Intel® 64 architectures running supported operating systems. The minimum processor requirement is an Intel® Pentium 4 processor and an operating system supporting Intel® Streaming SIMD Extensions 2 (Intel® SSE2) instructions.

Role of Immediate Byte in Half Float Intrinsic Operations

For all half-float intrinsics, an immediate byte (imm8) controls the rounding mode, flush-to-zero behavior, and other non-volatile settings. The format of the imm8 byte is shown in the diagram below.

The imm8 value is used for special MXCSR overrides.

In the diagram,

MBZ = Most significant Bit is Zero; used for error checking
MS1 = 1 : use MXCSR RC, else use imm8.RC
SAE = 1 : all exceptions are suppressed
MS2 = 1 : use MXCSR FTZ/DAZ control, else use imm8.FTZ/DAZ.

The compiler passes the bits to the library function and performs error checking: the most significant bit must be zero.
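
The MS1 selection rule and the MBZ check can be sketched as follows. The function name and macros are hypothetical, and the bit positions are an assumption: rounding control in bits 1:0 and MS1 in bit 2 follow the immediate encoding of the hardware VCVTPS2PH instruction, and MBZ is taken as bit 7 because the text above identifies it as the most significant bit. The positions of SAE and MS2 are not stated in the text, so the sketch leaves them out.

```c
#include <stdint.h>

/* Assumed bit positions; see the note above. */
#define IMM8_RC_MASK 0x03u /* bits 1:0: imm8.RC rounding control */
#define IMM8_MS1     0x04u /* bit 2: 1 = take rounding from MXCSR */
#define IMM8_MBZ     0x80u /* bit 7: most significant bit, must be 0 */

/* Hypothetical helper: return the effective 2-bit rounding control,
   or -1 if the imm8 value fails the MBZ check.
   RC encoding (as in MXCSR): 0 = nearest even, 1 = down,
   2 = up, 3 = toward zero. */
static int effective_rounding(uint8_t imm8, unsigned mxcsr_rc)
{
    if (imm8 & IMM8_MBZ) {
        return -1;                        /* error: MSB is set */
    }
    if (imm8 & IMM8_MS1) {
        return (int)(mxcsr_rc & 0x03u);   /* MS1 = 1: use MXCSR RC */
    }
    return (int)(imm8 & IMM8_RC_MASK);    /* MS1 = 0: use imm8.RC */
}
```

With these assumed positions, an imm8 of 0x03 requests truncation regardless of MXCSR, while an imm8 of 0x04 defers to whatever rounding mode MXCSR currently holds.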
