Low-precision Datatypes
oneCCL provides support for collective operations on low-precision (LP) datatypes (bfloat16 and float16).
Reduction of LP buffers (for example, as a phase of ccl::allreduce) involves three steps: conversion from LP to FP32 format, reduction of the FP32 values, and conversion of the result from FP32 back to LP format.
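The three steps can be illustrated with a minimal scalar sketch for a BF16 sum. This is not oneCCL's implementation, which uses CPU vector instructions. The names bf16_t, bf16_to_fp32, fp32_to_bf16 and bf16_sum_inplace are hypothetical, and round-to-nearest-even on the down-conversion is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical raw BF16 storage: the upper 16 bits of an IEEE-754 binary32 value.
using bf16_t = std::uint16_t;

// Step 1: widen BF16 to FP32 by placing the 16 bits in the upper half.
inline float bf16_to_fp32(bf16_t v) {
    std::uint32_t bits = static_cast<std::uint32_t>(v) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Step 3: narrow FP32 to BF16 (assumption: round to nearest, ties to even).
inline bf16_t fp32_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        // NaN: truncate and force a mantissa bit so the result stays NaN.
        return static_cast<bf16_t>((bits >> 16) | 0x0040u);
    }
    // Add 0x7FFF plus the lowest kept bit, then drop the lower 16 bits.
    std::uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<bf16_t>((bits + rounding) >> 16);
}

// Element-wise sum of two BF16 buffers; step 2 (the addition) happens in FP32.
inline void bf16_sum_inplace(bf16_t* inout, const bf16_t* in, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float acc = bf16_to_fp32(inout[i]) + bf16_to_fp32(in[i]);
        inout[i] = fp32_to_bf16(acc);
    }
}
```

Each reduction step narrows the result back to BF16, so rounding error can accumulate across repeated up-down conversions.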
oneCCL utilizes CPU vector instructions for FP32 <-> LP conversion.
For BF16 <-> FP32 conversion, oneCCL provides AVX512F-based and AVX512_BF16-based implementations. The AVX512F-based implementation requires GCC 4.9 or higher. The AVX512_BF16-based implementation requires GCC 10.0 or higher and GNU binutils 2.33 or higher. The AVX512_BF16-based implementation may lose less accuracy after multiple up-down conversions.
For FP16 <-> FP32 conversion, oneCCL provides F16C-based and AVX512F-based implementations. Both implementations require GCC 4.9 or higher, or Clang 9.0 or higher.
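For reference, the FP16 to FP32 direction can be sketched in portable scalar C++. This only illustrates what the conversion computes. It is not the F16C or AVX512F code path, and the name fp16_to_fp32 is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Widen an IEEE-754 binary16 bit pattern (1 sign, 5 exponent, 10 mantissa bits) to FP32.
inline float fp16_to_fp32(std::uint16_t h) {
    std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t man = h & 0x3FFu;
    std::uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign; // signed zero
        } else {
            // Subnormal: shift the mantissa until the implicit bit appears.
            std::uint32_t shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                ++shift;
            }
            man &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13); // infinity or NaN
    } else {
        // Normal number: rebias the exponent from 15 to 127.
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }

    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Widening is exact because every FP16 value is representable in FP32. Only the opposite direction has to round.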
oneCCL utilizes CPU vector instructions for LP numeric operations.
For FP16 numeric operations (arithmetic, load, store), oneCCL provides an AVX512FP16-based implementation. This implementation requires GCC 12.0 or higher, Clang 14.0 or higher, or the Intel compiler 2021.4.0 or higher.
Refer to Low-precision datatypes for details about relevant environment variables.