Low-precision Datatypes
oneCCL supports collective operations on low-precision (LP) datatypes: bfloat16 (BF16) and float16 (FP16).
Reducing LP buffers (for example, in the reduction phase of ccl::allreduce) involves three steps: converting the LP values to FP32, reducing the FP32 values, and converting the result from FP32 back to the LP format.
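The three-step reduction can be sketched in scalar C++ for a BF16 sum. This is an illustration under stated assumptions, not oneCCL code: the names `bf16`, `bf16_to_fp32`, `fp32_to_bf16`, and `reduce_sum_bf16` are hypothetical, the FP32 -> BF16 step assumes round-to-nearest-even (the rounding used by oneCCL's vectorized code is not specified here), and NaN inputs are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical representation: a BF16 value stored as its raw 16 bits,
// i.e. the upper half of an IEEE 754 FP32 bit pattern.
using bf16 = std::uint16_t;

// LP -> FP32: exact, because BF16 keeps the FP32 sign and exponent layout.
inline float bf16_to_fp32(bf16 v) {
    const std::uint32_t bits = static_cast<std::uint32_t>(v) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// FP32 -> LP: assumes round-to-nearest-even. NaN inputs are not handled.
inline bf16 fp32_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const std::uint32_t lsb = (bits >> 16) & 1u;  // ties go to the even result
    bits += 0x7FFFu + lsb;
    return static_cast<bf16>(bits >> 16);
}

// Sums `in` into `inout` element-wise: convert both operands to FP32,
// add in FP32, then convert the result back to BF16.
void reduce_sum_bf16(std::vector<bf16>& inout, const std::vector<bf16>& in) {
    assert(inout.size() == in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float sum = bf16_to_fp32(inout[i]) + bf16_to_fp32(in[i]);
        inout[i] = fp32_to_bf16(sum);
    }
}
```

Adding in FP32 means each element is rounded to the LP format only once per reduction step, after the addition.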
oneCCL uses CPU vector instructions for the FP32 <-> LP conversions.
For BF16 <-> FP32 conversion, oneCCL provides two implementations: one based on AVX512F and one based on AVX512_BF16. The AVX512F-based implementation requires GCC 4.9 or higher. The AVX512_BF16-based implementation requires GCC 10.0 or higher and GNU binutils 2.33 or higher; it may lose less accuracy when values go through multiple down- and up-conversions (FP32 -> BF16 -> FP32).
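This section does not state which rounding each implementation applies when converting FP32 to BF16, so the following scalar C++ sketch only illustrates, in general terms, why the rounding step matters when a value passes through many conversions, as in a multi-step reduction. It compares truncation (dropping the low 16 bits) with round-to-nearest-even, the rounding performed by the AVX512_BF16 instruction VCVTNEPS2BF16. All function names are hypothetical and NaN inputs are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: raw BF16 bits -> FP32 (exact).
inline float bf16_bits_to_fp32(std::uint16_t v) {
    const std::uint32_t bits = static_cast<std::uint32_t>(v) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// FP32 -> BF16 by truncation: always rounds toward zero.
inline std::uint16_t fp32_to_bf16_truncate(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<std::uint16_t>(bits >> 16);
}

// FP32 -> BF16 with round-to-nearest-even (NaN not handled).
inline std::uint16_t fp32_to_bf16_rne(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>(bits >> 16);
}

// Adds `step` to a BF16 accumulator `count` times. Each iteration is one
// round trip: BF16 -> FP32, add in FP32, FP32 -> BF16.
float accumulate_bf16(float start, float step, int count, bool round_nearest_even) {
    assert(count >= 0);
    using down_fn = std::uint16_t (*)(float);
    const down_fn down = round_nearest_even ? &fp32_to_bf16_rne : &fp32_to_bf16_truncate;
    std::uint16_t acc = down(start);
    for (int i = 0; i < count; ++i) {
        acc = down(bf16_bits_to_fp32(acc) + step);
    }
    return bf16_bits_to_fp32(acc);
}
```

With a start of 1.0 and a step of 0.005859375 (0.75 of a BF16 unit in the last place at 1.0), four truncating round trips leave the accumulator at 1.0, while round-to-nearest-even yields 1.03125. The exact sum is 1.0234375, so both lose accuracy, but truncation loses three times as much in this case.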
For FP16 <-> FP32 conversion, oneCCL provides F16C-based and AVX512F-based implementations. Both require GCC 4.9 or higher.
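The section names only the instruction sets used for FP16 <-> FP32 conversion. As a portable reference for what such a conversion computes, here is a scalar C++ sketch of IEEE 754 binary16 <-> binary32 conversion. It assumes round-to-nearest-even for FP32 -> FP16 (the rounding mode oneCCL selects for its F16C or AVX512F code is not stated here); the function names are hypothetical and this is not oneCCL's implementation.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// FP16 -> FP32: always exact.
inline float fp16_to_fp32(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exp = (h >> 10) & 0x1Fu;
    const std::uint32_t mant = h & 0x3FFu;
    if (exp == 0) {
        // Zero or subnormal: the value is mant * 2^-24, exact in FP32.
        const float mag = std::ldexp(static_cast<float>(mant), -24);
        return sign ? -mag : mag;
    }
    std::uint32_t bits;
    if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);            // Inf or NaN
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13);   // rebias 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// FP32 -> FP16 with round-to-nearest-even (assumed rounding mode).
inline std::uint16_t fp32_to_fp16(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t absx = x & 0x7FFFFFFFu;
    if (absx >= 0x7F800000u) {                    // Inf or NaN (NaN becomes quiet)
        const std::uint32_t nan_bit = absx > 0x7F800000u ? 0x200u : 0u;
        return static_cast<std::uint16_t>(sign | 0x7C00u | nan_bit);
    }
    if (absx >= 0x477FF000u) {                    // magnitude >= 65520 rounds to Inf
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    if (absx < 0x38800000u) {                     // below 2^-14: FP16 subnormal or zero
        // Count units of 2^-24. Scaling by a power of two is exact, and
        // nearbyint rounds to nearest even in the default rounding mode.
        const float units = std::nearbyint(std::ldexp(std::fabs(f), 24));
        return static_cast<std::uint16_t>(sign | static_cast<std::uint32_t>(units));
    }
    // Normal range: round the mantissa to nearest even, then rebias 127 -> 15.
    const std::uint32_t rounded = absx + 0x0FFFu + ((absx >> 13) & 1u);
    return static_cast<std::uint16_t>(sign | ((rounded - (112u << 23)) >> 13));
}
```

Because FP16 -> FP32 is exact, every non-NaN FP16 value survives a round trip unchanged; precision is lost only in the FP32 -> FP16 direction.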
Refer to Low-precision datatypes for details about relevant environment variables.