The Intel® Gaudi® software version 1.9.0 release upgrades several libraries:
- PyTorch* Lightning 1.9.4
- DeepSpeed* 0.7.7
- fairseq 0.12.3
- Horovod* v0.27.0
We added support for Red Hat* Enterprise Linux* 8 on Intel Gaudi 2 AI accelerators. Support for TensorFlow 2.8.4 is deprecated and is unavailable starting with this release.
Intel Gaudi software now provides a GPU migration toolkit, which simplifies migrating PyTorch* models that contain Python* API calls with dependencies on GPU libraries (for example, torch.cuda calls). For more details, see the GPU Migration Toolkit.
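As a rough illustration of how a script might opt in to the toolkit, the sketch below tries to import the migration package before the rest of the training code runs. The module path habana_frameworks.torch.gpu_migration is our understanding of the toolkit's entry point and should be checked against the GPU Migration Toolkit documentation; the helper name enable_gpu_migration is hypothetical.

```python
import importlib.util


def enable_gpu_migration() -> bool:
    """Try to load the GPU migration package; return True if it was loaded.

    `enable_gpu_migration` is a hypothetical helper for this sketch.
    """
    # Skip the import on machines without the Intel Gaudi PyTorch packages.
    if importlib.util.find_spec("habana_frameworks") is None:
        return False
    try:
        # Assumed module path; importing it is what turns the toolkit on.
        import habana_frameworks.torch.gpu_migration  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if enable_gpu_migration():
        print("GPU migration package loaded; torch.cuda calls stay as written.")
    else:
        print("Intel Gaudi packages not found; running the script unchanged.")
```

The point of the pattern is that the model code itself, including its torch.cuda calls, is not edited; only the import is added near the top of the script.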
We enabled native PyTorch autocast support for Intel Gaudi software and demonstrated it on several reference models. For users interested in tracking memory use, HPU memory metrics can now be monitored in TensorBoard* during training.
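A minimal sketch of what an autocast training step could look like is shown here. It assumes that the "hpu" device type is accepted by torch.autocast once the Intel Gaudi PyTorch packages are installed, and it falls back to "cpu" otherwise. The helper names pick_device_type and train_step are hypothetical, and device setup (moving the model and tensors to the device, plus any Gaudi-specific execution calls) is left out.

```python
import importlib.util


def pick_device_type() -> str:
    """Return "hpu" when the Intel Gaudi packages are present, else "cpu"."""
    found = importlib.util.find_spec("habana_frameworks") is not None
    return "hpu" if found else "cpu"


def train_step(model, optimizer, loss_fn, inputs, targets, device_type):
    """Run one optimizer step with the forward pass under bfloat16 autocast."""
    # Imported here so the file can be loaded on machines without PyTorch.
    import torch

    # Only the forward pass and loss computation run under autocast.
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        loss = loss_fn(model(inputs), targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Keeping the backward pass and optimizer step outside the autocast context follows the usual PyTorch autocast guidance.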
The Intel Gaudi software fork of DeepSpeed now includes support for ZeRO-3 as well as ZeRO-Offload with ZeRO-1 or ZeRO-2 for training. In addition, DeepSpeed activation checkpointing is validated with activation partitioning and contiguous memory optimization flags. For more details, see the DeepSpeed User Guide.
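To make the combinations above concrete, the following sketch builds a DeepSpeed configuration dictionary and writes it out as JSON. The key names follow the upstream DeepSpeed configuration schema as we understand it; the Intel Gaudi fork may differ, so treat the DeepSpeed User Guide as the authority. The helper build_deepspeed_config, the batch size, and the checkpoint count are illustrative placeholders, not recommended values.

```python
import json


def build_deepspeed_config(zero_stage=2, offload_optimizer=True, num_checkpoints=24):
    """Build a DeepSpeed config dict (hypothetical helper for this sketch)."""
    if zero_stage not in (1, 2, 3):
        raise ValueError("zero_stage must be 1, 2, or 3")
    # The release note lists ZeRO-Offload only together with ZeRO-1 or ZeRO-2.
    if offload_optimizer and zero_stage == 3:
        raise ValueError("this sketch pairs offload with ZeRO-1 or ZeRO-2 only")

    config = {
        "train_micro_batch_size_per_gpu": 1,  # placeholder value
        "zero_optimization": {"stage": zero_stage},
        "activation_checkpointing": {
            "partition_activations": True,
            "contiguous_memory_optimization": True,
            # Placeholder; upstream DeepSpeed expects a checkpoint count
            # when contiguous memory optimization is enabled.
            "number_checkpoints": num_checkpoints,
        },
    }
    if offload_optimizer:
        # Keep optimizer state in host memory instead of device memory.
        config["zero_optimization"]["offload_optimizer"] = {"device": "cpu"}
    return config


if __name__ == "__main__":
    print(json.dumps(build_deepspeed_config(), indent=2))
```

The printed JSON could be saved to a file and passed to a DeepSpeed launch in the usual way; calling build_deepspeed_config(zero_stage=3, offload_optimizer=False) gives the ZeRO-3 variant.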
Several reference models were updated with instructions for training and inference on Intel Gaudi 2 AI accelerators and first-generation Intel Gaudi AI accelerators.
- For training, this includes Stable Diffusion* scaling up to 64 cards and HuBERT.
- For inference, this includes BLOOM-176B with beam search, Stable Diffusion v2.1, UNet2D, UNet3D, and ResNeXt101.
We enhanced the performance of many models with this release. For more details, see the model performance page.
For more information on this release, see Release Notes.