We added support for PyTorch Fully Sharded Data Parallel (FSDP). FSDP enables distributed training of large-scale models while reducing the memory footprint. See Using Fully Sharded Data Parallel (FSDP) with Intel Gaudi.
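As a minimal sketch of what this looks like in practice, the snippet below wraps a placeholder model with FSDP on HPU devices. It assumes a working Gaudi PyTorch installation; the model, hyperparameters, and launch setup are illustrative, not a validated recipe.

    # Minimal FSDP-on-Gaudi sketch; launch with e.g.
    #   torchrun --nproc_per_node=8 fsdp_example.py
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    import habana_frameworks.torch.distributed.hccl  # registers the "hccl" backend

    # torchrun supplies the rank/world-size environment variables read here.
    dist.init_process_group(backend="hccl")

    model = torch.nn.Linear(1024, 1024)                  # placeholder model
    model = FSDP(model, device_id=torch.device("hpu"))   # shard parameters across ranks

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    inputs = torch.randn(8, 1024, device="hpu")
    loss = model(inputs).sum()                           # toy forward pass and loss
    loss.backward()                                      # FSDP reduce-scatters gradients
    optimizer.step()

Launching with torchrun sets the rendezvous environment variables that init_process_group reads by default.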
We added further improvements to Text Generation Inference (TGI) support for Gaudi. For more details, see https://github.com/huggingface/tgi-gaudi.
We have improved performance and support for the LLaMA model family, including LLaMA 2 70B BF16 pre-training and LLaMA 2 7B/70B BF16/FP8 inference. In addition, we optimized Mixtral 8x7B for inference.
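As a hedged illustration of BF16 inference on Gaudi with a Hugging Face checkpoint (the model name and prompt are placeholders, and FP8 inference goes through a separate quantization flow covered in the Intel Gaudi documentation):

    # Illustrative BF16 text generation on the HPU device.
    import torch
    import habana_frameworks.torch.core  # registers the "hpu" device
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint, for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    ).to("hpu")

    inputs = tokenizer("Deep learning is", return_tensors="pt").to("hpu")
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))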
The Gaudi Megatron-DeepSpeed fork has been moved to a separate repository and rebased onto PR #307. You can find the new repository at https://github.com/HabanaAI/Megatron-DeepSpeed.
You can now use CRI-O with Intel Gaudi processors, in addition to the existing support for Docker Engine and ContainerD. For more information, see https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#set-up-container-runtime.
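As a rough sketch of what registering the runtime with CRI-O can look like, the drop-in below adds the Habana OCI runtime; the file path and binary location are assumptions, so follow the linked guide for the exact steps.

    # /etc/crio/crio.conf.d/99-habana.conf (assumed path)
    [crio.runtime.runtimes.habana]
    runtime_path = "/usr/bin/habana-container-runtime"  # assumed install location
    runtime_type = "oci"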
In this release, we also upgraded the validated versions of several frameworks and platform components: PyTorch 2.2.0, DeepSpeed 0.12.4, PyTorch Lightning 2.2.0, Kubernetes 1.27, 1.28 and 1.29, OpenShift 4.14, RHEL 9.2, and Megatron-DeepSpeed PR #307.
Lastly, a reminder that support for TensorFlow is deprecated starting with Intel Gaudi software 1.15.0. For more information, see the Intel Gaudi software 1.15.0 release notes page.