We’re excited to announce the release of Intel® Gaudi® software version 1.17.0, which brings numerous enhancements and updates for an improved GenAI development experience on the Intel® Gaudi® platform.
We added support for vLLM, a fast and easy-to-use library for LLM inference and serving. For more information, see the Intel Gaudi vLLM.
We have added software backward/forward compatibility support for Gaudi 2, allowing users to run 1.16 dockers with Intel Gaudi software version 1.17.0.
In addition, the update provides:
- Preview support for inference on UINT4 data type. Running inference in UINT4 halves the required memory bandwidth compared to running inference in FP8. Extended support and performance optimizations will be added in subsequent releases. See Run Inference Using UINT4.
- Support for PyTorch Distributed Tensor (DTensor) and Tensor Parallel. DTensor allows sharding tensors across multiple devices and performs operations on those tensors in a distributed manner. See Using Distributed Tensor with Intel Gaudi.
- Intel Gaudi Base Operator for Kubernetes, allowing users to automate the management of all Intel Gaudi software components in Kubernetes. See Intel Gaudi Base Operator for Kubernetes.
- Improved performance of various LLMs on Intel Gaudi 2 and Gaudi 3 accelerators, including LLaMA 2 7B/70B and LLaMA 3 8B for inference. For more information, check out the Intel Gaudi model performance page.
- Upgraded to validated versions of several libraries, including PyTorch 2.3.1, DeepSpeed 0.14.0, PyTorch Lightning 2.3.3, vLLM 0.5.1, Text Generation Inference (TGI) 2.0.1, Megatron-DeepSpeed PR #374, and Ray 2.32.0. You can find the full Gaudi Support Matrix here.
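
To make the UINT4 item in the list above concrete, here is a rough back-of-the-envelope sketch in Python of why 4-bit weights need half the bytes of 8-bit weights. It is plain arithmetic and does not use any Intel Gaudi API. The helper name `weight_bytes` and the round 7-billion parameter count are assumptions chosen for illustration, and the estimate ignores quantization scales, activations, and the KV cache.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store num_params weights at the given bit width.

    Hypothetical helper for illustration only; not part of any Gaudi library.
    """
    return num_params * bits_per_weight // 8


# Assumption: a round 7 billion parameters for a "7B" model.
params_7b = 7_000_000_000

fp8_bytes = weight_bytes(params_7b, 8)    # FP8 stores one byte per weight
uint4_bytes = weight_bytes(params_7b, 4)  # UINT4 stores half a byte per weight

print(f"FP8 weights:   {fp8_bytes / 1e9:.1f} GB")
print(f"UINT4 weights: {uint4_bytes / 1e9:.1f} GB")
print(f"UINT4 / FP8:   {uint4_bytes / fp8_bytes}")
```

Because each generated token has to read the weights from memory, moving half as many bytes per weight is what halves the weight-related bandwidth demand. Real savings depend on the extra metadata a given quantization scheme stores.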
Lastly, this is a reminder that in subsequent releases, Gaudi Lazy mode execution will be deprecated, and PyTorch Eager mode and torch.compile will be the default execution modes.
You can find more information on the Gaudi Software 1.17.0 release on the Intel Gaudi release notes page.