
Intel® Gaudi® AI Accelerators Blog / DeepSpeed
With the Intel Gaudi SynapseAI 1.13.0 release, users can fine-tune the Llama2 70B model using only eight Gaudi2 accelerators.
One of the main challenges in training Large Language Models (LLMs) is that they are often too large to fit on a single node, and even when they do fit, training may be too slow. To address this, training can be parallelized across multiple Gaudi accelerators (HPUs).
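To make the idea concrete, here is a minimal sketch (assuming the stock DeepSpeed API, not Habana's actual training scripts) of how a training loop hands parallelization over to DeepSpeed. The tiny linear model and random batches are placeholders; a real run would use a transformer and a real dataloader.

```python
# Minimal DeepSpeed data-parallel training sketch (illustrative only).
# The deepspeed launcher starts one process per accelerator; initialize()
# then wraps the model so gradient reduction and optimizer updates are
# coordinated across all of them.
import deepspeed
import torch

model = torch.nn.Linear(1024, 1024)  # placeholder for a real transformer

ds_config = {
    "train_batch_size": 64,
    "bf16": {"enabled": True},          # Gaudi trains natively in bfloat16
    "zero_optimization": {"stage": 2},  # shard optimizer state and gradients
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

for _ in range(10):
    x = torch.randn(8, 1024, dtype=torch.bfloat16, device=model_engine.device)
    loss = model_engine(x).pow(2).mean()
    model_engine.backward(loss)  # DeepSpeed inserts the cross-device all-reduce
    model_engine.step()          # and applies the (sharded) optimizer update
```

A script like this is started with the DeepSpeed launcher (for example, `deepspeed --num_gpus 8 train.py`); on Gaudi machines, Habana's DeepSpeed fork runs the same flow on HPUs.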
If you want to train a large model with Megatron-DeepSpeed but it is not among the included implementations, you can port it into the Megatron-DeepSpeed package yourself. Assuming your model is transformer-based, you can base your implementation on the existing code.
We have optimized additional Large Language Models on Hugging Face using the Optimum Habana library.
With support for DeepSpeed Inference in Habana's SynapseAI 1.8.0 release, users can run inference on large language models, including BLOOM 176B.
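As background, DeepSpeed Inference is driven through `deepspeed.init_inference`, which splits a model's weights across devices with tensor parallelism. A minimal sketch, using the smaller `bigscience/bloom-7b1` checkpoint as a stand-in for BLOOM 176B (which follows the same pattern with more shards):

```python
# Minimal DeepSpeed Inference sketch (illustrative; not the exact setup
# from the BLOOM post). init_inference shards the weights across mp_size
# devices and can inject optimized inference kernels.
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-7b1"  # stand-in; the 176B model uses the same API
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

engine = deepspeed.init_inference(
    model,
    mp_size=8,                        # tensor-parallel degree: one shard per device
    dtype=torch.bfloat16,
    replace_with_kernel_inject=True,  # swap in fused kernels where supported
)

inputs = tokenizer("DeepSpeed Inference lets large models", return_tensors="pt")
inputs = inputs.to(engine.module.device)
outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```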
In this post, we show you how to run Habana's DeepSpeed-enabled BERT-1.5B model from our Model-References repository.
In this tutorial, we demonstrate fine-tuning a GPT2 model on Habana Gaudi AI processors using the Hugging Face optimum-habana library with DeepSpeed.
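A minimal sketch of that flow with `GaudiTrainer`, assuming a Gaudi machine with the SynapseAI stack installed; the two-example toy dataset and the `ds_config.json` path are placeholders:

```python
# Illustrative GPT-2 fine-tuning sketch with optimum-habana and DeepSpeed.
# The toy dataset exists only to make the example self-contained.
from optimum.habana import GaudiTrainer, GaudiTrainingArguments
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

texts = ["Hello from Gaudi!", "DeepSpeed shards the training state."]
enc = tokenizer(texts, padding="max_length", max_length=32, truncation=True)
train_dataset = [
    {"input_ids": ids, "attention_mask": mask, "labels": ids}
    for ids, mask in zip(enc["input_ids"], enc["attention_mask"])
]

args = GaudiTrainingArguments(
    output_dir="./gpt2-finetuned",
    use_habana=True,                  # run on HPU rather than GPU/CPU
    use_lazy_mode=True,               # Gaudi's graph-compiling execution mode
    gaudi_config_name="Habana/gpt2",  # Gaudi precision/optimizer settings
    deepspeed="ds_config.json",       # placeholder path to a DeepSpeed config
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = GaudiTrainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```

For multi-card runs, the optimum-habana examples launch scripts like this through `gaudi_spawn.py` with `--use_deepspeed` and a `--world_size` matching the number of cards.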
One of the key challenges in Large Language Model (LLM) training is reducing the memory required for training without sacrificing compute/communication efficiency or model accuracy.
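DeepSpeed's ZeRO optimizer is the usual answer: each successive stage partitions more of the training state across the data-parallel workers instead of replicating it. A sketch of the relevant configuration section, with illustrative settings:

```python
# ZeRO section of a DeepSpeed config, written as the equivalent Python dict.
# Stage 1 partitions optimizer states, stage 2 adds gradients, and stage 3
# also partitions the parameters themselves. Values here are illustrative.
zero3_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,          # overlap reduction with the backward pass
        "contiguous_gradients": True,  # reduce memory fragmentation
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```

Higher stages trade extra communication for lower per-device memory, which is the compute/communication balance this post explores.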