One of the main challenges in training Large Language Models (LLMs) is that they are often too large to fit on a single node or even if they fit, the training may be too slow. To address this issue, their training can be parallelized across multiple Gaudi accelerators (HPUs).
If you want to train a large model using Megatron-DeepSpeed, but the model you want is not included in the implementation, you can port it to the Megatron-DeepSpeed package. Assuming your model is transformer-based, you can add your implementation easily, basing it on existing code.
We have optimized additional Large Language Models on Hugging Face using the Optimum Habana library.
In this tutorial, we will demonstrate fine tuning a GPT2 model on Habana Gaudi AI processors using Hugging Face optimum-habana library with DeepSpeed.
One of the key challenges in Large Language Model (LLM) training is reducing the memory requirements needed for training without sacrificing compute/communication efficiency and model accuracy.