In training workloads, there are scenarios in which graph re-compilations occur. These re-compilations add latency and slow down the overall training process across many iterations. This blog focuses on detecting these graph re-compilations.
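As a minimal sketch, one way to surface re-compilations is the runtime metrics API exposed under `habana_frameworks.torch.hpu.metrics`; the metric name `"graph_compilation"` and the exact format returned by `stats()` are assumptions based on the Gaudi PyTorch bridge documentation:

```python
import torch
# Assumption: the Gaudi software stack is installed; importing the core
# module registers the "hpu" device with PyTorch.
import habana_frameworks.torch.core  # noqa: F401
from habana_frameworks.torch.hpu.metrics import metric_localcontext

model = torch.nn.Linear(16, 4).to("hpu")

# Collect graph-compilation statistics only for the code inside the context.
with metric_localcontext("graph_compilation") as compilation_metric:
    for step in range(10):
        # Varying the input shape every step forces a new graph each time,
        # which is exactly the re-compilation pattern we want to detect.
        x = torch.randn(step + 1, 16).to("hpu")
        _ = model(x)

# stats() reports counters such as the number of compilations and the
# total time spent compiling (assumed format: (name, value) pairs).
print(dict(compilation_metric.stats()))
```

A steadily growing compilation count during steady-state training is the signal to look for; with static input shapes, compilations should stop after the first few steps.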
In this tutorial, we will learn how to write code that automatically detects which type of AI accelerator is installed on the machine (Gaudi, GPU, or CPU) and makes the changes needed to run the code smoothly.
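A minimal sketch of such detection logic, assuming the standard PyTorch availability checks (`torch.hpu.is_available()` becomes usable once `habana_frameworks` is imported):

```python
import torch

def detect_device() -> torch.device:
    """Pick the best available accelerator: Gaudi (HPU), then GPU, then CPU."""
    try:
        # Importing habana_frameworks registers the "hpu" backend with PyTorch.
        import habana_frameworks.torch.core  # noqa: F401
        if torch.hpu.is_available():
            return torch.device("hpu")
    except ImportError:
        pass  # No Gaudi software stack on this machine.
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = detect_device()
model = torch.nn.Linear(8, 2).to(device)
print(f"Running on: {device}")
```

Because the rest of the code only references `device`, the same script runs unmodified on all three hardware targets.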
In this tutorial, we will demonstrate fine-tuning a GPT-2 model on Habana Gaudi AI processors using the Hugging Face optimum-habana library with DeepSpeed.
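A condensed sketch of the training setup, assuming optimum-habana's `GaudiTrainer`/`GaudiTrainingArguments` API and a DeepSpeed config file at `ds_config.json`; the dataset choice and hyperparameters here are placeholders, not the tutorial's exact values:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder dataset: any causal-LM-ready text corpus works here.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512,
                     padding="max_length")

train_dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])
# mlm=False gives causal-LM labels (inputs shifted inside the model).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = GaudiTrainingArguments(
    output_dir="gpt2-finetuned",
    use_habana=True,                  # run on Gaudi (HPU)
    use_lazy_mode=True,               # lazy-mode graph execution
    gaudi_config_name="Habana/gpt2",  # Gaudi config from the Hugging Face Hub
    deepspeed="ds_config.json",       # assumption: DeepSpeed config in cwd
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

trainer = GaudiTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    data_collator=collator,
)
trainer.train()
```

Passing `deepspeed=` through the training arguments is what activates the DeepSpeed engine; ZeRO stage, optimizer, and precision settings all live in the JSON config.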
One of the key challenges in Large Language Model (LLM) training is reducing the memory required for training without sacrificing compute/communication efficiency or model accuracy.
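To make the scale of the problem concrete, the widely cited estimate from the ZeRO paper puts mixed-precision Adam training at 16 bytes of state per parameter. A short back-of-the-envelope calculation (the 1.5B parameter count is chosen to match GPT-2 XL):

```python
def training_state_gb(num_params: int) -> float:
    """Per-replica training state for mixed-precision Adam (ZeRO paper estimate)."""
    bytes_per_param = (
        2 +  # fp16 parameters
        2 +  # fp16 gradients
        4 +  # fp32 master copy of parameters
        4 +  # fp32 Adam momentum
        4    # fp32 Adam variance
    )  # = 16 bytes per parameter, before activations and temporary buffers
    return num_params * bytes_per_param / 1e9

# GPT-2 XL: ~1.5B parameters -> ~24 GB of state on every device
print(f"{training_state_gb(1_500_000_000):.1f} GB")
```

Techniques such as ZeRO attack exactly this term by partitioning the optimizer states, gradients, and parameters across devices instead of replicating them.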