Transformer models are getting bigger, and training them requires a large amount of memory. Large models do not always fit into a device’s memory, but tools like DeepSpeed can be used on Gaudi to reduce their memory consumption and deploy them in a cost-efficient manner. Join us for a live webinar to learn how to use DeepSpeed to train GPT2-XL, a 1.6B-parameter model, on Gaudi.
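For readers who want a preview, below is a minimal sketch of what DeepSpeed-enabled training can look like with Optimum Habana's Trainer API. The dataset, checkpoint, Gaudi config name, DeepSpeed config path, and hyperparameters are illustrative assumptions, not the exact setup covered in the webinar.

```python
# Minimal sketch: fine-tuning GPT2-XL (~1.6B parameters) on Gaudi with
# DeepSpeed enabled through Optimum Habana. Dataset, config names, and
# hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2-xl")
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Small text dataset, tokenized for causal language modeling
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

training_args = GaudiTrainingArguments(
    output_dir="./gpt2-xl-gaudi",
    use_habana=True,                  # run on Gaudi (HPU) devices
    use_lazy_mode=True,               # Gaudi lazy execution mode
    gaudi_config_name="Habana/gpt2",  # Gaudi config hosted on the Hugging Face Hub
    deepspeed="ds_config.json",       # path to a DeepSpeed config (e.g. ZeRO stage 2)
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Multi-card runs are typically launched with Optimum Habana's gaudi_spawn.py script, e.g. `python gaudi_spawn.py --world_size 8 --use_deepspeed train.py`.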
Presenters:

Regis Pierrard
Machine Learning Engineer at Hugging Face and core maintainer of Optimum Habana
