Transformer models are getting bigger, and training them requires a large amount of memory. Large models do not always fit into a device's memory, but tools like DeepSpeed can be used on Gaudi to reduce their memory consumption and deploy them in a cost-efficient manner. Join us for a live webinar to learn how to use DeepSpeed to train GPT2-XL, a 1.6-billion-parameter model, on Gaudi.
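As a rough sketch of the kind of setup the webinar covers, DeepSpeed's memory savings come largely from its ZeRO optimizer-state partitioning, which is enabled through a JSON configuration file. The values below (batch size, ZeRO stage, bf16 precision) are illustrative assumptions, not the exact settings used in the session:

```json
{
  "train_batch_size": 16,
  "zero_optimization": {
    "stage": 2
  },
  "bf16": {
    "enabled": true
  },
  "gradient_clipping": 1.0
}
```

A config like this is typically passed to the training script via the `--deepspeed` flag; ZeRO stage 2 shards optimizer states and gradients across devices, which is what lets a 1.6B-parameter model such as GPT2-XL fit where it otherwise might not.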