Generative AI and Large Language Models (LLMs), also known as foundation models, are driving a paradigm shift in deep learning and are replacing the task-specific models that have dominated the landscape to date. They are trained on a broad set of unlabeled data at scale and can be adapted to different tasks with minimal fine-tuning or prompting-based techniques.
As model parameter counts grow ever larger, special techniques are needed to ensure that models fit in accelerator memory and can be trained at scale. Habana’s SynapseAI software suite is integrated with the DeepSpeed library to accelerate large-scale model training.
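To make this concrete, here is a minimal sketch of DeepSpeed initialization on Gaudi, assuming the deepspeed and habana_frameworks packages from a SynapseAI release are installed. The model and config values are placeholders; the one Gaudi-specific detail is that distributed communication runs over HCCL rather than NCCL.

```python
# Minimal sketch: wiring a PyTorch model into DeepSpeed on Gaudi (HPU).
# Model and config values are placeholders, not tuned settings.
import torch
import deepspeed
import habana_frameworks.torch.core  # imported for its side effect of registering the "hpu" device

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 1},  # ZeRO-1: shard optimizer states across workers
    "bf16": {"enabled": True},          # bf16 is the usual mixed-precision choice on Gaudi
}

deepspeed.init_distributed(dist_backend="hccl")  # Gaudi communicates over HCCL, not NCCL
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

From here, training steps go through model_engine (model_engine.backward(loss), model_engine.step()) exactly as with DeepSpeed on other accelerators.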
Demo Videos
BLOOMZ on Intel AI Cloud
Reference Models
Diffusion Models
Large Language Models (LLMs)
Tutorials
Porting a model to Megatron-DeepSpeed with Intel® Gaudi® AI accelerators
If you want to train a large model using Megatron-DeepSpeed but the architecture you need is not included in the implementation, you can port it to the Megatron-DeepSpeed package yourself. Assuming your model is transformer-based, you can add your implementation easily by basing it on the existing code, as the sketch below illustrates.
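As a rough illustration of the pattern, the existing pretrain scripts (such as pretrain_gpt.py) each supply a model_provider function that builds the network, and a ported model slots in the same way. Import paths and constructor signatures vary between Megatron-DeepSpeed releases, so treat the names below as assumptions to verify against your checkout.

```python
# Rough sketch modeled on Megatron-DeepSpeed's pretrain_gpt.py; verify the
# import path and constructor arguments against your Megatron-DeepSpeed version.
from megatron.model import GPTModel


def model_provider(pre_process=True, post_process=True):
    """Build the network. To port a new transformer variant, return your own
    model class here, reusing GPTModel (or its building blocks) as a base."""
    model = GPTModel(
        num_tokentypes=0,
        parallel_output=True,
        pre_process=pre_process,    # True only on the first pipeline stage
        post_process=post_process,  # True only on the last pipeline stage
    )
    return model
```

A pretrain script then passes model_provider, together with dataset and forward-step functions, to the training entry point, which handles data, tensor, and pipeline parallelism for you.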
Enabling DeepSpeed on Intel® Gaudi® AI accelerators
This tutorial provides example training scripts that demonstrate different DeepSpeed optimization technologies on HPU, focusing on memory optimizations such as the Zero Redundancy Optimizer (ZeRO) and activation checkpointing.
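To show what those two features look like together, here is an illustrative DeepSpeed config expressed as a Python dict. The section and key names follow DeepSpeed's config schema; the values are untuned placeholders.

```python
# Illustrative DeepSpeed config combining ZeRO with activation checkpointing.
# Key names follow DeepSpeed's config schema; values are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {
        "stage": 2,            # ZeRO-1 shards optimizer states; ZeRO-2 also shards gradients
        "overlap_comm": True,  # overlap gradient reduction with backward compute
    },
    "activation_checkpointing": {
        "partition_activations": True,  # split checkpointed activations across model-parallel ranks
        "contiguous_memory_optimization": False,
    },
    "bf16": {"enabled": True},
}
```

Note that the activation_checkpointing section only takes effect for layers whose forward pass routes through DeepSpeed's checkpointing function (deepspeed.checkpointing.checkpoint), which recomputes activations during the backward pass instead of storing them.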
Fine-tune GPT2-XL (1.6 billion parameters) with Optimum Habana
This notebook shows how to fine-tune GPT2-XL for causal language modeling with Optimum Habana.
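A condensed sketch of that workflow is below: Optimum Habana provides Gaudi-aware drop-in replacements for the Transformers Trainer classes. The dataset choice and hyperparameters here are assumptions for illustration; see the notebook for the full recipe.

```python
# Condensed sketch of causal-LM fine-tuning with Optimum Habana's GaudiTrainer.
# Dataset, hyperparameters, and the Hub config name are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2-xl")
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

args = GaudiTrainingArguments(
    output_dir="gpt2-xl-clm",
    use_habana=True,                  # run on HPU
    use_lazy_mode=True,               # Gaudi's default (lazy) execution mode
    gaudi_config_name="Habana/gpt2",  # Gaudi-specific settings from the Hugging Face Hub
    per_device_train_batch_size=4,
)

trainer = GaudiTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # labels = shifted inputs
)
trainer.train()
```

For a model of this size, a DeepSpeed config can additionally be passed through the standard deepspeed training argument to shard optimizer states across devices.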
Webinars
Getting Started with Habana: DeepSpeed Optimization on Large Models
As models get larger and larger, libraries and techniques are needed to reduce memory usage and ensure models fit into device memory. In this webinar, you’ll learn the basic steps needed to enable DeepSpeed on Gaudi and see how the ZeRO-1 and ZeRO-2 memory optimizers and activation checkpointing can be used to reduce memory usage on a large model.
Leverage DeepSpeed to train faster and cheaper large scale transformer models with Hugging Face and Habana Gaudi
Transformer models are getting bigger, and training them requires a large amount of memory. Large models do not always fit into device memory; tools like DeepSpeed can be used on Gaudi to reduce their memory consumption and deploy them in a cost-efficient manner.
Technical Blogs
Accelerating Vision-Language Models: BridgeTower on Habana Gaudi2 – Optimum Habana v1.7 on Habana Gaudi2 achieves 2.5x speedups compared to A100 and 1.4x compared to H100 when fine-tuning BridgeTower, a state-of-the-art vision-language model. This performance improvement relies on hardware-accelerated data loading to make the most of your devices.
Accelerate Llama 2 with Intel AI Hardware and Software Optimizations
Training Llama and Bloom 13 Billion Parameter LLMs with 3D Parallelism on Habana® Gaudi2®
Fast Inference on Large Language Models: BLOOMZ on Habana Gaudi2 Accelerator – Learn how to easily deploy multi-billion parameter language models on Habana Gaudi2.
BLOOM 176B Inference on Habana Gaudi2 – With DeepSpeed Inference support in Habana’s SynapseAI 1.8.0 release, users can run inference on large language models, including BLOOM 176B.
Memory-Efficient Training on Habana Gaudi with DeepSpeed – discusses how ZeRO (Zero Redundancy Optimizer), a memory-efficient approach, enables efficient distributed training of models with large memory footprints.
Training Causal Language Models on SDSC’s Gaudi-based Voyager Supercomputing Cluster
Docs
Getting Started with DeepSpeed
Simple steps for preparing a DeepSpeed model to run on Gaudi
DeepSpeed User Guide
Get started using DeepSpeed on Habana