
Large Scale Models

Large scale models, including large language and foundation models, are driving a paradigm shift in deep learning.

These models are now replacing the task-specific models that have dominated the deep learning landscape to date. They are trained on a broad set of unlabeled data at scale and can be adapted for different tasks with minimal fine-tuning or prompting-based techniques. They are also called foundation models, a term first popularized by the Stanford Institute for Human-Centered Artificial Intelligence.

As model parameters grow ever larger, special techniques are needed to fit models in accelerator memory and train them at scale. Habana uses the DeepSpeed library to accelerate large scale model training.
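
For illustration only, here is a minimal sketch of what DeepSpeed training on Gaudi can look like, assuming the Habana PyTorch bridge (habana_frameworks) and Habana's DeepSpeed fork are installed and the script is launched with the DeepSpeed launcher; the toy model and configuration values are placeholders, not an official example.

import torch
import torch.nn as nn
import deepspeed
import habana_frameworks.torch.core as htcore  # registers the HPU device and lazy-mode helpers

# Placeholder DeepSpeed configuration: micro-batch size, optimizer, and ZeRO stage.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 1},  # ZeRO-1 shards optimizer states across workers
}

# Toy model standing in for a large transformer.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to("hpu")

# deepspeed.initialize wraps the model in an engine that owns the optimizer,
# gradient accumulation, and ZeRO partitioning.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

for _ in range(10):
    inputs = torch.randn(8, 1024).to("hpu")
    targets = torch.randint(0, 10, (8,)).to("hpu")
    loss = nn.functional.cross_entropy(model_engine(inputs), targets)
    model_engine.backward(loss)  # the engine handles gradient scaling and partitioning
    model_engine.step()
    htcore.mark_step()           # flush the accumulated lazy-mode graph on HPU

The DeepSpeed User Guide and Getting Started pages linked below cover the supported configuration options in detail.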

Docs

DeepSpeed User Guide

Get started using DeepSpeed on Habana

Getting Started with DeepSpeed

Simple steps for preparing a DeepSpeed model to run on Gaudi

Reference Models

Hugging Face BLOOM

Run inference on the family of BLOOM models, with up to 176 billion parameters, developed and trained by the Hugging Face-led BigScience workshop.

Megatron-DeepSpeed BLOOM 13B

Habana’s implementation of the GPT-based BLOOM 13B model in the Megatron-DeepSpeed repository

DeepSpeed BERT Models

Habana’s implementation of BERT with 1.5 billion and 5 billion parameters using DeepSpeed

Hugging Face DeepSpeed Models

The Hugging Face optimum-habana library includes support for DeepSpeed

Tutorials

Large Model usage with minGPT

This tutorial provides example training scripts that demonstrate different DeepSpeed optimization technologies on HPU, focusing on memory optimizations such as the Zero Redundancy Optimizer (ZeRO) and Activation Checkpointing.
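
As a rough sketch (not taken from the tutorial), a DeepSpeed configuration that combines the two memory optimizations it covers might look like the following; the specific values are illustrative assumptions.

# Illustrative configuration: ZeRO-2 shards optimizer states and gradients,
# while activation checkpointing trades recomputation for activation memory.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},  # bf16 mixed precision is commonly used on Gaudi
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,          # overlap gradient reduction with the backward pass
        "contiguous_gradients": True,  # reduce memory fragmentation
    },
    "activation_checkpointing": {
        "partition_activations": False,
        "contiguous_memory_optimization": False,
    },
}

Such a dictionary can be passed directly to deepspeed.initialize through its config argument, or saved as a JSON file that the training script points to.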

Fine-tune the GPT2-XL 1.6 billion parameter model with Optimum Habana

This notebook shows how to fine-tune GPT2-XL for causal language modeling with Optimum Habana.
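
As a hedged sketch of that workflow, assuming the optimum-habana GaudiTrainer API and the published Habana/gpt2 Gaudi configuration; the toy dataset and the ds_config.json file are placeholders, and the notebook itself is the authoritative reference.

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

model_name = "gpt2-xl"  # the 1.6 billion parameter GPT-2 variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny placeholder corpus; replace with a real dataset pipeline.
texts = ["Habana Gaudi accelerates deep learning training."] * 64

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
    out["labels"] = out["input_ids"].copy()  # causal LM: the model shifts labels internally
    return out

dataset = Dataset.from_dict({"text": texts}).map(tokenize, batched=True, remove_columns=["text"])

args = GaudiTrainingArguments(
    output_dir="gpt2-xl-clm",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    use_habana=True,             # run on HPU
    use_lazy_mode=True,          # Gaudi lazy execution mode
    deepspeed="ds_config.json",  # hypothetical ZeRO config to help fit the 1.6B model
)

trainer = GaudiTrainer(
    model=model,
    gaudi_config=GaudiConfig.from_pretrained("Habana/gpt2"),
    args=args,
    train_dataset=dataset,
)
trainer.train()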

Webinars

Getting Started with Habana: DeepSpeed Optimization on Large Models

As models get larger and larger, libraries and techniques are needed to reduce memory usage so that models fit into device memory. In this webinar, you’ll learn the basic steps needed to enable DeepSpeed on Gaudi and see how the ZeRO-1 and ZeRO-2 memory optimizers and Activation Checkpointing can be used to reduce the memory footprint of a large model.
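
To make the activation checkpointing idea concrete, here is a minimal, generic PyTorch illustration (using torch.utils.checkpoint rather than Habana- or DeepSpeed-specific APIs): checkpointed blocks discard their intermediate activations after the forward pass and recompute them during backward, trading compute for memory. The model below is a toy stand-in.

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLPStack(nn.Module):
    def __init__(self, width=2048, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.GELU()) for _ in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            # Activations inside `block` are not stored; they are recomputed
            # during backward, shrinking peak memory.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedMLPStack()
x = torch.randn(4, 2048, requires_grad=True)
model(x).sum().backward()

DeepSpeed exposes the same idea through its activation checkpointing configuration, which is the form discussed in the webinar.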

Leverage DeepSpeed to train faster and cheaper large scale transformer models with Hugging Face and Habana Gaudi

Transformer models are getting bigger and their training requires a large amount of memory. Large models do not always fit into device memory; tools like DeepSpeed can be used on Gaudi to reduce their memory consumption and run them in a cost-efficient manner.

Technical Blogs

BLOOM 176B Inference on Habana Gaudi2 – With support for DeepSpeed Inference in Habana’s SynapseAI 1.8.0 release, users can run inference on large language models, including BLOOM 176B.

Memory-Efficient Training on Habana Gaudi with DeepSpeed – Discusses how ZeRO (Zero Redundancy Optimizer), a memory-efficient approach, enables efficient distributed training of large models.

Training Causal Language Models on SDSC’s Gaudi-based Voyager Supercomputing Cluster
