
Large Scale Models

Large scale models, including large language and foundation models, are driving a paradigm shift in deep learning.

These models are now replacing the task-specific models that have dominated the deep learning landscape to date. They are trained on a broad set of unlabeled data at scale and can be adapted to different tasks with minimal fine-tuning or prompt-based techniques. They are also called foundation models, a term first popularized by the Stanford Institute for Human-Centered Artificial Intelligence.

As model parameter counts grow ever larger, special techniques are needed to fit models in accelerator memory and to train them at scale. Habana uses the DeepSpeed library to accelerate large scale model training.
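As a rough illustration, the sketch below wraps a small PyTorch model with DeepSpeed. It assumes Habana's DeepSpeed fork and a Gaudi-enabled PyTorch installation are available; the placeholder model and configuration values are assumptions for illustration, not settings taken from the Habana documentation.

import torch
import deepspeed

# Placeholder model; a real workload would use a large transformer.
model = torch.nn.Linear(1024, 1024)

# Minimal DeepSpeed configuration (illustrative values only).
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "bf16": {"enabled": True},          # Gaudi trains efficiently in bfloat16
    "zero_optimization": {"stage": 1},  # partition optimizer states across workers
}

# deepspeed.initialize wraps the model in an engine that handles data
# parallelism, mixed precision, and ZeRO partitioning.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)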

Docs

DeepSpeed User Guide

Get started using DeepSpeed on Habana

Getting Started with DeepSpeed

Simple steps for preparing a DeepSpeed model to run on Gaudi
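As a companion to the getting-started guide, the sketch below continues the initialization example above and shows a typical DeepSpeed training step; the toy data and loss are assumptions for illustration, not steps copied from the guide.

# Toy data; a real run would use a proper dataset and dataloader.
train_loader = [(torch.randn(8, 1024), torch.randn(8, 1024)) for _ in range(10)]

for inputs, labels in train_loader:
    inputs = inputs.to(model_engine.device)
    labels = labels.to(model_engine.device)

    outputs = model_engine(inputs)                        # forward pass
    loss = torch.nn.functional.mse_loss(outputs, labels)

    model_engine.backward(loss)  # engine handles gradient scaling and partitioning
    model_engine.step()          # optimizer step plus ZeRO bookkeeping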

Reference Models

DeepSpeed BERT Models

Habana’s implementation of BERT with 1.5 billion and 5 billion parameters using DeepSpeed

Hugging Face DeepSpeed Models

The Hugging Face optimum-habana library includes support for DeepSpeed
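As a rough sketch of how that support is typically used, the snippet below passes a DeepSpeed configuration to the library's Trainer-style API; the specific argument values (the Gaudi config name and the ds_config.json path) are illustrative assumptions, not taken from this page.

from optimum.habana import GaudiTrainer, GaudiTrainingArguments
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

training_args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,                               # run on Gaudi (HPU) devices
    use_lazy_mode=True,                            # Habana lazy execution mode
    gaudi_config_name="Habana/bert-base-uncased",  # Gaudi config from the HF Hub (assumed)
    deepspeed="ds_config.json",                    # path to a DeepSpeed config file (assumed)
)

trainer = GaudiTrainer(model=model, args=training_args)
# Supply train_dataset=... above before calling trainer.train().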

Technical Blogs

Memory-Efficient Training on Habana Gaudi with DeepSpeed, which discusses how ZeRO (the Zero Redundancy Optimizer), a memory-efficient approach, enables efficient distributed training of large models.
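To make the ZeRO idea concrete, the fragment below sketches the zero_optimization section of a DeepSpeed configuration; the field names come from the public DeepSpeed config schema, while the values chosen are illustrative assumptions rather than Habana-recommended settings.

# Stage 1 partitions optimizer states across workers, stage 2 additionally
# partitions gradients, and stage 3 also partitions the model parameters.
zero_config = {
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,          # overlap gradient reduction with the backward pass
        "contiguous_gradients": True,  # reduce memory fragmentation
        "reduce_bucket_size": 5e8,     # bucket size (elements) for gradient reduction
    }
}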
