
Habana Model Performance Data

See the latest performance data for Gaudi2 training, Gaudi2 inference, Gaudi training, and Gaudi inference. For information on models and containers that are currently integrated with Habana's SynapseAI software suite, visit the Habana catalog.

These performance numbers were measured using the latest SynapseAI software release, version 1.14.0-493, unless otherwise noted.

All models, for both training and inference, use the PyTorch 2.1.1 framework. Any other framework used for training or inference is noted for each model.
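
As a quick way to confirm the framework and device setup described above, here is a minimal sanity-check sketch, assuming the SynapseAI PyTorch bridge (habana_frameworks) is installed, as it is in the published containers:

    # Minimal environment sanity check; assumes the SynapseAI PyTorch
    # bridge (habana_frameworks) is installed.
    import torch
    import habana_frameworks.torch.hpu as hthpu

    print(torch.__version__)       # expected: 2.1.1 for this release
    print(hthpu.is_available())    # True when Gaudi/Gaudi2 devices are visible
    print(hthpu.device_count())    # number of HPUs, e.g. 8 on the servers listed below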

Training Performance Highlights

Megatron DeepSpeed 0.12.4: LLaMA2 70B on 1,024 cards (BS=4096) | LLaMA2 70B on 512 cards (BS=2048) | LLaMA2 70B on 256 cards (BS=1024)

Gaudi2 MLPerf™ 3.1 Training Performance

These performance numbers were generated with the latest version of SynapseAI and improve on the officially submitted numbers posted on the MLCommons website.

* The GPT-3 measurement with 384 cards was taken using a pre-launch version of the SynapseAI 1.13.0 software stack.
** The GPT-3 measurement with 256 cards and the Stable Diffusion measurement were taken using the SynapseAI 1.13.0 software stack.

Gaudi2 Large Language Models Training Performance 

TP, PP, DP = the Tensor Parallel, Pipeline Parallel, and Data Parallel dimensions used for Megatron DeepSpeed training
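
As a rough illustration of how these dimensions compose: the total card count is the product TP × PP × DP, and the global batch size is the per-device micro-batch size × gradient-accumulation steps × DP. The specific split below is an assumption chosen to match the 1,024-card, BS=4096 highlight above, not the published configuration:

    # Hypothetical parallelism split for LLaMA2 70B on 1,024 cards; the
    # actual published TP/PP/DP values may differ.
    tp, pp, dp = 8, 16, 8        # tensor-, pipeline-, data-parallel degrees
    cards = tp * pp * dp         # 8 * 16 * 8 = 1,024 Gaudi2 cards

    micro_batch = 1              # per-device micro-batch (assumed)
    grad_accum = 512             # gradient-accumulation steps (assumed)
    global_batch = micro_batch * grad_accum * dp   # 1 * 512 * 8 = 4,096

    print(cards, global_batch)   # -> 1024 4096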

Gaudi2 Reference Models Training Performance

Hugging Face Optimum Habana Gaudi2 Training Performance

See the Examples page for information on how to run each of the tasks, including model naming and hyperparameter usage.
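
For orientation, a hedged sketch of what such a fine-tuning run looks like with Optimum Habana follows; the model, dataset, and hyperparameters here are illustrative assumptions, not the benchmarked configurations. GaudiTrainer and GaudiTrainingArguments stand in for their Transformers counterparts:

    # Illustrative Optimum Habana fine-tuning sketch; the model, dataset,
    # and hyperparameters are assumptions, not the benchmarked settings.
    from datasets import load_dataset
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from optimum.habana import GaudiTrainer, GaudiTrainingArguments

    model_name = "bert-base-uncased"  # placeholder model for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    data = load_dataset("glue", "mrpc")
    data = data.map(
        lambda b: tokenizer(b["sentence1"], b["sentence2"],
                            truncation=True, padding="max_length", max_length=128),
        batched=True,
    )

    args = GaudiTrainingArguments(
        output_dir="./out",
        use_habana=True,                               # run on HPUs
        use_lazy_mode=True,                            # SynapseAI lazy execution
        gaudi_config_name="Habana/bert-base-uncased",  # published Gaudi config
        per_device_train_batch_size=8,
        num_train_epochs=1,
    )

    GaudiTrainer(model=model, args=args, train_dataset=data["train"]).train()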

MosaicML Gaudi2 Training Performance

Gaudi Reference Models Training Performance

Hugging Face Optimum Habana Gaudi Training Performance

See the Examples page for information on how to run each of the tasks, including model naming and hyperparameter usage.

Gaudi2 MLPerf™ 3.1 Inference Performance

Gaudi2 Large Language Models Inference Performance

Gaudi2 Reference Models Inference Performance

Hugging Face Optimum Habana Gaudi2 Inference Performance 

See the Examples page for information on how to run each of the tasks, including model naming and hyperparameter usage.
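
As a minimal sketch of HPU text generation (the model choice and generation settings are assumptions for illustration; the benchmarked runs use the scripts from the Examples page):

    # Illustrative HPU text-generation sketch; the model and settings are
    # assumptions, not the benchmarked configuration.
    import torch
    import habana_frameworks.torch.core as htcore  # registers the "hpu" device
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "gpt2"  # placeholder model for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    ).to("hpu")

    inputs = tokenizer("Gaudi2 inference example:", return_tensors="pt").to("hpu")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))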

Gaudi Reference Models Inference Performance

Hugging Face Optimum Habana Gaudi Inference Performance 

See the Examples page for information on how to run each of the tasks, including model naming and hyperparameter usage.

*** For the Large Language Models, this is the average next-token latency.
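
As a worked illustration of that metric (all timings below are made-up numbers): once the first token has been produced, the average next-token latency is the remaining generation time divided by the number of subsequent tokens.

    # Toy computation of average next-token latency; all numbers are made up.
    total_time_s = 2.10     # wall-clock time for the full generation
    first_token_s = 0.18    # time to first token (prefill)
    new_tokens = 65         # tokens generated, including the first

    avg_next_token_ms = (total_time_s - first_token_s) / (new_tokens - 1) * 1000
    print(f"{avg_next_token_ms:.1f} ms/token")   # -> 30.0 ms/token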

System Configuration:

Gaudi® Platform
System: HLS-1 with eight Habana Gaudi HL-205 Mezzanine cards and two Intel® Xeon® Platinum 8280 CPU @ 2.70GHz, and 756GB of System Memory

Gaudi®2 Platform
System: HLS-Gaudi2 with eight Habana Gaudi2 HL-225H Mezzanine cards and two Intel® Xeon® Platinum 8380 CPU @ 2.30GHz, and 1TB of System Memory

Amazon EC2 DL1 Instance
System: Custom Server with eight Habana Gaudi HL-205 Mezzanine cards and two Intel® Xeon® Platinum 8275CL CPU @ 3.00GHz, and 756GB of System Memory

Common Software
Ubuntu 22.04, SynapseAI software version 1.14.0-493
PyTorch: Models run with PyTorch v2.1.1 use this Docker image
Environment: These workloads run in Docker images directly on the host OS


Performance varies by use, configuration and other factors.  Please refer to the Model-References GitHub page for each model’s support and validation coverage.  All information provided here is subject to change without notice. Habana Labs may make changes to its test conditions and internal reliability goals at any time.  Contact your Habana Labs representative to obtain the latest Habana Labs product specifications and roadmaps. Your costs and results may vary.
