
Intel® Gaudi® AI accelerators Model Performance Data

These performance numbers are measured using the latest Intel Gaudi software release, version 1.16.0-526, unless otherwise noted.

All models for both training and inference use the PyTorch 2.2.2 framework. Other applicable frameworks used for training or inference are noted for each model.

Training Performance Highlights

Megatron-DeepSpeed 0.12.4:
- LLaMA2 70B on 1,024 cards, BS=4096
- LLaMA2 70B on 512 cards, BS=2048
- LLaMA2 70B on 256 cards, BS=1024

Intel Gaudi 2 MLPerf™ 3.1 Training Performance

These performance numbers were generated with previous versions of the Intel Gaudi software. They will be updated with results from the upcoming MLPerf Training round in the next Intel Gaudi software release.

* The GPT-3 measurement with 384 cards was taken using a pre-launch version of the Intel Gaudi 1.13.0 software stack
** The GPT-3 measurement with 256 cards and the Stable Diffusion measurement were taken using the Intel Gaudi 1.13.0 software stack
*** The ResNet and BERT measurements were taken using the Intel Gaudi 1.15.0 software stack

Intel Gaudi 2 Large Language Models Training Performance 

TP, PP, DP: the tensor parallel, pipeline parallel, and data parallel sizes used for the Megatron-DeepSpeed training runs. Their product TP × PP × DP equals the total number of cards, as illustrated in the sketch below.
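
As a rough guide to reading those columns, here is a minimal sketch of the parallelism arithmetic. The TP/PP split and micro-batch size below are illustrative assumptions, not the published configurations; the measured runs use the scripts in the Model-References repository.

```python
# Parallelism arithmetic behind the TP/PP/DP columns (illustrative values only).

def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Megatron-DeepSpeed derives DP from the card count: DP = world / (TP * PP)."""
    assert world_size % (tp * pp) == 0, "TP * PP must evenly divide the card count"
    return world_size // (tp * pp)

world = 512        # cards in the LLaMA2 70B-512 highlight above
tp, pp = 8, 4      # assumed tensor/pipeline split, for illustration
dp = data_parallel_size(world, tp, pp)        # 512 / (8 * 4) = 16

# The global batch size (BS=2048 above) factors as micro_batch * DP * grad_accum.
global_bs, micro_bs = 2048, 1
grad_accum = global_bs // (micro_bs * dp)     # 2048 / 16 = 128
print(f"DP={dp}, gradient accumulation steps={grad_accum}")
```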

Intel Gaudi 2 Reference Models Training Performance

Hugging Face Optimum Habana for Intel Gaudi 2 Training Performance

See the Examples page for information on how to run each of the Tasks, including model naming and hyperparameter usage.
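
For orientation before visiting the Examples page, the following is a minimal sketch of a Gaudi fine-tuning run through Optimum Habana's GaudiTrainer. The model (bert-base-uncased), dataset (GLUE/MRPC), and hyperparameters are illustrative assumptions, not the measured configurations.

```python
# Minimal sketch of fine-tuning with Optimum Habana's GaudiTrainer, assuming
# optimum-habana is installed and a Gaudi device is available.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

model_name = "bert-base-uncased"  # illustrative choice, not a measured model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize a small GLUE task; static padding avoids recompilation in lazy mode.
train_ds = load_dataset("glue", "mrpc", split="train").map(
    lambda ex: tokenizer(ex["sentence1"], ex["sentence2"],
                         truncation=True, padding="max_length", max_length=128),
    batched=True,
)

args = GaudiTrainingArguments(
    output_dir="./mrpc-gaudi",
    use_habana=True,        # place the run on the Gaudi (HPU) device
    use_lazy_mode=True,     # lazy-mode graph execution
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

# Habana publishes per-model Gaudi configs on the Hugging Face Hub.
gaudi_config = GaudiConfig.from_pretrained("Habana/bert-base-uncased")

trainer = GaudiTrainer(model=model, gaudi_config=gaudi_config,
                       args=args, train_dataset=train_ds)
trainer.train()
```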

MosaicML on Intel Gaudi 2 Training Performance

First Gen Intel Gaudi Reference Models Training Performance

Hugging Face Optimum Habana for First Gen Intel Gaudi Training Performance

See the Examples page for information on how to run each of the Tasks, including model naming and hyperparameter usage.

Intel Gaudi 2 MLPerf™ 4.0 Inference Performance

Intel Gaudi 2 Large Language Models for Throughput

Intel Gaudi 2 Large Language Models for Low Latency

Intel Gaudi 2 Reference Models Inference Performance

Hugging Face Optimum for Intel Gaudi 2 Inference Performance 

See the Examples page for information on how to run each of the Tasks, including model naming and hyperparameter usage.
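
Likewise, for orientation only, here is a minimal sketch of single-prompt generation on Gaudi through Optimum Habana. The gpt2 checkpoint and default generation settings are illustrative assumptions; the measured numbers come from the task scripts referenced on the Examples page.

```python
# Minimal sketch of text generation on Gaudi via Optimum Habana, assuming
# optimum-habana and the Habana PyTorch bridge are installed.
import torch
import habana_frameworks.torch.core  # registers the "hpu" device with PyTorch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch transformers with Gaudi-optimized model code

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2").eval().to("hpu")

inputs = tokenizer("Intel Gaudi 2 is", return_tensors="pt").to("hpu")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```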

First Gen Intel Gaudi Reference Models Inference Performance

Hugging Face Optimum Habana for First Gen Intel Gaudi Inference Performance

See the Examples page for information on how to run each of the Tasks, including model naming and hyperparameter usage.

* These models used the previous 1.15.0 software release
*** For the large language inference models, the reported figure is the average next-token latency
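
For clarity on that metric, the sketch below shows one conventional way to compute an average next-token latency from per-token timestamps, with the first token excluded because it includes prompt prefill. This is an assumption about the definition, not the exact measurement harness used here.

```python
# Sketch of an "average next token latency" computation, assuming per-token
# emit timestamps; the gap before the first token (prefill) is excluded.
def avg_next_token_latency(emit_times_s: list[float]) -> float:
    """emit_times_s[i] is the wall-clock time (seconds) when token i was emitted."""
    assert len(emit_times_s) >= 2, "need at least two tokens"
    gaps = [b - a for a, b in zip(emit_times_s, emit_times_s[1:])]
    return sum(gaps) / len(gaps)

# Example: ~10 ms decode steps after a 100 ms prefill.
print(avg_next_token_latency([0.100, 0.110, 0.121, 0.130, 0.141]))  # ~0.0103 s
```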

System Configuration:

Gaudi® Platform
System: HLS-1 with eight Habana Gaudi HL-205 Mezzanine cards and two Intel® Xeon® Platinum 8280 CPUs @ 2.70GHz, and 756GB of System Memory

Gaudi®2 Platform
System: HLS-Gaudi2 with eight Habana Gaudi2 HL-225H Mezzanine cards and two Intel® Xeon® Platinum 8380 CPUs @ 2.30GHz, and 1TB of System Memory

Common Software
Ubuntu 22.04, SynapseAI Software version 1.16.0-526
PyTorch: Models run with PyTorch v2.2.2 use this Docker image
Environment: These workloads are run in the Docker images, directly on the host OS
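
As a quick sanity check inside that environment, the short sketch below (assuming the habana_frameworks PyTorch bridge bundled with the image) confirms the HPU devices are visible:

```python
# Quick device check inside the Gaudi PyTorch Docker image, assuming the
# habana_frameworks PyTorch bridge that ships with the image.
import habana_frameworks.torch.hpu as hthpu

print("HPU available:", hthpu.is_available())
print("HPU device count:", hthpu.device_count())
```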


Performance varies by use, configuration and other factors. Please refer to the Model-References GitHub page for each model’s support and validation coverage. All information provided here is subject to change without notice. Habana Labs may make changes to its test conditions and internal reliability goals at any time. Contact your Habana Labs representative to obtain the latest Habana Labs product specifications and roadmaps. Your costs and results may vary.

Stay Informed: Register for the latest Intel Gaudi AI Accelerator developer news, events, training, and updates.