
Habana Training Models and Performance

Get access to popular frameworks and Habana-optimized models that enable you to quickly build, train, and deploy models on Gaudi. For more information on future model support, please refer to our SynapseAI roadmap page.

See the latest TensorFlow and PyTorch model performance.

Computer Vision

ResNet50, 101, 152 (TensorFlow)
ResNeXt101 (TensorFlow)
SSD-ResNet34 (TensorFlow)
UNet2D (TensorFlow)
Mask R-CNN (TensorFlow)
DenseNet Keras (TensorFlow)
ResNet50, ResNeXt101 (PyTorch)

Natural Language Processing

BERT (TensorFlow)
ALBERT (TensorFlow)
Transformer (TensorFlow)
BERT (PyTorch)

Recommendation Systems

DLRM (PyTorch)

TensorFlow Reference Models Performance

All measurements below are for Mixed Precision mode.

Framework | Model | # HPU | Time to Train | Accuracy | Throughput | Batch Size
TensorFlow 2.5.0 | ResNet50 Keras LARS | 1 | 8h 45m | 75.91 | 1690 images/sec | 256
TensorFlow 2.5.0 | ResNet50 Keras LARS | 8 | 1h 15m | 75.93 | 12900 images/sec | 256
TensorFlow 2.4.1 | ResNet50 Keras LARS (tf.distribute) | 8 | 1h 12m | 75.99 | 12300 images/sec | 256
TensorFlow 2.4.1 | ResNet50 Keras LARS | 16 | - | 75.94 | 23231 images/sec | 256
TensorFlow 2.4.1 | ResNet50 Keras LARS | 32 | - | 75.14 | 46000 images/sec | 256
TensorFlow 2.5.0 | ResNet50 Keras SGD | 1 | 19h 20m | 76.06 | 1690 images/sec | 256
TensorFlow 2.5.0 | ResNet50 Keras SGD | 8 | 3h | 76.18 | 12400 images/sec | 256
TensorFlow 2.5.0 | BERT-Large Fine Tuning (SQuAD) | 1 | 1h 20m | 92.95 | 54 sentences/sec | 24
TensorFlow 2.5.0 | BERT-Large Fine Tuning (SQuAD) | 8 | 20m | 93.2 | 300 sentences/sec | 24
TensorFlow 2.5.0* | BERT-Large Pre Training | 1 | - | - | Phase 1: 166, Phase 2: 30 sentences/sec | Phase 1: 64, Phase 2: 8
TensorFlow 2.5.0* | BERT-Large Pre Training | 8 | - | - | Phase 1: 1310, Phase 2: 246 sentences/sec | Phase 1: 64, Phase 2: 8
TensorFlow 2.5.0* | BERT-Large Pre Training | 32 | 39h | - | Phase 1: 5152, Phase 2: 980 sentences/sec | Phase 1: 64, Phase 2: 8
TensorFlow 2.4.1 | Mask R-CNN | 1 | 36h | 34.14 | 12 images/sec | 4
TensorFlow 2.4.1 | Mask R-CNN | 8 | 7h | 34.17 | 76 images/sec | 4
TensorFlow 2.5.0 | UNet2D | 1 | 1h 50m | 88.74 | 49 images/sec | 8
TensorFlow 2.5.0 | UNet2D | 8 | 41m | 88.4 | 373 images/sec | 8
TensorFlow 2.5.0 | ResNeXt101 | 1 | 48h | 79.19 | 650 images/sec | 128
TensorFlow 2.5.0 | ResNeXt101 | 8 | 6h 45m | 79.19 | 4510 images/sec | 128
TensorFlow 2.5.0 | SSD-ResNet34 | 1 | 3h 55m | 22.98 | 475 images/sec | 128
TensorFlow 2.5.0 | SSD-ResNet34 | 8 | 45m | 22.24 | 3455 images/sec | 128
TensorFlow 2.5.0** | Transformer | 1 | 17h | 21.4 | 18760 | 4096
TensorFlow 2.5.0** | Transformer | 8 | 22h 30m | 26 | 138550 | 4096
TensorFlow 2.4.1 | DenseNet | 1 | - | 0.712 | 836 images/sec | 128
TensorFlow 2.5.0 | ALBERT-Large Fine Tuning (SQuAD) | 1 | - | - | 45 sentences/sec | 32
TensorFlow 2.5.0 | ALBERT-Large Fine Tuning (SQuAD) | 8 | - | F1: 90.9, EM: 84.1 | 358 sentences/sec | 32
TensorFlow 2.5.0 | ALBERT-Large Pre Training | 1 | - | - | 142 sentences/sec | 64

* With accumulation steps
** The Transformer evaluation graph runs on the CPU, which impacts time-to-train.
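
As a reading aid, the short sketch below computes multi-card scaling efficiency (measured multi-card throughput divided by card count times single-card throughput) from the ResNet50 Keras LARS rows above. This is illustrative arithmetic over the published figures only, not a benchmark script; note that the 16- and 32-HPU rows were measured with TensorFlow 2.4.1 while the single-card baseline is 2.5.0, so those two ratios are approximate.

# Scaling efficiency for ResNet50 Keras LARS, using the throughput
# figures from the table above (single-card baseline: TF 2.5.0).
SINGLE_CARD_IPS = 1690  # images/sec on 1 HPU

multi_card_ips = {8: 12900, 16: 23231, 32: 46000}  # images/sec, per table

for n_hpu, ips in multi_card_ips.items():
    efficiency = ips / (n_hpu * SINGLE_CARD_IPS)
    print(f"{n_hpu:>2} HPUs: {ips:>6} images/sec -> {efficiency:.1%} of linear scaling")

# Output: 8 HPUs ~95.4%, 16 HPUs ~85.9%, 32 HPUs ~85.1% of linear scaling.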

PyTorch Reference Models Performance

All measurements below are for Mixed Precision mode.

Framework | Model | # HPU | Time to Train | Accuracy | Throughput | Batch Size
PyTorch 1.7.1 | ResNeXt101 | 1 | - | 77.95 | 730 images/sec | 128
PyTorch 1.7.1 | ResNeXt101 | 8 | 15h 1m | 78.3 | 2860 images/sec | 128
PyTorch 1.7.1 | ResNet50 | 1 | - | 76.08 | 1590 images/sec | 256
PyTorch 1.7.1* | ResNet50 | 8 | 9h 30m | 76.22 | 4670 images/sec | 256
PyTorch 1.7.1* | ResNet50 | 16 | 6h | 76.13 | 7900 images/sec | 256
PyTorch 1.7.1 | BERT-Large Fine Tuning (SQuAD, Lazy Mode) | 1 | 1h 12m | 93.09 | 45 sentences/sec | 24
PyTorch 1.7.1 | BERT-Large Fine Tuning (SQuAD, Lazy Mode) | 8 | 20m | 93.05 | 303 sentences/sec | 24
PyTorch 1.7.1 | BERT-Large Pre Training (Lazy Mode) | 1 | - | - | Phase 1: 123, Phase 2: 23 sentences/sec | 64
PyTorch 1.7.1 | BERT-Large Pre Training (Lazy Mode) | 8 | - | - | Phase 1: 950, Phase 2: 176 sentences/sec | 64
PyTorch 1.7.1 | BERT-Large Pre Training (Graph Mode) | 1 | - | - | Phase 1: 142, Phase 2: 27 sentences/sec | 64
PyTorch 1.7.1 | BERT-Large Pre Training (Graph Mode) | 8 | - | - | Phase 1: 1130, Phase 2: 210 sentences/sec | 64

* The PyTorch dataloader consumes a significant portion of the training time, impacting overall model performance.
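
To put that footnote in perspective, the sketch below compares the compute-only training time implied by the published 8-HPU ResNet50 throughput with the reported time-to-train. The dataset size and epoch count are assumptions (the standard ImageNet-1k recipe of 1,281,167 training images over 90 epochs); the actual run's schedule and evaluation overhead are not published here.

# Compute-only training time implied by throughput vs. reported time-to-train.
# ImageNet-1k size and the 90-epoch recipe are assumptions, not published here.
IMAGES_PER_EPOCH = 1_281_167   # ImageNet-1k train split (assumed)
EPOCHS = 90                    # common ResNet50 recipe (assumed)

throughput_ips = 4670          # images/sec, ResNet50 on 8 HPUs (table above)
reported_ttt_h = 9.5           # 9h 30m from the table

compute_only_h = IMAGES_PER_EPOCH * EPOCHS / throughput_ips / 3600
print(f"compute-only estimate: {compute_only_h:.1f}h vs reported {reported_ttt_h}h")
# ~6.9h of pure compute vs 9.5h reported: the gap is consistent with host-side
# dataloading taking a significant share of wall-clock time.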


System Configuration:
HPU: Habana Gaudi® HL-205 Mezzanine cards
System: HLS-1 with eight HL-205 HPUs, two Intel® Xeon® Platinum 8280 CPUs @ 2.70GHz, and 756GB of system memory
Software: Ubuntu 20.04, SynapseAI 0.15.1-37, using TensorFlow v2.5.0 and PyTorch v1.7.1
TensorFlow Docker image here.
PyTorch Docker image here.


Performance varies by use, configuration and other factors. All information provided here is subject to change without notice. Habana Labs may make changes to its test conditions and internal reliability goals at any time. Contact your Habana Labs representative to obtain the latest Habana Labs product specifications and roadmaps. Your costs and results may vary.
