
Habana Training Models and Performance-0.15.4

See the latest TensorFlow and PyTorch model performance data. Visit the Habana catalog for information on models and containers that are currently integrated with Habana's SynapseAI software suite. For more information on future model support, please refer to our SynapseAI roadmap page.

TensorFlow Reference Models Performance

| Framework | Model | # HPU | Time to Train | Accuracy | Throughput | Batch Size |
|---|---|---|---|---|---|---|
| TensorFlow 2.5.0 | ResNet50 Keras LARS | 1 | 8h 45m | 75.91 | 1690 images/sec | 256 |
| TensorFlow 2.5.0 | ResNet50 Keras LARS | 8 | 1h 15m | 75.93 | 12900 images/sec | 256 |
| TensorFlow 2.4.1 | ResNet50 Keras LARS | 16 | N/A | 75.94 | 23231 images/sec | 256 |
| TensorFlow 2.4.1 | ResNet50 Keras LARS | 32 | N/A | 75.14 | 46000 images/sec | 256 |
| TensorFlow 2.5.0 | ResNet50 Keras SGD | 1 | 19h 20m | 76.06 | 1690 images/sec | 256 |
| TensorFlow 2.5.0 | ResNet50 Keras SGD | 8 | 3h | 76.18 | 12400 images/sec | 256 |
| TensorFlow 2.4.1 | ResNet50 Keras LARS (tf.distribute) | 8 | N/A | 75.99 | 10600 images/sec | 256 |
| TensorFlow 2.5.0 | BERT-Large Fine Tuning (SQUAD) | 1 | 1h 20m | 92.95 | 54 sentences/sec | 24 |
| TensorFlow 2.5.0 | BERT-Large Fine Tuning (SQUAD) | 8 | 20m | 93.2 | 300 sentences/sec | 24 |
| TensorFlow 2.5.0* | BERT-Large Pre Training | 1 | N/A | N/A | Phase 1: 166 sentences/sec; Phase 2: 30 sentences/sec | Phase 1: 64; Phase 2: 8 |
| TensorFlow 2.5.0* | BERT-Large Pre Training | 8 | N/A | N/A | Phase 1: 1310 sentences/sec; Phase 2: 246 sentences/sec | Phase 1: 64; Phase 2: 8 |
| TensorFlow 2.5.0* | BERT-Large Pre Training | 32 | 39h | N/A | Phase 1: 5152 sentences/sec; Phase 2: 980 sentences/sec | Phase 1: 64; Phase 2: 8 |
| TensorFlow 2.4.1 | Mask R-CNN | 1 | 36h | 34.14 | 12 images/sec | 4 |
| TensorFlow 2.4.1 | Mask R-CNN | 8 | 7h | 34.17 | 76 images/sec | 4 |
| TensorFlow 2.5.0 | Unet2D | 1 | 1h 50m | 88.74 | 49 images/sec | 8 |
| TensorFlow 2.5.0 | Unet2D | 8 | 41m | 88.4 | 373 images/sec | 8 |
| TensorFlow 2.5.0 | ResNext101 | 1 | 48h | 79.19 | 650 images/sec | 128 |
| TensorFlow 2.5.0 | ResNext101 | 8 | 6h 45m | 79.19 | 4510 images/sec | 128 |
| TensorFlow 2.5.0 | SSD ResNet34 | 1 | 3h 55m | 22.98 | 475 images/sec | 128 |
| TensorFlow 2.5.0 | SSD ResNet34 | 8 | 45m | 22.24 | 3455 images/sec | 128 |
| TensorFlow 2.5.0** | Transformer | 1 | 17h | 21.4 | 18760 | 4096 |
| TensorFlow 2.5.0** | Transformer | 8 | 2h 30m | 26 | 138550 | 4096 |
| TensorFlow 2.4.1 | DenseNet | 1 | N/A | 0.71 | 2836 images/sec | 128 |
| TensorFlow 2.5.0 | ALBERT-Large Fine Tuning (SQUAD) | 1 | N/A | N/A | 45 sentences/sec | 32 |
| TensorFlow 2.5.0 | ALBERT-Large Fine Tuning (SQUAD) | 8 | N/A | F1: 90.9; EM: 84.1 | 358 sentences/sec | 32 |
| TensorFlow 2.5.0 | ALBERT-Large Pre Training | 1 | N/A | N/A | 142 sentences/sec | 64 |

* With accumulation steps
** The evaluation graph in Transformer runs on the CPU, which may impact time-to-train (TTT) performance.
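As a quick sanity check on the multi-HPU rows above, scaling efficiency can be computed straight from the table. The helper below is an illustrative sketch (not part of any Habana tooling); the 1690 and 12900 images/sec figures are the 1-HPU and 8-HPU ResNet50 Keras LARS rows.

```python
# Illustrative only: compute how close a multi-HPU result is to ideal
# linear scaling, using throughput numbers from the table above.

def scaling_efficiency(single_hpu_throughput, n_hpu, n_hpu_throughput):
    """Fraction of ideal linear speedup achieved on n_hpu devices."""
    ideal = single_hpu_throughput * n_hpu
    return n_hpu_throughput / ideal

# ResNet50 Keras LARS, TensorFlow 2.5.0: 1690 img/s (1 HPU) vs 12900 img/s (8 HPU)
eff = scaling_efficiency(1690, 8, 12900)
print(f"8-HPU scaling efficiency: {eff:.1%}")  # roughly 95%
```

The same helper applied to the 16- and 32-HPU rows shows where scaling starts to fall off.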

PyTorch Reference Models Performance

| Framework | Model | # HPU | Time to Train | Accuracy | Throughput | Batch Size |
|---|---|---|---|---|---|---|
| PyTorch 1.7.1 | ResNext101 | 1 | N/A | 77.95 | 730 images/sec | 128 |
| PyTorch 1.7.1 | ResNext101 | 8 | 15h 1min | 78.3 | 2860 images/sec | 128 |
| PyTorch 1.7.1 | ResNet50 | 1 | N/A | 76.08 | 1330 images/sec | 256 |
| PyTorch 1.7.1* | ResNet50 | 8 | 9h 30min | 76.22 | 5700 images/sec | 256 |
| PyTorch 1.7.1* | ResNet50 | 16 | 6h | 76.13 | 6657 images/sec | 256 |
| PyTorch 1.7.1 | BERT-Large Fine Tuning (SQUAD), Lazy Mode | 1 | 1h 12min | 93.09 | 45 sentences/sec | 24 |
| PyTorch 1.7.1 | BERT-Large Fine Tuning (SQUAD), Lazy Mode | 8 | 20min | 93.05 | 303 sentences/sec | 24 |
| PyTorch 1.7.1 | BERT-Large Pre Training, Lazy Mode | 1 | N/A | N/A | Phase 1: 123 sentences/sec; Phase 2: 23 sentences/sec | 64 |
| PyTorch 1.7.1 | BERT-Large Pre Training, Lazy Mode | 8 | N/A | N/A | Phase 1: 950 sentences/sec; Phase 2: 176 sentences/sec | 64 |
| PyTorch 1.7.1 | BERT-Large Pre Training, Graph Mode | 1 | N/A | N/A | Phase 1: 128 sentences/sec; Phase 2: 24 sentences/sec | 64 |
| PyTorch 1.7.1 | BERT-Large Pre Training, Graph Mode | 8 | N/A | N/A | Phase 1: 1016 sentences/sec; Phase 2: 190 sentences/sec | 64 |

* The PyTorch dataloader consumes a significant portion of the training time, impacting overall model performance.
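The effect noted in the footnote above can be sketched with a toy throughput model: if per-batch compute parallelizes across devices but host-side data loading remains a fixed per-batch cost, throughput saturates as devices are added (compare the sublinear 8-HPU vs. 16-HPU ResNet50 rows). All timings below are invented for illustration only.

```python
# Toy model, illustration only: worst-case serial data loading caps
# end-to-end throughput no matter how many accelerators are added.

def effective_throughput(n_devices, compute_time, load_time):
    """Batches/sec when compute splits across n_devices but data
    loading stays a fixed serial per-batch host cost."""
    batch_time = compute_time / n_devices + load_time
    return 1.0 / batch_time

# Hypothetical per-batch costs in seconds (not measured values)
for n in (1, 8, 16):
    print(n, round(effective_throughput(n, 0.16, 0.02), 1))
```

As `n_devices` grows, the fixed `load_time` term dominates and the curve flattens, which is why dataloader time matters more at higher device counts.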


System Configuration:
HPU: Habana Gaudi® HL-205 mezzanine cards
System: HLS-1 with eight HL-205 HPUs, two Intel® Xeon® Platinum 8280 CPUs @ 2.70GHz, and 756GB of system memory
Software: Ubuntu 20.04, SynapseAI software version 1.0.0-532
TensorFlow: Models run with TensorFlow v2.5.0 use this Docker image
PyTorch: Models run with PyTorch v1.8.1 use this Docker image
Environment: These workloads are run using the Docker images directly on the host OS


Performance varies by use, configuration and other factors.  All information provided here is subject to change without notice. Habana Labs may make changes to its test conditions and internal reliability goals at any time.  Contact your Habana Labs representative to obtain the latest Habana Labs product specifications and roadmaps. Your costs and results may vary.

Stay Informed: Register for the latest Intel Gaudi AI Accelerator developer news, events, training, and updates.