
Habana Model Performance Data – 1.1.0

See the latest TensorFlow and PyTorch model performance data. Visit the Habana catalog for information on the models and containers currently integrated with Habana’s SynapseAI software suite. For more information on future model support, please refer to our SynapseAI roadmap page.

TensorFlow Reference Models Performance

Framework | Model | # HPU | Precision | Time to Train | Accuracy | Throughput | Batch Size | Comments
TensorFlow 2.6.0 | ResNet50 Keras SGD | 8 | Mixed | 2h 37m | 76.14 | 12810 images/sec | 256 |
TensorFlow 2.5.1 | ResNet50 Keras SGD | 8 | Mixed | 2h 35m | 75.98 | 12943 images/sec | 256 |
TensorFlow 2.6.0 | ResNet50 Keras LARS tf.distribute | 8 | Mixed | 1h 10m | 76.04 | 13175 images/sec | 256 |
TensorFlow 2.5.1 | ResNet50 Keras LARS tf.distribute | 8 | Mixed | 1h 10m | 76.04 | 13205 images/sec | 256 |
TensorFlow 2.6.0 | ResNet50 Keras LARS | 32 | Mixed | 0h 23m | 75.77 | 48710 images/sec | 256 |
TensorFlow 2.6.0 | ResNet50 Keras LARS | 16 | Mixed | 0h 40m | 75.46 | 24567 images/sec | 256 |
TensorFlow 2.6.0 | ResNet50 Keras LARS | 8 | Mixed | 1h 8m | 76.09 | 13183 images/sec | 256 |
TensorFlow 2.6.0 | ResNet50 Keras LARS | 1 | Mixed | 8h 24m | 76.12 | 1731 images/sec | 256 |
TensorFlow 2.6.0 | ResNext101 | 8 | Mixed | 6h 33m | 79.27 | 5066 images/sec | 128 |
TensorFlow 2.6.0 | ResNext101 | 1 | Mixed | 45h 37m | 79.2 | 697 images/sec | 128 |
TensorFlow 2.6.0 | SSD ResNet34 | 8 | Mixed | 0h 29m | 22.45 | 3637 images/sec | 128 |
TensorFlow 2.6.0 | SSD ResNet34 | 1 | Mixed | 3h 16m | 22.77 | 506 images/sec | 128 |
TensorFlow 2.6.0 | Mask R-CNN | 8 | Mixed | 3h 45m | 34.09 | 104 images/sec | 4 |
TensorFlow 2.6.0 | Mask R-CNN | 1 | Mixed | 25h 33m | 34.1 | 15 images/sec | 4 |
TensorFlow 2.6.0 | Unet2D | 8 | Mixed | 0h 3m | 88.05 | 371 images/sec | 8 | Results reported for single-fold training time
TensorFlow 2.6.0 | Unet2D | 1 | Mixed | 0h 18m | 88.89 | 50 images/sec | 8 | Results reported for single-fold training time
TensorFlow 2.6.0 | Unet3D | 8 | Mixed | 0h 15m | 89.13 | 39 images/sec | 2 |
TensorFlow 2.6.0 | Unet3D | 1 | Mixed | 1h 26m | 89.93 | 6 images/sec | 2 |
TensorFlow 2.5.1 | Densenet 121 tf.distribute | 8 | Mixed | 5h 13m | 73.96 | 6575 images/sec | 2048 |
TensorFlow 2.5.1 | VGG SegNet | 1 | Mixed | 0h 9m | 89.61 | 102 images/sec | 16 |
TensorFlow 2.6.0 | RetinaNet | 1 | fp32 | 7h 11m | 27.57 | 12 images/sec | 8 |
TensorFlow 2.6.0 | MobileNet V2 | 1 | Mixed | | | 1135 images/sec | 96 |
TensorFlow 2.6.0 | EfficientDet | 8 | fp32 | 90h 19m | 33.51 | 157 images/sec | 8 |
TensorFlow 2.5.1 | CycleGAN | 1 | Mixed | 5h 13m | | 12 | 2 |
TensorFlow 2.5.1 | Transformer | 8 | Mixed | 17h 24m | 26.4 | 157465 | 4096 |
TensorFlow 2.5.1 | Transformer | 1 | Mixed | | 23.7 | 22145 | 4096 |
TensorFlow 2.5.1 | T5 Base | 1 | Mixed | 0h 16m | 94.58 | 109 sentences/sec | 16 |
TensorFlow 2.6.0 | BERT-Large Fine Tuning (SQUAD) | 8 | Mixed | 0h 14m | 93.44 | 391 sentences/sec | 24 |
TensorFlow 2.6.0 | BERT-Large Fine Tuning (SQUAD) | 1 | Mixed | 1h 7m | 93.56 | 53 sentences/sec | 24 |
TensorFlow 2.6.0 | BERT-Large Pre Training | 32 | Mixed | 36h 55m | Phase 1: Loss 1.3; Phase 2: Loss 0.86 | Phase 1: 5527 sentences/sec; Phase 2: 1066 sentences/sec | Phase 1: 64; Phase 2: 8 | With accumulation steps
TensorFlow 2.6.0 | BERT-Large Pre Training | 8 | Mixed | | | Phase 1: 1404 sentences/sec; Phase 2: 271 sentences/sec | Phase 1: 64; Phase 2: 8 | With accumulation steps
TensorFlow 2.5.1 | Albert-Large Fine Tuning (SQUAD) | 8 | Mixed | 0h 23m | F1 91; EM 84 | 442 sentences/sec | 32 | Time to train does not include tokenization
TensorFlow 2.5.1 | Albert-Large Fine Tuning (SQUAD) | 1 | Mixed | 1h 11m | | 54 sentences/sec | 32 | Time to train does not include tokenization
TensorFlow 2.5.1 | Albert-Large Pre Training | 1 | Mixed | | | Phase 1: 176 sentences/sec; Phase 2: 37 sentences/sec | Phase 1: 64; Phase 2: 8 |
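
As a cross-check, the Time to Train and Throughput columns are mutually consistent for the ImageNet-class rows: total images processed divided by throughput reproduces the reported wall time. A minimal sketch for the ResNet50 Keras SGD (8 HPU) row, where the 90-epoch schedule and the ImageNet-1K dataset size are our assumptions rather than values from the table:

```python
# Rough consistency check for the ResNet50 Keras SGD (8 HPU) row above.
# The 90-epoch schedule and ImageNet-1K training-set size are assumptions,
# not values taken from the table.
EPOCHS = 90
IMAGES_PER_EPOCH = 1_281_167      # ImageNet-1K training images (assumed)
throughput = 12810                # images/sec, from the 8-HPU row

seconds = EPOCHS * IMAGES_PER_EPOCH / throughput
print(f"estimated time to train: {seconds / 3600:.1f} h")
# -> ~2.5 h, in line with the reported 2h 37m
```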

PyTorch Reference Models Performance

Framework | Model | # HPU | Precision | Time to Train | Accuracy | Throughput | Batch Size | Comments
PyTorch 1.9.1 | ResNet50 | 16 | Mixed | 2h 56m | 75.77 | 22236 images/sec | 256 |
PyTorch 1.9.1 | ResNet50 Host NIC | 16 | Mixed | 4h 0m | 75.82 | 9634 images/sec | 256 |
PyTorch 1.9.1 | ResNet50 | 8 | Mixed | 2h 43m | 75.96 | 12752 images/sec | 256 |
PyTorch 1.9.1 | ResNet152 | 8 | Mixed | 7h 51m | 78.07 | 4927 images/sec | 128 |
PyTorch 1.9.1 | ResNext101 | 8 | Mixed | 6h 54m | 78.14 | 6053 images/sec | 128 |
PyTorch 1.9.1 | Unet2D | 8 | Mixed | 1h 24m | 72.74 | 4531 images/sec | 64 |
PyTorch 1.9.1 | DLRM | 1 | Mixed | | | 48312 queries/sec | 512 | Uses Random Input Distribution
PyTorch 1.9.1 | Transformer | 8 | Mixed | 25h 1m | 27.6 | 127631 | 4096 |
PyTorch 1.9.1 | RoBERTa Large | 8 | Mixed | 0h 12m | 94.7 | 259 sentences/sec | 12 |
PyTorch 1.9.1 | RoBERTa Large | 1 | Mixed | 1h 17m | 94.61 | 38 sentences/sec | 12 |
PyTorch 1.9.1 | RoBERTa Base | 8 | Mixed | 0h 5m | 91.67 | 640 sentences/sec | 12 |
PyTorch 1.9.1 | RoBERTa Base | 1 | Mixed | 0h 30m | 92.54 | 102 sentences/sec | 12 |
PyTorch 1.9.1 | DistilBERT | 8 | Mixed | 0h 13m | 85.24 | 503 sentences/sec | 8 |
PyTorch 1.9.1 | DistilBERT | 1 | Mixed | 0h 42m | 85.56 | 136 sentences/sec | 8 |
PyTorch 1.9.1 | BERT-Large Fine Tuning Lazy Mode (SQUAD) | 8 | Mixed | 0h 10m | 93.13; F1 Score: 91.35% | 318 sentences/sec | 24 |
PyTorch 1.9.1 | BERT-Large Fine Tuning Lazy Mode (SQUAD) | 1 | Mixed | 1h 8m | 93.44 | 43 sentences/sec | 24 |
PyTorch 1.9.1 | BERT-Large Pre Training Lazy Mode | 32 | Mixed | 40h 13m | Phase 1: Loss 1.3; Phase 2: Loss 1.34 | Phase 1: 4124 sentences/sec; Phase 2: 635 sentences/sec | Phase 1: 64 |
PyTorch 1.9.1 | BERT-Large Pre Training Lazy Mode | 8 | Mixed | | | Phase 1: 1290 sentences/sec; Phase 2: 259 sentences/sec | Phase 1: 64 |
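
One way to read the single-card and multi-card rows together is scaling efficiency: measured throughput on N cards relative to linear scaling from a smaller configuration. A small sketch using rows from the tables above (the helper function is ours, not part of any Habana tooling):

```python
def scaling_efficiency(throughput_n: float, throughput_base: float,
                       n: int, base: int = 1) -> float:
    """Throughput on n cards relative to linear scaling from `base` cards."""
    return (throughput_n / throughput_base) * (base / n)

# RoBERTa Large rows above: 38 sentences/sec on 1 HPU, 259 on 8 HPUs.
print(f"{scaling_efficiency(259, 38, n=8):.0%}")                # ~85% of linear
# ResNet50 rows above: 12752 images/sec on 8 HPUs, 22236 on 16.
print(f"{scaling_efficiency(22236, 12752, n=16, base=8):.0%}")  # ~87% of linear
```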

System Configuration:
HPU: Habana Gaudi® HL-205 Mezzanine cards
System: HLS-1 with eight HL-205 HPUs and two Intel® Xeon® Platinum 8280 CPUs @ 2.70GHz, and 756GB of system memory
Software: Ubuntu 20.04, SynapseAI software version 1.1.0-614
TensorFlow: Models run with TensorFlow v2.5.1 use this Docker image; models run with v2.6.0 use this Docker image
PyTorch: Models run with PyTorch v1.9.1 use this Docker image
Environment: These workloads are run using the Docker images above, directly on the host OS
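
Inside these containers, TensorFlow workloads target the Gaudi device through Habana's load_habana_module() entry point. The snippet below is a minimal sketch of that setup, not a substitute for each model's README in the Habana catalog; the device-listing check is our assumption to verify locally.

```python
# Minimal sketch: enabling the Gaudi HPU for TensorFlow inside the
# SynapseAI 1.1.0 container. Follow the model README for exact setup.
import tensorflow as tf
from habana_frameworks.tensorflow import load_habana_module

load_habana_module()  # registers the HPU device with TensorFlow
# Assumed check: each Gaudi card should now appear as an HPU device.
print(tf.config.list_logical_devices("HPU"))
```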


Performance varies by use, configuration and other factors. All information provided here is subject to change without notice. Habana Labs may make changes to its test conditions and internal reliability goals at any time. Contact your Habana Labs representative to obtain the latest Habana Labs product specifications and roadmaps. Your costs and results may vary.

Stay Informed: Register for the latest Intel Gaudi AI Accelerator developer news, events, training, and updates.