
Habana Model Performance Data – 1.2.0

See the latest TensorFlow and PyTorch model performance data. Visit the Habana catalog for information on models and containers that are currently integrated with Habana's SynapseAI software suite. For information on future model support, please refer to our SynapseAI roadmap page.

TensorFlow Reference Models Performance

| Framework | Model | # HPUs | Precision | Time to Train | Accuracy | Throughput | Batch Size | Comments |
|---|---|---|---|---|---|---|---|---|
| TensorFlow 2.7.0 | ResNet50 Keras LARS | 32 | bf16 | 0h 21m | 75.92 | 49060 images/sec | 256 | |
| TensorFlow 2.7.0 | ResNet50 Keras LARS | 16 | bf16 | 0h 40m | 75.69 | 24598 images/sec | 256 | |
| TensorFlow 2.7.0 | ResNet50 Keras LARS | 8 | bf16 | 1h 9m | 76.06 | 12880 images/sec | 256 | |
| TensorFlow 2.7.0 | ResNet50 Keras LARS | 1 | bf16 | 8h 36m | 76.03 | 1695.58 images/sec | 256 | |
| TensorFlow 2.7.0 | BERT-Large Pre Training phase 1 | 32 | bf16 | | | 6586.64 sentences/sec | 64 | |
| TensorFlow 2.7.0 | BERT-Large Pre Training phase 1 | 8 | bf16 | | | 1669.71 sentences/sec | 64 | |
| TensorFlow 2.7.0 | BERT-Large Pre Training phase 1 | 1 | bf16 | | | 210.36 sentences/sec | 64 | |
| TensorFlow 2.7.0 | BERT-Large Pre Training phase 2 | 32 | bf16 | 26h 39m | | 2124.07 sentences/sec | 8 | The TTT of 26h 39m covers Phase 1 and Phase 2 combined |
| TensorFlow 2.7.0 | BERT-Large Pre Training phase 2 | 8 | bf16 | | | 538.94 sentences/sec | 8 | |
| TensorFlow 2.7.0 | BERT-Large Pre Training phase 2 | 1 | bf16 | | | 67.98 sentences/sec | 8 | |
| TensorFlow 2.7.0 | BERT-Large Fine Tuning (SQUAD) | 8 | bf16 | 0h 14m | 93.03 | 392.76 sentences/sec | 24 | |
| TensorFlow 2.7.0 | BERT-Large Fine Tuning (SQUAD) | 1 | bf16 | 1h 7m | 93.47 | 53.34 sentences/sec | 24 | |
| TensorFlow 2.7.0 | SSD ResNet34 | 8 | bf16 | 0h 32m | 22.19 | 3620.58 images/sec | 128 | |
| TensorFlow 2.7.0 | SSD ResNet34 | 1 | bf16 | 4h 47m | 23.68 | 502.29 images/sec | 128 | |
| TensorFlow 2.7.0 | ResNext101 | 8 | bf16 | 6h 41m | 79.21 | 5002 images/sec | 128 | |
| TensorFlow 2.7.0 | ResNext101 | 1 | bf16 | 46h 38m | 79.26 | 689.77 images/sec | 128 | |
| TensorFlow 2.7.0 | Unet2D | 8 | bf16 | 0h 3m | 88.2 | 392.07 images/sec | 8 | |
| TensorFlow 2.7.0 | Unet2D | 1 | bf16 | 0h 18m | 88.83 | 51.72 images/sec | 8 | |
| TensorFlow 2.7.0 | Unet3D | 8 | bf16 | 0h 14m | 88.22 | 42.75 images/sec | 2 | |
| TensorFlow 2.7.0 | Unet3D | 1 | bf16 | 1h 20m | 89.7 | 6.7 images/sec | 2 | |
| TensorFlow 2.7.0 | Transformer | 8 | bf16 | 19h 31m | 26.6 | 155888 sentences/sec | 4096 | |
| TensorFlow 2.7.0 | Transformer | 1 | bf16 | 17h 48m | 23.7 | 22245 sentences/sec | 4096 | |
| TensorFlow 2.7.0 | Mask R-CNN | 8 | bf16 | 4h 21m | 34.14 | 107.6 images/sec | 4 | |
| TensorFlow 2.7.0 | Mask R-CNN | 1 | bf16 | 25h 28m | 33.98 | 15.72 images/sec | 4 | |
| TensorFlow 2.7.0 | VisionTransformer | 8 | bf16 | 7h 37m | 84.44 | 442.47 images/sec | 32 | |
| TensorFlow 2.7.0 | RetinaNet | 8 | bf16 | 7h 43m | 38.7 | 179.94 images/sec | 64 | |
| TensorFlow 2.6.2 | Densenet 121 tf.distribute | 8 | bf16 | 6h 33m | 74.69 | 6575 images/sec | 1024 | |
| TensorFlow 2.7.0 | T5 Base | 1 | bf16 | 0h 20m | 94.32 | 96.74 sentences/sec | 16 | |
| TensorFlow 2.7.0 | VGG SegNet | 1 | bf16 | 0h 10m | 88.63 | 109.4 images/sec | 16 | |
| TensorFlow 2.7.0 | MobileNet V2 | 1 | bf16 | | | 1119.96 images/sec | 96 | |
| TensorFlow 2.7.0 | EfficientDet | 8 | fp32 | | | 152.46 images/sec | 8 | |
| TensorFlow 2.7.0 | CycleGAN | 1 | bf16 | 4h 9m | 15.78 | | 2 | |
| TensorFlow 2.7.0 | Albert-Large Fine Tuning (SQUAD) | 8 | bf16 | 0h 25m | 90.68 | 438.36 sentences/sec | 32 | |
| TensorFlow 2.7.0 | Albert-Large Fine Tuning (SQUAD) | 1 | bf16 | 1h 28m | 91.03 | 55.06 sentences/sec | 32 | |
| TensorFlow 2.7.0 | ResNet50 Keras LARS tf.distribute | 8 | bf16 | 1h 10m | 76.05 | 12799 images/sec | 256 | |
| TensorFlow 2.7.0 | ResNet50 Keras SGD | 8 | bf16 | 2h 39m | 76.19 | 12612 images/sec | 256 | |
| TensorFlow 2.7.0 | ResNet50 Keras LARS Host NIC | 16 | bf16 | | | 21474 images/sec | 256 | using Horovod and libfabric, with HCCL_OVER_OFI=1 |
| TensorFlow 2.7.0 | ResNet50 Keras LARS Host NIC | 16 | bf16 | | | 19841 images/sec | 256 | using tf.distribute and libfabric, with HCCL_OVER_OFI=1 |
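The two Host NIC rows at the bottom of the table scale ResNet50 across 16 HPUs in two HLS-1 systems, carrying the HCCL collectives over the hosts' NICs through libfabric (HCCL_OVER_OFI=1) instead of Gaudi's integrated RoCE ports. The sketch below shows roughly where that setting fits in a Horovod run; it is a minimal illustration, not Habana's reference script, and the model, optimizer, and launcher shown are placeholders.

```python
# Minimal sketch (not Habana's reference script): Horovod data-parallel
# training with HCCL collectives routed over the host NIC via libfabric.
import os
os.environ["HCCL_OVER_OFI"] = "1"  # from the table's comment: libfabric/host-NIC path

import tensorflow as tf
import horovod.tensorflow.keras as hvd
from habana_frameworks.tensorflow import load_habana_module  # SynapseAI TF integration

load_habana_module()  # registers the HPU device with TensorFlow
hvd.init()            # one process per HPU, e.g. 16 ranks across two HLS-1 hosts

# Placeholder model/optimizer; the runs above use Habana's ResNet50 Keras
# reference model with LARS, which is not reproduced here.
model = tf.keras.applications.ResNet50(weights=None)
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.1 * hvd.size()))
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

# model.fit(sharded_imagenet_dataset,  # placeholder input pipeline
#           callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)])
```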

PyTorch Reference Models Performance

| Framework | Model | # HPUs | Precision | Time to Train | Accuracy | Throughput | Batch Size | Comments |
|---|---|---|---|---|---|---|---|---|
| PyTorch 1.10.0 | ResNet50 | 32 | bf16 | 0h 52m | 74.66 | 40633 images/sec | 256 | |
| PyTorch 1.10.0 | ResNet50 | 16 | bf16 | 1h 28m | 75.91 | 25022 images/sec | 256 | |
| PyTorch 1.10.0 | ResNet50 | 8 | bf16 | 2h 47m | 76.1 | 12833 images/sec | 256 | |
| PyTorch 1.10.0 | ResNet50 | 1 | bf16 | 21h 56m | 75.88 | 1722 images/sec | 256 | |
| PyTorch 1.10.0 | BERT-L Lazy Mode Pre Training Phase 1 | 32 | bf16 | 27h 46m | | 4920 sentences/sec | 64 | ph1 final_loss: 1.494; ph2 final_loss: 1.349 |
| PyTorch 1.10.0 | BERT-L Lazy Mode Pre Training Phase 1 | 8 | bf16 | | | 1282 sentences/sec | 64 | |
| PyTorch 1.10.0 | BERT-L Lazy Mode Pre Training Phase 1 | 1 | bf16 | | | 154 sentences/sec | 64 | |
| PyTorch 1.10.0 | BERT-L Lazy Mode Pre Training Phase 2 | 32 | bf16 | 14h 18m | | 970 sentences/sec | 8 | |
| PyTorch 1.10.0 | BERT-L Lazy Mode Pre Training Phase 2 | 8 | bf16 | | | 256 sentences/sec | 8 | |
| PyTorch 1.10.0 | BERT-L Lazy Mode Pre Training Phase 2 | 1 | bf16 | | | 32 sentences/sec | 8 | |
| PyTorch 1.10.0 | BERT-L Lazy Mode Fine Tuning | 8 | bf16 | 0h 10m | 93.17 | 341 sentences/sec | 24 | |
| PyTorch 1.10.0 | BERT-L Lazy Mode Fine Tuning | 1 | bf16 | 1h 12m | 92.94 | 49 sentences/sec | 24 | |
| PyTorch 1.10.0 | ResNext101 | 8 | bf16 | 6h 39m | 78.04 | 5768 images/sec | 128 | |
| PyTorch 1.10.0 | ResNext101 | 1 | bf16 | 48h 47m | 78.13 | 777.53 images/sec | 128 | |
| PyTorch 1.10.0 | ResNet152 | 8 | bf16 | 7h 53m | 78.03 | 5191 images/sec | 128 | |
| PyTorch 1.10.0 | ResNet152 | 1 | bf16 | 46h 50m | 77.61 | 729 images/sec | 128 | |
| PyTorch 1.10.0 | Unet2D | 8 | bf16 | 1h 8m | 72.82 | 4624.22 images/sec | 64 | |
| PyTorch 1.10.0 | Unet2D | 1 | bf16 | 9h 24m | 72.84 | 609.44 images/sec | 64 | |
| PyTorch 1.10.0 | Unet3D | 8 | bf16 | 1h 27m | 74.13 | 59.77 images/sec | 2 | |
| PyTorch 1.10.0 | Unet3D | 1 | bf16 | 13h 34m | 74.3 | 7.56 images/sec | 2 | |
| PyTorch 1.10.0 | SSD | 8 | bf16 | 1h 25m | 22.93 | 1664 images/sec | 32 | |
| PyTorch 1.10.0 | SSD | 1 | bf16 | 4h 12m | 23.07 | 449 images/sec | 32 | |
| PyTorch 1.10.0 | Transformer | 8 | bf16 | 20h 49m | 28.1 | 150407 sentences/sec | 4096 | |
| PyTorch 1.10.0 | Transformer | 1 | bf16 | 22h 22m | | 21525.8 sentences/sec | 4096 | |
| PyTorch 1.10.0 | GoogLeNet | 8 | bf16 | 4h 18m | 72.44 | 15056 images/sec | 256 | |
| PyTorch 1.10.0 | GoogLeNet | 1 | bf16 | 19h 9m | 72.31 | 1851 images/sec | 256 | |
| PyTorch 1.10.0 | DistilBERT | 8 | bf16 | 0h 10m | 85.49 | 770 sentences/sec | 8 | |
| PyTorch 1.10.0 | DistilBERT | 1 | bf16 | 0h 41m | 85.47 | 149 sentences/sec | 8 | |
| PyTorch 1.10.0 | RoBERTa Large | 8 | bf16 | 0h 11m | 94.53 | 284 sentences/sec | 12 | |
| PyTorch 1.10.0 | RoBERTa Large | 1 | bf16 | 1h 30m | 94.27 | 42.6 sentences/sec | 12 | |
| PyTorch 1.10.0 | RoBERTa Base | 8 | bf16 | 0h 4m | 91.85 | 731 sentences/sec | 12 | |
| PyTorch 1.10.0 | RoBERTa Base | 1 | bf16 | 0h 34m | 92.39 | 128 sentences/sec | 12 | |
| PyTorch 1.10.0 | ALBERT-XXL Fine Tuning | 8 | bf16 | 0h 43m | 94.91 | 74 sentences/sec | 12 | |
| PyTorch 1.10.0 | ALBERT-XXL Fine Tuning | 1 | bf16 | 5h 29m | 94.79 | 9 sentences/sec | 12 | |
| PyTorch 1.10.0 | ALBERT-Large Fine Tuning | 8 | bf16 | 0h 10m | 91.9 | 362 sentences/sec | 32 | |
| PyTorch 1.10.0 | ALBERT-Large Fine Tuning | 1 | bf16 | 1h 7m | 93.25 | 44 sentences/sec | 32 | |
| PyTorch 1.10.0 | MobileNetV2 | 1 | bf16 | | | 1515 images/sec | 256 | |
| PyTorch 1.10.0 | BART Fine Tuning | 8 | bf16 | 0h 8m | | 1364 sentences/sec | 32 | |
| PyTorch 1.10.0 | BART Fine Tuning | 1 | bf16 | 0h 50m | | 193 sentences/sec | 32 | |
| PyTorch 1.10.0 | ResNet50 Host NIC | 16 | bf16 | 2h 5m | 75.96 | 16311 images/sec | 256 | |
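Read side by side, the two tables show near-linear weak scaling for most models. As a quick way to interpret the throughput columns, the sketch below (plain Python, values copied from the ResNet50 rows above) computes scaling efficiency: throughput at N cards divided by N times the single-card throughput.

```python
# Weak-scaling efficiency from the ResNet50 rows above:
# efficiency(N) = throughput(N) / (N * throughput(1))
resnet50 = {
    "TensorFlow 2.7.0": {1: 1695.58, 8: 12880, 16: 24598, 32: 49060},
    "PyTorch 1.10.0": {1: 1722, 8: 12833, 16: 25022, 32: 40633},
}

for fw, tp in resnet50.items():
    base = tp[1]
    for n in (8, 16, 32):
        eff = tp[n] / (n * base)
        print(f"{fw}: {n} HPUs -> {eff:.0%} of linear scaling")
# TensorFlow: 8 -> 95%, 16 -> 91%, 32 -> 90%
# PyTorch:    8 -> 93%, 16 -> 91%, 32 -> 74%
```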

System Configuration:
HPU: Habana Gaudi® HL-205 Mezzanine cards
System: HLS-1 with eight HL-205 HPUs and two Intel® Xeon® Platinum 8280 CPUs @ 2.70GHz, and 756GB of system memory
Software: Ubuntu 20.04, SynapseAI software version 1.2.0-585
TensorFlow: Models run with TensorFlow v2.7.0 use this Docker image; models run with v2.6.2 use this Docker image
PyTorch: Models run with PyTorch v1.10.0 use this Docker image
Environment: These workloads are run in the Docker images above, directly on the host OS
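The Time to Train and Throughput columns are mutually consistent once dataset size and epoch count are factored in. As an illustration, the sketch below back-computes the PyTorch ResNet50 8-HPU row; the ImageNet-1k training-set size and the 90-epoch schedule are assumptions, since neither is stated on this page.

```python
# Hypothetical cross-check of Time to Train vs Throughput for the
# PyTorch ResNet50 8-HPU row (12833 images/sec, listed TTT 2h 47m).
# Assumes ImageNet-1k (1,281,167 images) and 90 epochs -- neither is
# stated on this page, so treat this as an illustration only.
IMAGES = 1_281_167
EPOCHS = 90
throughput = 12833  # images/sec, from the PyTorch table

seconds = EPOCHS * IMAGES / throughput
print(f"predicted TTT: {seconds / 3600:.0f}h {seconds % 3600 / 60:.0f}m")
# -> predicted TTT: 2h 30m vs the listed 2h 47m; the gap plausibly
#    reflects evaluation and other per-epoch overhead not captured by
#    steady-state throughput.
```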


Performance varies by use, configuration and other factors.  All information provided here is subject to change without notice. Habana Labs may make changes to its test conditions and internal reliability goals at any time.  Contact your Habana Labs representative to obtain the latest Habana Labs product specifications and roadmaps. Your costs and results may vary.
