See the latest TensorFlow and PyTorch model performance data below. Visit the Habana catalog for information on the models and containers currently integrated with Habana's SynapseAI software suite. For details on future model support, see our SynapseAI roadmap page.
TensorFlow Reference Models Performance
TF Version | Model | # HPUs | Precision | Time to Train | Accuracy | Throughput | Batch Size |
---|---|---|---|---|---|---|---|
2.8.0 | ResNet50 Keras LARS | 32 | bf16 | 22.33 min | 75.68 | 49441.8 img/sec | 256 |
2.8.0 | ResNet50 Keras LARS | 16 | bf16 | 40.68 min | 75.48 | 24779.8 img/sec | 256 |
2.8.0 | ResNet50 Keras LARS | 8 | bf16 | 69 min | 75.99 | 13018.31 img/sec | 256 |
2.8.0 | ResNet50 Keras LARS | 1 | bf16 | 516 min | 75.87 | 1636.7 img/sec | 256 |
2.8.0 | BERT-Large Pre Training combined | 32 | bf16 | 1591 min | | 5485.87 sent/sec | 64 |
2.8.0 | BERT-Large Pre Training combined | 8 | bf16 | | | 1378.49 sent/sec | 64 |
2.8.0 | BERT-Large Pre Training combined | 1 | bf16 | | | 173 sent/sec | 64 |
2.8.0 | BERT-Large Pre Training phase 1 | 32 | bf16 | | | 6621.8 sent/sec | 64 |
2.8.0 | BERT-Large Pre Training phase 1 | 8 | bf16 | | | 1666 sent/sec | 64 |
2.8.0 | BERT-Large Pre Training phase 1 | 1 | bf16 | | | 210.3 sent/sec | 64 |
2.8.0 | BERT-Large Pre Training phase 2 | 32 | bf16 | | | 2134.9 sent/sec | 8 |
2.8.0 | BERT-Large Pre Training phase 2 | 8 | bf16 | | | 534.5 sent/sec | 8 |
2.8.0 | BERT-Large Pre Training phase 2 | 1 | bf16 | | | 67.6 sent/sec | 8 |
2.8.0 | BERT-Large Fine Tuning (SQuAD) | 8 | bf16 | 17.41 min | 93.52 | 398.91 sent/sec | 24 |
2.8.0 | BERT-Large Fine Tuning (SQuAD) | 1 | bf16 | 69 min | 92.84 | 53.45 sent/sec | 24 |
2.8.0 | SSD | 8 | bf16 | 41.06 min | 23.26 | 3860.46 img/sec | 128 |
2.8.0 | SSD | 1 | bf16 | 35.13 min | 23.66 | 515.57 img/sec | 128 |
2.8.0 | ResNeXt-101 | 8 | bf16 | 382 min | 79 | 4451.32 img/sec | 128 |
2.8.0 | ResNeXt-101 | 1 | bf16 | 2718 min | 79.08 | 701.2 img/sec | 128 |
2.8.0 | UNet2D | 8 | bf16 | 4.05 min | 88.16 | 391.52 img/sec | 8 |
2.8.0 | UNet2D | 1 | bf16 | 17.67 min | 88.71 | 51.8 img/sec | 8 |
2.8.0 | UNet3D | 8 | bf16 | 14.25 min | 88.63 | 42.98 img/sec | 2 |
2.8.0 | UNet3D | 1 | bf16 | 81 min | 87.59 | 6.7 img/sec | 2 |
2.8.0 | Transformer | 8 | bf16 | 1054 min | 26.8 | 157051.57 sent/sec | 4096 |
2.8.0 | Transformer | 1 | bf16 | 1012 min | 21.3 | 22327.68 sent/sec | 4096 |
2.8.0 | MaskRCNN | 8 | bf16 | 215 min | 33.87 | 120.82 img/sec | 4 |
2.8.0 | MaskRCNN | 1 | bf16 | 1388 min | 33.89 | 17.84 img/sec | 4 |
2.8.0 | Vision Transformer | 8 | bf16 | 396 min | 84.54 | 530.32 img/sec | 32 |
2.8.0 | RetinaNet | 8 | bf16 | 461 min | 37.43 | 82.03 img/sec | 64 |
2.8.0 | DenseNet-121 tf.distribute | 8 | bf16 | 381 min | 74.72 | 5300.07 img/sec | 1024 |
2.8.0 | T5 Base | 1 | bf16 | 17.48 min | 93.67 | 100.38 img/sec | 16 |
2.8.0 | VGG SegNet | 1 | bf16 | 9 min | 89.4 | 108.05 img/sec | 16 |
2.8.0 | EfficientDet | 8 | fp32 | 87.25 min | 33.56 | 192.5 img/sec | 8 |
2.8.0 | CycleGAN | 1 | bf16 | 226 min | | 15.62 img/sec | 2 |
2.8.0 | ALBERT-Large Fine Tuning (SQuAD) | 1 | bf16 | 75 min | 90.9 | 55.08 sent/sec | 32 |
2.8.0 | ALBERT-Large Fine Tuning (SQuAD) | 8 | bf16 | 22.8 min | 91.01 | 438.79 sent/sec | 32 |
2.8.0 | ResNet50 Keras LARS tf.distribute | 8 | bf16 | 73 min | 76.07 | 12838.65 img/sec | 256 |
2.8.0 | ResNet50 Keras LARS Host NIC (HVD and Libfabric) | 16 | bf16 | | | 23084.74 img/sec | 256 |
2.8.0 | ResNet50 Keras LARS Host NIC (tf.distribute and Libfabric) | 16 | bf16 | | | 21865.04 img/sec | 256 |
2.8.0 | WideAndDeep | 1 | bf16 | 33 min | 65.42 | 722899.3 smpl/sec | 131072 |
2.8.0 | ELECTRA Fine Tuning | 1 | bf16 | 76 min | 92.13 | 118.84 img/sec | 16 |
2.8.0 | DistilBERT | 8 | bf16 | 2.6 min | 85.55 | 2403 sent/sec | 32 |
2.8.0 | Unet Industrial | 8 | bf16 | 2.01 min | 96.67 | 613.3 img/sec | 2 |
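As a quick sanity check on the multi-card rows above, the sketch below computes weak-scaling efficiency for ResNet50 Keras LARS: throughput on N HPUs divided by N times the 1-HPU throughput. The throughput figures are copied straight from the table; the script itself is purely illustrative.

```python
# Scaling efficiency for ResNet50 Keras LARS, using throughputs from the table above.
# efficiency(N) = throughput(N HPUs) / (N * throughput(1 HPU))

SINGLE_HPU = 1636.7  # img/sec on 1 HPU (from the table)

multi_hpu = {8: 13018.31, 16: 24779.8, 32: 49441.8}  # HPUs -> img/sec

for cards, throughput in multi_hpu.items():
    efficiency = throughput / (cards * SINGLE_HPU)
    print(f"{cards:>2} HPUs: {throughput:>9.2f} img/sec -> {efficiency:.1%} scaling efficiency")
```

On these numbers, Gaudi scales at roughly 99% efficiency at 8 cards and about 94% at 16 and 32 cards.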
PyTorch Reference Models Performance
System Configuration:
HPU: Habana Gaudi® HL-205 Mezzanine cards
System: HLS-1 with eight HL-205 HPUs and two Intel® Xeon® Platinum 8280 CPUs @ 2.70GHz, and 756GB of system memory
Software: Ubuntu 20.04, SynapseAI software version 1.3.0-499
TensorFlow: Models run with TensorFlow v2.8.0 use this Docker image; models run with v2.7.1 use this Docker image
PyTorch: Models run with PyTorch v1.10.1 use this Docker image
Environment: These workloads run in the Docker images above, directly on the host OS
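For reference, here is a minimal sketch of verifying the environment inside one of the TensorFlow Docker images above. It assumes the habana_frameworks package that ships with the SynapseAI container; load_habana_module() registers the Gaudi device with TensorFlow, and the exact import path can vary between SynapseAI releases.

```python
# Minimal sketch: confirm the HPU is visible inside a SynapseAI TensorFlow container.
import tensorflow as tf
from habana_frameworks.tensorflow import load_habana_module

# Registers the Gaudi ("HPU") device with TensorFlow.
load_habana_module()

# Expect one logical "HPU" device per visible Gaudi card.
print(tf.config.list_logical_devices("HPU"))
```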
Performance varies by use, configuration and other factors. All information provided here is subject to change without notice. Habana Labs may make changes to its test conditions and internal reliability goals at any time. Contact your Habana Labs representative to obtain the latest Habana Labs product specifications and roadmaps. Your costs and results may vary.