See the latest TensorFlow and PyTorch model performance data. Visit the Habana catalog for information on the models and containers currently integrated with Habana's SynapseAI software suite. For more information on future model support, please refer to our SynapseAI roadmap page.
TensorFlow Reference Models Performance
Framework | Model | # HPU | Time to Train | Accuracy | Throughput | Batch Size |
---|---|---|---|---|---|---|
TensorFlow 2.5.0 | ResNet50 Keras LARS | 1 | 8h45m | 75.91 | 1690 images/sec | 256 |
TensorFlow 2.5.0 | ResNet50 Keras LARS | 8 | 1h15m | 75.93 | 12900 images/sec | 256 |
TensorFlow 2.4.1 | ResNet50 Keras LARS | 16 | N/A | 75.94 | 23231 images/sec | 256 |
TensorFlow 2.4.1 | ResNet50 Keras LARS | 32 | N/A | 75.14 | 46000 images/sec | 256 |
TensorFlow 2.5.0 | ResNet50 Keras SGD | 1 | 19h 20m | 76.06 | 1690 images/sec | 256 |
TensorFlow 2.5.0 | ResNet50 Keras SGD | 8 | 3h | 76.18 | 12400 images/sec | 256 |
TensorFlow 2.4.1 | ResNet50 Keras LARS (tf.distribute) | 8 | N/A | 75.99 | 10600 images/sec | 256 |
TensorFlow 2.5.0 | BERT-Large Fine Tuning (SQUAD) | 1 | 1h 20m | 92.95 | 54 sentences/sec | 24 |
TensorFlow 2.5.0 | BERT-Large Fine Tuning (SQUAD) | 8 | 20m | 93.2 | 300 sentences/sec | 24 |
TensorFlow 2.5.0* | BERT-Large Pre Training | 1 | N/A | N/A | Phase 1 166 sentences/sec Phase 2 30 sentences/sec | Phase 1 64 Phase 2 8 |
TensorFlow 2.5.0* | BERT-Large Pre Training | 8 | N/A | N/A | Phase 1 1310 sentences/sec Phase 2 246 sentences/sec | Phase 1 64 Phase 2 8 |
TensorFlow 2.5.0* | BERT-Large Pre Training | 32 | 39h | N/A | Phase 1 5152 sentences/sec Phase 2 980 sentences/sec | Phase 1 64 Phase 2 8 |
TensorFlow 2.4.1 | Mask R-CNN | 1 | 36h | 34.14 | 12 images/sec | 4 |
TensorFlow 2.4.1 | Mask R-CNN | 8 | 7h | 34.17 | 76 images/sec | 4 |
TensorFlow 2.5.0 | Unet2D | 1 | 1h 50m | 88.74 | 49 images/sec | 8 |
TensorFlow 2.5.0 | Unet2D | 8 | 41m | 88.4 | 373 images/sec | 8 |
TensorFlow 2.5.0 | ResNext101 | 1 | 48h | 79.19 | 650 images/sec | 128 |
TensorFlow 2.5.0 | ResNext101 | 8 | 6h45m | 79.19 | 4510 images/sec | 128 |
TensorFlow 2.5.0 | SSD ResNet34 | 1 | 3h55m | 22.98 | 475 images/sec | 128 |
TensorFlow 2.5.0 | SSD ResNet34 | 8 | 45m | 22.24 | 3455 images/sec | 128 |
TensorFlow 2.5.0** | Transformer | 1 | 17h | 21.4 | 18760 | 4096 |
TensorFlow 2.5.0** | Transformer | 8 | 22h30m | 26 | 138550 | 4096 |
TensorFlow 2.4.1 | DenseNet | 1 | N/A | 0.712 | 836 images/sec | 128 |
TensorFlow 2.5.0 | ALBERT-Large Fine Tuning (SQUAD) | 1 | N/A | N/A | 45 sentences/sec | 32 |
TensorFlow 2.5.0 | ALBERT-Large Fine Tuning (SQUAD) | 8 | N/A | F1 90.9 EM 84.1 | 358 sentences/sec | 32 |
TensorFlow 2.5.0 | ALBERT-Large Pre Training | 1 | N/A | N/A | 142 sentences/sec | 64 |
* With accumulation steps
** The evaluation graph in Transformer runs on the CPU, which may impact time-to-train (TTT).
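The throughput and scaling figures in the table above can be related with the usual back-of-the-envelope formulas. The sketch below is illustrative only and is not Habana's measurement code; the function names and the hypothetical timing window are assumptions, and it simply checks the ResNet50 Keras LARS rows (note that the 16- and 32-card rows were measured with TensorFlow 2.4.1 rather than 2.5.0).

```python
# Illustrative only: not Habana's measurement code. Shows how the throughput
# and scaling numbers in the table above relate, using conventional formulas.

def throughput(global_batch: int, steps: int, elapsed_sec: float) -> float:
    """Samples processed per second over a timed window of training steps."""
    return global_batch * steps / elapsed_sec

def scaling_efficiency(single_card: float, multi_card: float, cards: int) -> float:
    """Fraction of ideal linear scaling achieved on `cards` devices."""
    return multi_card / (single_card * cards)

if __name__ == "__main__":
    # Hypothetical timing: 8 cards x batch 256, 100 steps in 15.9 s gives
    # roughly 12,900 images/sec, in line with the 8-card ResNet50 LARS row.
    print(f"throughput: {throughput(8 * 256, 100, 15.9):.0f} images/sec")

    # Scaling efficiency computed from the ResNet50 Keras LARS rows above.
    print(f"8-card efficiency:  {scaling_efficiency(1690, 12900, 8):.2f}")   # ~0.95
    print(f"32-card efficiency: {scaling_efficiency(1690, 46000, 32):.2f}")  # ~0.85
```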
PyTorch Reference Models Performance
Framework | Model | # HPU | Time to Train | Accuracy | Throughput | Batch Size |
---|---|---|---|---|---|---|
PyTorch 1.7.1 | ResNext101 | 1 | N/A | 77.95 | 730 images/sec | 128 |
PyTorch 1.7.1 | ResNext101 | 8 | 15h 1min | 78.3 | 2860 images/sec | 128 |
PyTorch 1.7.1 | ResNet50 | 1 | N/A | 76.08 | 1330 images/sec | 256 |
PyTorch 1.7.1* | ResNet50 | 8 | 9h 30 min | 76.22 | 5700 images/sec | 256 |
PyTorch 1.7.1* | ResNet50 | 16 | 6h | 76.13 | 6657 images/sec | 256 |
PyTorch 1.7.1 | BERT-Large Fine Tuning (SQUAD) Lazy Mode | 1 | 1h 12min | 93.09 | 45 sentences/sec | 24 |
PyTorch 1.7.1 | BERT-Large Fine Tuning (SQUAD) Lazy Mode | 8 | 20 min | 93.05 | 303 sentences/sec | 24 |
PyTorch 1.7.1 | BERT-Large Pre Training Lazy Mode | 1 | N/A | N/A | Phase 1 123 sentences/sec Phase 2 23 sentences/sec | 64 |
PyTorch 1.7.1 | BERT-Large Pre Training Lazy Mode | 8 | N/A | N/A | Phase 1 950 sentences/sec Phase 2 176 sentences/sec | 64 |
PyTorch 1.7.1 | BERT-Large Pre Training Graph Mode | 1 | N/A | N/A | Phase 1 128 sentences/sec Phase 2 24 sentences/sec | 64 |
PyTorch 1.7.1 | BERT-Large Pre Training Graph Mode | 8 | N/A | N/A | Phase 1 1016 sentences/sec Phase 2 190 sentences/sec | 64 |
* The PyTorch dataloader consumes a significant portion of the training time, which limits overall model performance.
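A common way to reduce dataloader overhead is to move decoding and augmentation onto multiple worker processes and to keep those workers busy across epochs. The snippet below is a generic PyTorch input-pipeline sketch, not the configuration used for the numbers above; the dataset path and parameter values are placeholders.

```python
# Generic PyTorch input-pipeline tuning, illustrative only; the path and
# parameter values are placeholders, not the settings used for the table above.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Hypothetical ImageNet location inside the training container.
train_dataset = datasets.ImageFolder("/data/imagenet/train", transform=train_transform)

train_loader = DataLoader(
    train_dataset,
    batch_size=256,           # per-card batch size, matching the ResNet50 rows
    shuffle=True,
    num_workers=8,            # extra worker processes hide CPU-side decode/augment time
    pin_memory=True,          # page-locked buffers speed up host-to-device copies
    drop_last=True,
    persistent_workers=True,  # keep workers alive between epochs (PyTorch >= 1.7)
)
```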
System Configuration:
HPU: Habana Gaudi® HL-205 Mezzanine cards
System: HLS-1 with eight HL-205 HPUs, two Intel® Xeon® Platinum 8280 CPUs @ 2.70GHz, and 756GB of system memory
Software: Ubuntu 20.04, SynapseAI software version 1.0.0-532
TensorFlow: Models run with TensorFlow v2.5.0 use this Docker image
PyTorch: Models run with PyTorch v1.8.1 use this Docker image
Environment: These workloads are run using the above Docker images, running directly on the host OS
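Inside those containers, each framework has to be pointed at the Gaudi device before the models above will run on HPU. The snippet below is a minimal device-setup sketch, assuming the habana_frameworks Python packages that ship in the SynapseAI Docker images; exact import paths can differ between SynapseAI releases, so treat it as illustrative rather than as the commands used for these measurements.

```python
# Minimal HPU device-setup sketch, assuming the habana_frameworks packages
# shipped in the SynapseAI Docker images; import paths may vary by release.
# Run each half inside its corresponding (TensorFlow or PyTorch) container.

# --- In the TensorFlow container: register the HPU device, then verify it ---
import tensorflow as tf
from habana_frameworks.tensorflow import load_habana_module  # assumed SynapseAI TF package
load_habana_module()
print(tf.config.list_physical_devices("HPU"))

# --- In the PyTorch container: load the Habana bridge, place tensors on "hpu" ---
import torch
import habana_frameworks.torch.core  # assumed SynapseAI PyTorch bridge
device = torch.device("hpu")
x = torch.ones(8, 8).to(device)
print(x.device)
```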
Performance varies by use, configuration and other factors. All information provided here is subject to change without notice. Habana Labs may make changes to its test conditions and internal reliability goals at any time. Contact your Habana Labs representative to obtain the latest Habana Labs product specifications and roadmaps. Your costs and results may vary.