Intel® Gaudi® AI accelerators Model Performance Data
These performance numbers were measured with the latest Intel Gaudi software release, version 1.15.0-479, unless otherwise noted.
All training and inference models use the PyTorch 2.2.0 framework; any other framework used for a given model is noted in its row.
The MLPerf numbers below were generated with the latest version of SynapseAI and improve on the officially submitted results posted on the MLCommons website.
Gaudi2 MLPerf™ 3.1 Training Performance

| Model | # HPU | Precision | Time To Train | Framework Version |
|---|---|---|---|---|
| MLPerf 3.1 - GPT3 | 384 | fp8 | 153.58 min* | |
| MLPerf 3.1 - GPT3 | 256 | fp8 | 223.75 min** | |
| MLPerf 3.1 - Stable Diffusion v2 | 64 | bf16 | 19.4 min** | Lightning 2.1.2 |
| MLPerf 3.1 - ResNet | 8 | bf16 | 16.4 min | |
| MLPerf 3.1 - BERT | 8 | bf16 | 15.01 min | |
\* The GPT3 measurement with 384 cards was taken using a pre-launch version of the SynapseAI 1.13.0 software stack.
\** The GPT3 measurement with 256 cards and the Stable Diffusion measurement were taken using the SynapseAI 1.13.0 software stack.
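As a quick sanity check on the two GPT3 rows, strong-scaling efficiency for a fixed-work training run can be estimated from card-minutes: the 256-card run consumes 256 × 223.75 card-minutes versus 384 × 153.58 for the 384-card run. A minimal sketch (the helper name is ours, not part of any Gaudi tooling):

```python
def scaling_efficiency(cards_small, minutes_small, cards_large, minutes_large):
    """Ratio of card-minutes: 1.0 means perfectly linear strong scaling."""
    return (cards_small * minutes_small) / (cards_large * minutes_large)

# GPT3 time-to-train from the table above: 256 cards vs. 384 cards
eff = scaling_efficiency(256, 223.75, 384, 153.58)
print(f"{eff:.1%}")  # prints 97.1%
```

In other words, scaling from 256 to 384 cards retains roughly 97% of linear speedup on this benchmark.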
Gaudi2 Large Language Models Training Performance
| Model | # HPU | Precision | Throughput | Sequence Length | TP,PP,DP | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| LLaMA 13B | 64 | bf16 | 85.35 samples/sec | 2,048 | 2,2,16 | 256 | Megatron DeepSpeed PR #307 |
| LLaMA 2 70B | 256 | bf16 | 33.6 samples/sec | 4,096 | 8,8,4 | 1,024 | Megatron DeepSpeed PR #307 |
| LLaMA 2 70B* | 512 | bf16 | 55.4 samples/sec | 4,096 | 8,8,8 | 2,048 | Megatron DeepSpeed PR #307 |
| LLaMA 2 70B* | 1,024 | bf16 | 104.4 samples/sec | 4,096 | 8,8,16 | 4,096 | Megatron DeepSpeed PR #307 |
TP, PP, DP are the tensor-parallel, pipeline-parallel, and data-parallel dimensions of the Megatron DeepSpeed training run.
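The three parallelism dimensions multiply out to the accelerator count: for every row of the table, TP × PP × DP equals # HPU. A quick consistency check (the row labels are ours):

```python
# (TP, PP, DP, reported # HPU) for each row of the table above
rows = [
    ("LLaMA 13B",           2, 2, 16,   64),
    ("LLaMA 2 70B",         8, 8,  4,  256),
    ("LLaMA 2 70B (512)",   8, 8,  8,  512),
    ("LLaMA 2 70B (1024)",  8, 8, 16, 1024),
]
for name, tp, pp, dp, hpus in rows:
    # Total devices = tensor-parallel x pipeline-parallel x data-parallel
    assert tp * pp * dp == hpus, name
print("all parallelism configs consistent")
```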
Gaudi2 Reference Models Training Performance
| Model | # HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| Stable Diffusion | 64 | bf16 | 10923.41 img/sec | | | 32 | Lightning 2.2.0 |
| Stable Diffusion Fine Tuning* | 1 | bf16 | 70 img/sec | | | 7 | Lightning 2.1.2 |
| Stable Diffusion Fine Tuning Textual Inversion* | 1 | bf16 | 20.58 img/sec | | | 7 | Lightning 2.1.2 |
| ResNet50 LARS | 32 | bf16 | 185633.08 img/sec | 76.15 | 6.86 min | 256 | |
| ResNet50 LARS | 8 | bf16 | 47270.53 img/sec | 76.08 | 17.55 min | 256 | |
| ResNet50 LARS | 1 | bf16 | 6059.24 img/sec | | | 256 | |
| BERT Pre Training Phase 1 | 32 | bf16 | 33238 sent/sec | Loss: 1.53 | 239 min | 64 | |
| BERT Pre Training Phase 1 | 8 | bf16 | 9251 sent/sec | | 842 min | 64 | |
| BERT Pre Training Phase 1 | 1 | bf16 | 1164 sent/sec | | | 64 | |
| BERT Pre Training Phase 2 | 32 | bf16 | 10718 sent/sec | Loss: 1.36 | 101 min | 16 | |
| BERT Pre Training Phase 2 | 8 | bf16 | 2784 sent/sec | | 309 min | 16 | |
| BERT Pre Training Phase 2 | 1 | bf16 | 348 sent/sec | | | 16 | |
| BERT SQUAD Fine Tuning | 8 | bf16 | 2045.81 sent/sec | 90.8 | 4.8 min | 24 | |
| BERT SQUAD Fine Tuning | 1 | bf16 | 283.97 sent/sec | | | 24 | |
| ResNext101 | 8 | bf16 | 21806.83 img/sec | 77.8 | 101.96 min | 256 | |
| ResNext101 | 1 | bf16 | 2849.82 img/sec | | | 256 | |
| SSD | 8 | bf16 | 15934.83 img/sec | 22.94 | 10.35 min | 128 | |
| SSD | 1 | bf16 | 2095.02 img/sec | | | 128 | |
| Transformer | 8 | bf16 | 1100836.66 token/sec | 28.1 | 242.78 min | 8,192 | |
| Transformer | 1 | bf16 | 136751.66 token/sec | | | 8,192 | |
| Unet2D | 8 | bf16 | 20020.38 img/sec | 72.57 | 9.96 min | 64 | Lightning 2.2.0 |
| Unet2D | 1 | bf16 | 2668.74 img/sec | | | 64 | Lightning 2.2.0 |
| Unet3D | 8 | bf16 | 249.09 img/sec | 74.16 | 16.08 min | 2 | Lightning 2.2.0 |
| Unet3D | 1 | bf16 | 30.43 img/sec | | | 2 | Lightning 2.2.0 |
| DeepSpeed Chat LLaMA 7B Step1 | 8 | bf16 | 74.42 sec/iter | ppl: 1.61 | | 8 | DeepSpeed 0.12.4 |
| DeepSpeed Chat LLaMA 7B Step2 | 8 | bf16 | 42.55 sec/iter | acc: 78.75 | | 4 | DeepSpeed 0.12.4 |
| DeepSpeed Chat LLaMA 7B Step3 | 8 | bf16 | 7.59 sec/iter | ema: 2.7 | | 4 | DeepSpeed 0.12.4 |
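Multi-card throughput in the table can be compared against ideal linear scaling from the single-card number. For example, ResNet50 LARS scales from 6059.24 img/sec on one card to 47270.53 on eight and 185633.08 on thirty-two. A minimal sketch (the helper name is ours):

```python
def scaling_efficiency(multi_card_tput, n_cards, single_card_tput):
    """Fraction of ideal linear scaling achieved at n_cards."""
    return multi_card_tput / (n_cards * single_card_tput)

# ResNet50 LARS throughput figures from the table above
print(f"{scaling_efficiency(47270.53, 8, 6059.24):.1%}")    # prints 97.5%
print(f"{scaling_efficiency(185633.08, 32, 6059.24):.1%}")  # prints 95.7%
```

So the ResNet50 LARS runs hold roughly 97.5% scaling efficiency at 8 cards and 95.7% at 32.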
Hugging Face Optimum Habana Gaudi2 Training Performance
See the Examples page for information on how to run each of the Tasks, including model naming and hyperparameter usage.
| Model | # HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Task | Framework Version |
|---|---|---|---|---|---|---|---|---|
| Llama2-70B Fine Tuning FSDP (LoRA) | 8 | bf16 | 1.4 sentences/sec | 2.13 | 81.75 min | 10 | language-modeling | Optimum Habana 1.10.4 |
| Llama2-70B Fine Tuning (LoRA) | 8 | bf16 | 2.56 sentences/sec | 2.12 | 41.35 min | 10 | language-modeling | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| Llama1-7B Fine Tuning (LoRA) | 8 | bf16 | 147.61 sentences/sec | 2.34 | 5.18 min | 64 | language-modeling | Optimum Habana 1.10.4 |
| Falcon-180B Fine Tuning (LoRA) | 8 | bf16 | 1.55 sentences/sec | 3.71 | 254.63 min | 1 | language-modeling | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| Falcon-40B Fine Tuning (LoRA) | 8 | bf16 | 27.91 sentences/sec | 4.06 | 16.46 min | 1 | language-modeling | Optimum Habana 1.10.4 |
| GPTJ-CLM | 8 | bf16 | 16.49 sentences/sec | 0.53 | 12.03 min | 4 | language-modeling | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| GPTNEOX-20B-CLM | 8 | bf16 | 297.85 sentences/sec | 0.01 | 28.96 min | 2 | language-modeling | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| BridgeTower | 8 | bf16 | 488.41 sentences/sec | | 20.3 min | 40 | contrastive-image-text | Optimum Habana 1.10.4 |
| GPT2 | 8 | bf16 | 616.17 sentences/sec | | | 4 | language-modeling | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| GPT2-XL | 8 | bf16 | 89.69 sentences/sec | | | 4 | language-modeling | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| ALBERT-Large | 8 | bf16 | 2504.6 sentences/sec | 92.16 | 1.91 min | 32 | question-answering | Optimum Habana 1.10.4 |
| ALBERT-XXL | 8 | bf16 | 446.69 sentences/sec | 94.91 | 6.98 min | 12 | question-answering | Optimum Habana 1.10.4 |
| BERT Base | 8 | bf16 | 3208.66 sentences/sec | 85.35 | 1.16 min | 24 | question-answering | Optimum Habana 1.10.4 |
| BERT-Large Fine Tuning | 8 | bf16 | 2269 sentences/sec | 93.40 | 1.98 min | 24 | question-answering | Optimum Habana 1.10.4 |
| ClipRoBERTa | 8 | bf16 | 6734.02 images/sec | | 9.35 min | 64 | contrastive-image-text | Optimum Habana 1.10.4 |
| DistilBERT | 8 | bf16 | 10890.78 sentences/sec | 82.64 | 0.55 min | 8 | question-answering | Optimum Habana 1.10.4 |
| Flan-T5 XXL | 8 | bf16 | 26.46 sentences/sec | 37.27 | 384.96 min | 22 | question-answering | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| RoBERTa Base | 8 | bf16 | 6593.03 sentences/sec | 92.03 | 0.78 min | 12 | question-answering | Optimum Habana 1.10.4 |
| RoBERTa Large | 8 | bf16 | 2247.01 sentences/sec | 94.62 | 1.98 min | 12 | question-answering | Optimum Habana 1.10.4 |
| Swin Transformer | 8 | bf16 | 6166.55 images/sec | 99.09 | 1.83 min | 64 | image-classification | Optimum Habana 1.10.4 |
| T5-LARGE | 8 | bf16 | 89.43 sentences/sec | 44.35 | 216.55 min | 4 | summarization | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| T5-Small | 8 | bf16 | 546.26 sentences/sec | 26.18 | 104.96 min | 4 | translation | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| Vision Transformer | 8 | bf16 | 6618.09 images/sec | 98.83 | 0.95 min | 128 | image-classification | Optimum Habana 1.10.4 |
| Wav2Vec2.0 AC | 8 | bf16 | 2070.87 sentences/sec | 81.74 | 2.41 min | 16 | speech-recognition | Optimum Habana 1.10.4 |
| Wav2Vec2.0 ASR | 8 | bf16 | 79.14 sentences/sec | 4.05 | 17.5 min | 4 | speech-recognition | Optimum Habana 1.10.4 |
MosaicML Gaudi2 Training Performance
| Model | # HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| MosaicML MPT-1B | 8 | bf16 | 24816 samples/sec | 6.98 | 13.2 min | 512 | PyTorch 2.2.0 |
| MosaicML MPT-70B | 32 | bf16 | 14372 samples/sec | 7.5 | 131.8 min | 512 | PyTorch 2.2.0 |
Gaudi Reference Models Training Performance
| Model | # HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| ResNet50 Keras LARS | 32 | bf16 | 48288.55 img/sec | 76.21 | 26 min | 256 | |
| ResNet50 Keras LARS | 8 | bf16 | 12401.5 img/sec | 76.51 | 69.25 min | 256 | |
| ResNet50 Keras LARS | 1 | bf16 | 1621.65 img/sec | | | 256 | |
| BERT Pre Training combine | 32 | bf16 | 4814.85 sent/sec | | 1803.26 min | 64 | |
| BERT Pre Training combine | 8 | bf16 | 1234.88 sent/sec | | | 64 | |
| BERT Pre Training combine | 1 | bf16 | 154.97 sent/sec | | | 64 | |
| BERT Pre Training Phase 1 | 32 | bf16 | 5763.84 sent/sec | Loss: | 1348.41 min | 64 | |
| BERT Pre Training Phase 1 | 8 | bf16 | 1481.72 sent/sec | | | 64 | |
| BERT Pre Training Phase 1 | 1 | bf16 | 186.03 sent/sec | | | 64 | |
| BERT Pre Training Phase 2 | 32 | bf16 | 1920.95 sent/sec | Loss: | 454.85 min | 8 | |
| BERT Pre Training Phase 2 | 8 | bf16 | 489.2 sent/sec | | | 8 | |
| BERT Pre Training Phase 2 | 1 | bf16 | 61.32 sent/sec | | | 8 | |
| BERT SQUAD Fine Tuning | 8 | bf16 | 404.25 sent/sec | 90.68 | 13.01 min | 24 | |
| BERT SQUAD Fine Tuning | 1 | bf16 | 53.64 sent/sec | | | 24 | |
| BART Fine Tuning | 8 | bf16 | 1368.88 sent/sec | | | 32 | |
| DINO | 8 | bf16 | 944.37 exmpl/sec | 77 | 2333.66 min | 64 | |
| MobileNetV2 | 8 | bf16 | 12203.16 img/sec | 71.59 | 601.21 min | 256 | |
| ResNet152 | 8 | bf16 | 4964.63 img/sec | 78.39 | 430.35 min | 128 | |
| SSD** | 8 | bf16 | 3367.64 img/sec | | | 128 | |
| Transformer | 8 | bf16 | 187914.33 tokens/sec | 28 | 1027.21 min | 4096 | |
| Unet2D | 8 | bf16 | 4961.65 img/sec | 72.74 | 70.5 min | 64 | Lightning 2.2.0 |
| Unet3D | 8 | bf16 | 60.14 img/sec | 74.28 | 78.65 min | 2 | Lightning 2.2.0 |
| YOLOX | 8 | bf16 | 307.83 img/sec | 39.88 | 1855 min | 16 | |
| ResNet50 Host NIC (libfabric) | 16 | bf16 | 22484.48 img/sec | | | 256 | |
Hugging Face Optimum Habana Gaudi Training Performance
See the Examples page for information on how to run each of the Tasks, including model naming and hyperparameter usage.
| Model | # HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Task | Framework Version |
|---|---|---|---|---|---|---|---|---|
| GPT2-XL | 8 | bf16 | 19.36 sentences/sec | 0.47 | 76.43 min | 4 | language-modeling | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| GPT2 | 8 | bf16 | 168.96 sentences/sec | 0.41 | 4.16 min | 4 | language-modeling | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| T5-LARGE | 8 | bf16 | 50.16 sentences/sec | 44.4 | 363.83 min | 4 | summarization | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| T5-Small | 8 | bf16 | 195.39 sentences/sec | 26.16 | 118.11 min | 4 | translation | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| ALBERT-L | 8 | bf16 | 490.95 sentences/sec | 92.76 | 7.88 min | 32 | question-answering | Optimum Habana 1.10.4 |
| ALBERT-XXL | 8 | bf16 | 75.62 sentences/sec | 94.81 | 41.53 min | 12 | question-answering | Optimum Habana 1.10.4 |
| BERT-BASE | 8 | bf16 | 1205.98 sentences/sec | 85.43 | 2.93 min | 24 | question-answering | Optimum Habana 1.10.4 |
| BERT-Large FT | 8 | bf16 | 396.89 sentences/sec | 93.09 | 8.95 min | 24 | question-answering | Optimum Habana 1.10.4 |
| Clip-RoBERTa | 8 | bf16 | 849.02 images/sec | | | 64 | contrastive-image-text | Optimum Habana 1.10.4 |
| DistilBERT | 8 | bf16 | 1579.32 sentences/sec | 85.55 | 2.95 min | 8 | question-answering | Optimum Habana 1.10.4 |
| RoBERTa Base | 8 | bf16 | 1068.01 sentences/sec | 91.81 | 3.13 min | 12 | question-answering | Optimum Habana 1.10.4 |
| RoBERTa Large | 8 | bf16 | 362.3 sentences/sec | 94.76 | 9.21 min | 12 | question-answering | Optimum Habana 1.10.4 |
| Swin Transformer | 8 | bf16 | 1589.16 images/sec | 98.65 | 4.78 min | 64 | question-answering | Optimum Habana 1.10.4 |
| Vision Transformer | 8 | bf16 | 2469.31 images/sec | 97.16 | 2.81 min | 64 | question-answering | Optimum Habana 1.10.4 |
| Wav2Vec2-AC | 8 | bf16 | 646.78 sentences/sec | 80.67 | 6.48 min | 16 | speech-recognition | Optimum Habana 1.10.4 |
| Wav2Vec2-ASR | 8 | bf16 | 38.85 sentences/sec | 4.2 | 36.73 min | 4 | speech-recognition | Optimum Habana 1.10.4 |
Gaudi2 MLPerf™ 3.1 Inference Performance
| Model | # HPU | Precision | Performance | Framework Version |
|---|---|---|---|---|
| MLPerf 3.1 - GPT-J Offline 99.9% Accuracy | 8 | fp8 | 84 samples/sec | PyTorch 2.2.0 |
| MLPerf 3.1 - GPT-J Server 99.9% Accuracy | 8 | fp8 | 66 queries/sec | PyTorch 2.2.0 |
Gaudi2 Large Language Models Inference Performance

| Model | # HPU | Precision | Input Length | Output Length | Max Token Sequence Length | Throughput | Latency*** | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-7B | 1 | bf16 | 100 | 8k | 8k | 116.9 token/sec | 8.55 ms | 1 | Optimum Habana 1.10.4 |
| Bloom-7B-Greedy | 1 | bf16 | 2k | | | 721.41 token/sec | 11.08 ms | 8 | |
| Bloom-7B-Greedy | 1 | fp8 | 2K | | | 192.20 token/sec | 5.2 ms | 1 | |
| GPT-J (Text Generation) | 8 | bf16 | 100 | | | 585.2 token/sec | 6.83 ms | 4 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-7B | 1 | fp8 | 2k | 4k | 6k | 1604.44 token/sec | 49.86 ms | 12 | Optimum Habana 1.10.4 |
| LLaMA 2-7B | 8 | fp8 | 4k | 8k | 12k | 5313.93 token/sec | 29.35 ms | 6 | Optimum Habana 1.10.4 |
| LLaMA 2-7B | 8 | fp8 | 8k | 16k | 24k | 2648.5 token/sec | 21.52 ms | 3 | Optimum Habana 1.10.4 |
| LLaMA 2-7B | 1 | bf16 | 1k | 3k | 4k | 411.31 token/sec | 9.72 ms | 4 | Optimum Habana 1.10.4 |
| Falcon-40B | 8 | bf16 | 100 | 2k | 2k | 84.33 token/sec | 11.85 ms | 1 | Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 2k | 4k | 6k | 5000.2 token/sec | 55.39 ms | 277 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 2k | 8k | 10k | 3171.51 token/sec | 33.1 ms | 77 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 2k | 16k | 18k | 1305.04 token/sec | 25.28 ms | 38 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | fp8 | 2k | 32k | 34k | 272.99 token/sec | 21.97 ms | 19 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | bf16 | 2k | 2k | 4k | 3348.89 token/sec | 64.49 ms | 216 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | bf16 | 2k | 6k | 8k | 1854.93 token/sec | 16.17 ms | 30 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-70B | 8 | bf16 | 2k | 14k | 16k | 916.03 token/sec | 16.37 ms | 15 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| LLaMA 2-13B | 1 | bf16 | 2k | 2k | 4k | 139.32 token/sec | 14.35 ms | 2 | Optimum Habana 1.10.4 |
| Bloomz-176B | 8 | bf16 | 100 | | | 36.76 token/sec | 27.2 ms | 1 | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| Bloom-176B-Greedy | 8 | fp8 | 4K | | | 201.09 token/sec | 39.78 ms | 8 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | 4K | | | 398.04 token/sec | 52.75 ms | 21 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | 8K | | | 197.78 token/sec | 50.56 ms | 10 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | 16K | | | 84.42 token/sec | 47.38 ms | 4 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 8 | bf16 | 32K | | | 26.04 token/sec | 38.4 ms | 1 | DeepSpeed 0.12.4 |
| Bloom-176B-Sampling | 8 | bf16 | 1k | | | 19.6 token/sec | 51.02 ms | 1 | DeepSpeed 0.12.4 |
| Bloom-176B-BeamSearch-8 | 8 | bf16 | 512 | | | 30.83 token/sec | 32.43 ms | 1 | DeepSpeed 0.12.4 |
| Falcon 180B | 8 | bf16 | | | | 669.5 tokens/sec | 59.74 ms | 40 | Optimum Habana 1.10.4 |
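For the decode phase, batch size divided by average next-token latency gives a rough estimate of steady-state generation throughput. It lines up closely for some rows (e.g. the LLaMA 2-70B fp8 2k-input/4k-output row) but not all, since the reported throughput also reflects prefill and other measurement details. A minimal sketch under that steady-state assumption (the helper is ours, not part of any Gaudi tooling):

```python
def decode_throughput_estimate(batch_size, next_token_latency_ms):
    """Tokens per second if every decode step emits batch_size tokens."""
    return batch_size * 1000.0 / next_token_latency_ms

# LLaMA 2-70B fp8, 2k input / 4k output: batch 277, 55.39 ms next-token latency
est = decode_throughput_estimate(277, 55.39)
print(f"{est:.0f} token/sec")  # prints 5001 token/sec, close to the reported 5000.2
```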
Gaudi2 Reference Models Inference Performance
| Model | # HPU | Precision | Throughput | Latency*** | Batch Size | Framework Version |
|---|---|---|---|---|---|---|
| Stable Diffusion v2.1 (512x512) | 1 | bf16 | 1.23 img/sec | 813 ms | 1 | Lightning 2.2.0 |
| Stable Diffusion v2.1 (768x768) | 1 | bf16 | 0.4 img/sec | 2500 ms | 1 | Lightning 2.2.0 |
| Bert FT | 1 | bf16 | 763.45 token/sec | 31.23 ms | 24 | |
| Resnet50 | 1 | bf16 | 17163.9 img/sec | 14.83 ms | 256 | |
| Resnext101 | 1 | bf16 | 10561.75 img/sec | 24.19 ms | 256 | |
| Unet2D | 1 | bf16 | 8789 img/sec | 8.2 ms | 64 | Lightning 2.2.0 |
| Unet3D | 1 | bf16 | 130 img/sec | 15.38 ms | 2 | Lightning 2.2.0 |
Hugging Face Optimum Habana Gaudi2 Inference Performance
See the Examples page for information on how to run each of the Tasks, including model naming and hyperparameter usage.
| Model | # HPU | Precision | Max Token Sequence Length | Throughput | Latency | Batch Size | Task | Framework Version |
|---|---|---|---|---|---|---|---|---|
| StableDiffusion v2.1 (512x512) | 1 | bf16 | | 1.97 images/sec | 2027.36 ms | 4 | stable-diffusion | PyTorch Lightning 2.2.0 |
| OPT | 1 | bf16 | | 1298.33 token/sec | 0.77 ms | 1 | text-generation | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| StarCoder | 1 | bf16 | | 65.44 token/sec | 15.27 ms | 1 | text-generation | DeepSpeed 0.12.4, Optimum Habana 1.10.4 |
| MPT-7B | 1 | bf16 | 1932 | 123.98 token/sec | 8.06 ms | 1 | text-generation | Optimum Habana 1.10.4 |
| Bert (Text Classification) | 1 | bf16 | | 83.21 token/sec | 48.06 ms | 4 | text-classification | Optimum Habana 1.10.4 |
| Bert (Language Modeling) | 1 | bf16 | | 578.92 token/sec | 13.81 ms | 8 | language-modeling | Optimum Habana 1.10.4 |
| Bert (Question Answering) | 1 | bf16 | | 1950 token/sec | 53.19 ms | 8 | question-answering | Optimum Habana 1.10.4 |
| Bart | 1 | bf16 | | 6.39 token/sec | 312.89 ms | 2 | language-modeling | Optimum Habana 1.10.4 |
| BridgeTower | 1 | bf16 | | 329.37 token/sec | 48.57 ms | 16 | contrastive-image-text | Optimum Habana 1.10.4 |
| ESMFold | 1 | bf16 | | 3.21 token/sec | 311.33 ms | 1 | protein-folding | Optimum Habana 1.10.4 |
| StableLM-3B | 1 | bf16 | 2048 | 242.65 token/sec | 4.12 ms | 1 | text-generation | Optimum Habana 1.10.4 |
| StableLM-7B | 1 | bf16 | 2048 | 126.93 token/sec | 7.87 ms | 1 | text-generation | Optimum Habana 1.10.4 |
| T5-3B Summarization 1024-128 Beam4 | 1 | bf16 | | 0.96 token/sec | 1034.12 ms | 1 | summarization | Optimum Habana 1.10.4 |
| T5-3B Summarization Greedy | 1 | bf16 | | 2.45 token/sec | 406.83 ms | 1 | summarization | Optimum Habana 1.10.4 |
| HF-T5-Small-Translation-Greedy | 1 | bf16 | | 30.86 token/sec | 129.6 ms | 4 | translation | Optimum Habana 1.10.4 |
| Wav2Vec (Audio Classification) | 1 | bf16 | | 991 token/sec | 4.03 ms | 4 | audio-classification | Optimum Habana 1.10.4 |
| Wav2Vec (Speech Recognition) | 1 | bf16 | | 15.9 token/sec | 251.57 ms | 4 | speech-recognition | Optimum Habana 1.10.4 |
Gaudi Reference Models Inference Performance
| Model | # HPU | Precision | Throughput | Latency | Batch Size | Framework Version |
|---|---|---|---|---|---|---|
| Bloom-176B-BeamSearch-8 | 16 | bf16 | 10.63 token/sec | 94.04 ms | 1 | DeepSpeed 0.12.4 |
| Bloom-176B-Greedy | 16 | bf16 | 11.69 token/sec | 85.5 ms | 1 | DeepSpeed 0.12.4 |
| Bloom-176B-Sampling | 16 | bf16 | 7.87 token/sec | 127 ms | 1 | DeepSpeed 0.12.4 |
| Bloom-7B (512 token) | 1 | bf16 | 42.78 token/sec | 23.37 ms | 1 | |
| Stable Diffusion v2.1 (512x512) | 1 | bf16 | 0.36 img/sec | 2739.72 ms | 1 | Lightning 2.2.0 |
| Stable Diffusion v2.1 (768x768) | 1 | bf16 | 0.12 img/sec | 7751.93 ms | 1 | Lightning 2.2.0 |
| Bert | 1 | bf16 | 153.05 token/sec | 156.8 ms | 24 | |
| Unet2D | 1 | bf16 | 3465.17 img/sec | 18.46 ms | 64 | Lightning 2.2.0 |
| Unet3D | 1 | bf16 | 57.44 img/sec | 34.81 ms | 2 | Lightning 2.2.0 |
Hugging Face Optimum Habana Gaudi Inference Performance
See the Examples page for information on how to run each of the Tasks, including model naming and hyperparameter usage.
| Model | # HPU | Precision | Throughput | Latency | Batch Size | Task | Framework Version |
|---|---|---|---|---|---|---|---|
| BERT | 1 | bf16 | 36.89 token/sec | 108.41 ms | 4 | language-modeling | Optimum Habana 1.10.4 |
| BERT | 1 | bf16 | 127.74 token/sec | 62.62 ms | 8 | question-answering | Optimum Habana 1.10.4 |
| BERT | 1 | bf16 | 433.91 token/sec | 18.43 ms | 8 | text-classification | Optimum Habana 1.10.4 |
| BART-Greedy | 1 | bf16 | 3.03 token/sec | 658.97 ms | 2 | summarization | Optimum Habana 1.10.4 |
| ESMFold | 1 | bf16 | 14.14 token/sec | 70.69 ms | 1 | protein-folding | Optimum Habana 1.10.4 |
| Stable Diffusion v2.1 (512x512) | 1 | bf16 | 0.53 token/sec | 7434.94 ms | 4 | text-to-image generation | Optimum Habana 1.10.4 |
| T5-Small Translation Greedy | 1 | bf16 | 17.03 token/sec | 234.86 ms | 4 | translation | Optimum Habana 1.10.4 |
| Wav2Vec 2.0 ASR | 1 | bf16 | 501.5 token/sec | 7.97 ms | 4 | speech-recognition | Optimum Habana 1.10.4 |
| Wav2Vec 2.0 Speech Classification | 1 | bf16 | 9.63 token/sec | 415.06 ms | 4 | speech-recognition | Optimum Habana 1.10.4 |
\* These models used the previous 1.14.0 software release.
\*** For the large language model inference results, this is the average next-token latency.
System Configuration:
Gaudi® Platform: HLS-1 system with eight Habana Gaudi HL-205 mezzanine cards, two Intel® Xeon® Platinum 8280 CPUs @ 2.70 GHz, and 756 GB of system memory.
Gaudi®2 Platform: HLS-Gaudi2 system with eight Habana Gaudi2 HL-225H mezzanine cards, two Intel® Xeon® Platinum 8380 CPUs @ 2.30 GHz, and 1 TB of system memory.
Common Software: Ubuntu 22.04, SynapseAI software version 1.15.0-479.
PyTorch: Models run with PyTorch v2.2.0 use this Docker image.
Environment: These workloads are run using the Docker images directly on the host OS.
Performance varies by use, configuration and other factors. Please refer to the Model-References GitHub page for each model’s support and validation coverage. All information provided here is subject to change without notice. Habana Labs may make changes to its test conditions and internal reliability goals at any time. Contact your Habana Labs representative to obtain the latest Habana Labs product specifications and roadmaps. Your costs and results may vary.
Stay Informed: Register for the latest Intel Gaudi AI Accelerator developer news, events, training, and updates.