| Model | #HPU | Precision | Time To Train | Framework Version |
|---|---|---|---|---|
| MLPerf 3.1 - GPT3 | 384 | fp8 | 153.58 min** | |
| MLPerf 3.1 - GPT3 | 256 | fp8 | 223.75 min† | |
| MLPerf 3.1 - Stable Diffusion v2 | 64 | bf16 | 19.4 min† | Lightning 2.1.2 |
| MLPerf 3.1 - ResNet | 8 | bf16 | 16.4 min‡ | |
| MLPerf 3.1 - BERT | 8 | bf16 | 15.01 min‡ | |

| Model | #HPU | Precision | Throughput | Sequence Length | TP,PP,DP | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| LLaMA 2 7B | 8 | FP8 | 70688 tokens/sec | 4,096 | 1,1,8 | 1,024 | Megatron DeepSpeed PR #374 |
| LLaMA 2 13B | 16 | FP8 | 55296 tokens/sec | 4,096 | 2,2,4 | 256 | Megatron DeepSpeed PR #374 |
| LLaMA 2 70B | 64 | FP8 | 54067 tokens/sec | 4,096 | 8,2,4 | 1,024 | Megatron DeepSpeed PR #374 |
| LLaMA 2 70B** | 256 | bf16 | 137625 tokens/sec | 4,096 | 8,8,4 | 1,024 | Megatron DeepSpeed PR #307 |
| LLaMA 2 70B** | 512 | bf16 | 226918 tokens/sec | 4,096 | 8,8,8 | 2048 | Megatron DeepSpeed PR #307 |
| LLaMA 2 70B** | 1024 | bf16 | 427622 tokens/sec | 4,096 | 8,8,16 | 4096 | Megatron DeepSpeed PR #307 |

| Model | #HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| Llama 2 13B | 16 | bf16 | 10.16 samples/sec | | | 256 | DeepSpeed 0.14.0 |
| Llama 2 70B | 64 | bf16 | 9.13 samples/sec | | | 1024 | DeepSpeed 0.14.0 |
| Llama 2 70B | 64 | FP8 | 13.17 samples/sec | | | 1024 | DeepSpeed 0.14.0 |
| MIXTRAL-8x7B-32K | 32 | bf16 | 0.7 samples/sec | 88.46 | 345 min | 128 | DeepSpeed 0.14.0 |
| Stable Diffusion | 64 | bf16 | 11122 img/sec | | | 32 | Lightning 2.3.3 |
| Stable Diffusion Fine Tuning** | 1 | bf16 | 73 img/sec | | | 7 | Lightning 2.3.3 |
| Stable Diffusion Fine Tuning Textual Inversion** | 1 | bf16 | 19.7 img/sec | | | 7 | Lightning 2.3.3 |
| ResNet50 LARS | 32 | bf16 | 18399 img/sec | 76.38 | 7.26 min | 256 | |
| ResNet50 LARS | 8 | bf16 | 48166.02 img/sec | 76.04 | 17.81 min | 256 | |
| ResNet50 LARS | 1 | bf16 | 6201.14 img/sec | | | 256 | |
| BERT Pre Training Phase 1 (torch.compile) | 32 | bf16 | 33179.52 sent/sec | | 238 min | 64 | |
| BERT Pre Training Phase 1 (torch.compile) | 8 | bf16 | 8593.03 sent/sec | 0 | | 64 | |
| BERT Pre Training Phase 1 (torch.compile) | 1 | bf16 | 1074.45 sent/sec | | | 64 | |
| BERT Pre Training Phase 2 (torch.compile) | 32 | bf16 | 9861.81 sent/sec | 0 | 87 min | 16 | |
| BERT Pre Training Phase 2 (torch.compile) | 8 | bf16 | 2568.65 sent/sec | 0 | | 16 | |
| BERT Pre Training Phase 2 (torch.compile) | 1 | bf16 | 320.41 sent/sec | | | 16 | |
| BERT SQUAD Fine Tuning | 8 | bf16 | 2013 sent/sec | 90.52 | 4.68 min | 24 | |
| ResNext101 | 8 | bf16 | 21851 img/sec | 77.81 | 102 min | 256 | |
| Transformer | 8 | bf16 | 1121879 tokens/sec | 27.9 | 236 min | 8,192 | |
| Unet2D (torch.compile) | 8 | bf16 | 19888 img/sec | 72.5 | 10.21 min | 64 | Lightning 2.3.3 |
| Unet3D PTL | 8 | bf16 | 252 img/sec | 74.17 | 17.96 min | 2 | Lightning 2.3.3 |

| Model | #HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Task | Framework Version |
|---|---|---|---|---|---|---|---|---|
| Llama2-70B Fine Tuning FSDP (LoRA with torch.compile) | 8 | bf16 | 1.81 sentences/sec | 2.13 | 60 min | 10 | language-modeling | Optimum Habana 1.12.1 |
| Llama2-70B Fine Tuning (LoRA) | 8 | bf16 | 2.66 sentences/sec | 2.13 | 38.86 min | 10 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| Falcon-180B Fine Tuning (LoRA) | 8 | bf16 | 2.47 sentences/sec | 3.74 | 162.13 min | 1 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| GPTJ-CLM | 8 | bf16 | 22.17 sentences/sec | 0.53 | 21.56 min | 4 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| GPTNEOX-20B-CLM | 16 | bf16 | 257 sentences/sec | 0.53 | 41 min | 2 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| BridgeTower | 8 | bf16 | 1031 sentences/sec | | 7.28 min | 40 | contrastive-image-text | Optimum Habana 1.12.1 |
| GPT2-XL | 8 | bf16 | 95.69 sentences/sec | 0.47 | 8.81 min | 4 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| ALBERT-XXL | 8 | bf16 | 422 sentences/sec | 94.8 | 7.4 min | 16 | question-answering | Optimum Habana 1.12.1 |
| BERT Base (torch.compile) | 8 | bf16 | 4513 sentences/sec | 85.29 | 0.93 min | 24 | question-answering | Optimum Habana 1.12.1 |
| BERT-Large Fine Tuning (torch.compile) | 8 | bf16 | 2099 sentences/sec | 93.18 | 1.93 min | 32 | question-answering | Optimum Habana 1.12.1 |
| ClipRoBERTa (torch.compile) | 8 | bf16 | 6420 images/sec | | 8.95 min | 64 | contrastive-image-text | Optimum Habana 1.12.1 |
| DistilBERT (torch.compile) | 8 | bf16 | 12192 sentences/sec | 82.02 | 0.56 min | 64 | question-answering | Optimum Habana 1.12.1 |
| Flan-T5 XXL | 8 | bf16 | 27.11 sentences/sec | 37.06 | 356 min | 22 | question-answering | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| RoBERTa Large (torch.compile) | 8 | bf16 | 2084 sentences/sec | 94.84 | 1.95 min | 32 | question-answering | Optimum Habana 1.12.1 |
| Swin Transformer | 8 | bf16 | 5830 images/sec | 99.09 | 1.8 min | 160 | image-classification | Optimum Habana 1.12.1 |
| T5-LARGE | 8 | bf16 | 86 sentences/sec | 44.34 | 226 min | 4 | summarization | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| Vision Transformer | 8 | bf16 | 6273 images/sec | 98.85 | 0.91 min | 128 | image-classification | Optimum Habana 1.12.1 |
| Wav2Vec2.0 AC | 8 | bf16 | 1933 sentences/sec | 81.47 | 2.46 min | 16 | speech-recognition | Optimum Habana 1.12.1 |
| Wav2Vec2.0 ASR | 8 | bf16 | 88 sentences/sec | 3.96 | 17.5 min | 4 | speech-recognition | Optimum Habana 1.12.1 |

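| Model | #HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| MosaicML MPT-1B | 8 | bf16 | 23542 samples/sec | 6.95 | 13.83 min | 512 | DeepSpeed 0.14.0 |
| MosaicML MPT-70B | 32 | bf16 | 17955 samples/sec | 7.47 | 106 min | 512 | DeepSpeed 0.14.0 |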

| Model | #HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| ResNet50 LARS (torch.compile) | 32 | bf16 | 46508 img/sec | 76.39 | 23.4 min | 256 | |
| ResNet50 LARS (torch.compile) | 8 | bf16 | 11959 img/sec | 76.3 | 92.7 min | 256 | |
| BERT Pre Training combine | 32 | bf16 | 4851 sent/sec | | 1735 min | 64 | |
| BERT Pre Training combine | 8 | bf16 | 1240 sent/sec | | | 64 | |
| BERT Pre Training Phase 1 | 32 | bf16 | 5810 sent/sec | Loss: | 1302 min | 64 | |
| BERT Pre Training Phase 1 | 8 | bf16 | 1489 sent/sec | | | 64 | |
| BERT Pre Training Phase 2 | 32 | bf16 | 1932 sent/sec | Loss: | 433 min | 16 | |
| BERT Pre Training Phase 2 | 8 | bf16 | 490 sent/sec | | | 16 | |
| BERT SQUAD Fine Tuning | 8 | bf16 | 406 sent/sec | 90.68 | 12.96 min | 24 | |
| BART Fine Tuning | 8 | bf16 | 1782 sent/sec | | | 32 | |
| Transformer | 8 | bf16 | 186020 tokens/sec | 27.8 | 1034 min | 4096 | |
| Unet2D (torch.compile) | 8 | bf16 | 4776 img/sec | 72.88 | 67.4 min | 64 | Lightning 2.3.3 |
| Unet3D PTL | 8 | bf16 | 60.77 img/sec | 74.28 | 59.4 min | 2 | Lightning 2.3.3 |
| YOLOX | 8 | bf16 | 312.37 img/sec | 39.93 | 2331.2 min | 16 | |

| Model | #HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Task | Framework Version |
|---|---|---|---|---|---|---|---|---|
| GPT2-XL | 8 | bf16 | 19.49 sentences/sec | 0.47 | 101 min | 4 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| T5-LARGE | 8 | bf16 | 49.67 sentences/sec | 44.34 | 368 min | 4 | summarization | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| ALBERT-XXL | 8 | bf16 | 71.2 sentences/sec | 94.88 | 43.6 min | 12 | question-answering | Optimum Habana 1.12.1 |
| BERT-BASE FT (torch.compile) | 8 | bf16 | 1187 sentences/sec | 85.53 | 3.1 min | 24 | question-answering | Optimum Habana 1.12.1 |
| BERT-Large FT (torch.compile) | 8 | bf16 | 415 sentences/sec | 93.36 | 8.6 min | 24 | question-answering | Optimum Habana 1.12.1 |
| Clip-RoBERTa | 8 | bf16 | 630 images/sec | | 46 min | 64 | contrastive-image-text | Optimum Habana 1.12.1 |
| RoBERTa Base (torch.compile) | 8 | bf16 | 1141 sentences/sec | 91.77 | 3.2 min | 12 | question-answering | Optimum Habana 1.12.1 |
| RoBERTa Large (torch.compile) | 8 | bf16 | 412 sentences/sec | 94.58 | 8.6 min | 12 | question-answering | Optimum Habana 1.12.1 |
| Swin Transformer | 8 | bf16 | 1592 images/sec | 98.68 | 4.6 min | 64 | image-classification | Optimum Habana 1.12.1 |
| Vision Transformer | 8 | bf16 | 2461 images/sec | 97.19 | 2.81 min | 64 | image-classification | Optimum Habana 1.12.1 |