| Framework Version | Model | Usage | Precision | Throughput | Perf/Watt | Latency (ms) | Batch size | Config* |
|---|---|---|---|---|---|---|---|---|
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B, token size 1024/128 | Text generation, beam search, width=4 | int8 | | | 40 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B, token size 1024/128 | Text generation, beam search, width=4 | int8 | 130.4 tokens/s | | 92 | 6 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B, token size 1024/128 | Text generation, beam search, width=4 | bf16 | | | 59.5 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B, token size 1024/128 | Text generation, beam search, width=4 | bf16 | 125 tokens/s | | 96 | 6 | 1 instance per socket |
| MLPerf Inference v3.1 | GPT-J (offline, 99.0% acc) | Large Language Model | int8 | 2.05 samp/s | | | 7 | 4 cores per instance |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B, token size 1024/128 | Text generation, beam search, width=4 | int8 | | | 47 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B, token size 1024/128 | Text generation, beam search, width=4 | int8 | 111.6 tokens/s | | 107.5 | 6 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B, token size 1024/128 | Text generation, beam search, width=4 | bf16 | | | 68 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B, token size 1024/128 | Text generation, beam search, width=4 | bf16 | 109.1 tokens/s | | 110 | 6 | 1 instance per socket |
| MLPerf Inference v3.1 | ResNet50 v1.5 (offline) | Image Recognition | int8 | 20,565.5 samp/s | | | 256 | 1 core per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 10,215.7 img/s | 9.98 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 13,862.96 img/s | 14.09 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 6,210.69 img/s | 6.13 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 7,295.63 img/s | 7.33 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,319.52 img/s | 1.27 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,360.05 img/s | 1.28 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 1,659.37 img/s | 1.65 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 1,985.26 img/s | 2.02 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 7,440.61 img/s | 7.70 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 12,345.54 img/s | 11.80 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 5,053.76 img/s | 5.01 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 6,704.17 img/s | 6.34 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,282.77 img/s | 1.17 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,342.91 img/s | 1.27 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 1,529.49 img/s | 1.41 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 2,017.54 img/s | 1.89 | | 116 | 1 instance per socket |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | int8 | 8,819.657 img/s | 8.81 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | bf16 | 5,915.793 img/s | 5.82 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | fp32 | 1,281.337 img/s | 1.25 | | 1 | 4 cores per instance |
| MLPerf Inference v3.1 | BERT-Large (offline, 99.0% acc) | Natural Language Processing | int8 | 1,357.33 samp/s | | | 1,300 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 335.1 sent/s | 0.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 378.73 sent/s | 0.36 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 204.52 sent/s | 0.21 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 201.44 sent/s | 0.21 | | 16 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 35.25 sent/s | 0.03 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 41.05 sent/s | 0.04 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 72.42 sent/s | 0.07 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 71.63 sent/s | 0.07 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 253.27 sent/s | 0.24 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 239.89 sent/s | 0.25 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 181.02 sent/s | 0.18 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 184.06 sent/s | 0.17 | | 128 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 44.73 sent/s | 0.04 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 38.58 sent/s | 0.04 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 72.78 sent/s | 0.07 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 71.77 sent/s | 0.07 | | 16 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 298.44 sent/s | 0.30 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 285.68 sent/s | 0.28 | | 48 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 202.48 sent/s | 0.20 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 191.2533 sent/s | 0.19 | | 32 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 47.33667 sent/s | 0.05 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 44.23333 sent/s | 0.04 | | 48 | 1 instance per socket |
| MLPerf Inference v3.1 | DLRM-v2 (offline, 99.0% acc) | Recommender | int8 | 5,367.77 samp/s | | | 300 | 1 core per instance |
| Intel PyTorch 2.1 | DLRM, Criteo Terabyte | Recommender | int8 | 23,444,587 rec/s | 23,611.92 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM, Criteo Terabyte | Recommender | bf16 | 10,646,560 rec/s | 10,238.88 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM, Criteo Terabyte | Recommender | fp32 | 2,278,228 rec/s | 2,220.37 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM, Criteo Terabyte | Recommender | bf32 | 4,530,200 rec/s | 4,427.38 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 4,726.15 sent/s | 4.94 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 7,759.25 sent/s | 8.42 | | 168 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 3,306.46 sent/s | 3.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 5,057.47 sent/s | 5.50 | | 120 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 900.58 sent/s | 0.85 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 1,007.05 sent/s | 1.04 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 1,513.66 sent/s | 1.49 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 1,926.1 sent/s | 1.77 | | 288 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 61.03 sent/s | 0.06 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 245.66 sent/s | 0.24 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 41.44 sent/s | 0.04 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 278.81 sent/s | 0.28 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 20.27 sent/s | 0.02 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 102.48 sent/s | 0.10 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 20.28 sent/s | 0.02 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 114.08 sent/s | 0.11 | | 448 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | int8 | 24.68333 samp/s | 0.02 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | int8 | 21.85667 samp/s | 0.02 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | bf16 | 13.05333 samp/s | 0.01 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | bf16 | 11.87 samp/s | 0.01 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | fp32 | 2.883333 samp/s | 0.00 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | fp32 | 2.62 samp/s | 0.00 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | SSD-ResNet34, COCO 2017 (1200x1200) | Object Detection | int8 | 459.3633 img/s | 0.44 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | SSD-ResNet34, COCO 2017 (1200x1200) | Object Detection | bf16 | 218.4133 img/s | 0.20 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | SSD-ResNet34, COCO 2017 (1200x1200) | Object Detection | fp32 | 31.17333 img/s | 0.03 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d, ImageNet | Image Classification | int8 | 1,289.95 fps | 1.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d, ImageNet | Image Classification | int8 | 1,923.77 fps | 1.83 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d, ImageNet | Image Classification | bf16 | 648.58 fps | 0.66 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d, ImageNet | Image Classification | bf16 | 867.05 fps | 0.87 | | 64 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d, ImageNet | Image Classification | fp32 | 151.29 fps | 0.14 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d, ImageNet | Image Classification | fp32 | 160.93 fps | 0.15 | | 64 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d, ImageNet | Image Classification | bf32 | 215.11 fps | 0.21 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d, ImageNet | Image Classification | bf32 | 241.98 fps | 0.22 | | 116 | 1 instance per socket |
| MLPerf Inference v3.1 | RetinaNet (offline) | Object Detection | int8 | 284.75 samp/s | | | 2 | 4 cores per instance |
| MLPerf Inference v3.1 | RNN-T (offline) | Speech-to-text | int8+bf16 | 5,782.18 samp/s | | | 256 | 4 cores per instance |
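The Perf/Watt column is, presumably, sustained throughput divided by measured platform power draw; the arithmetic is trivial, but a small helper makes the relationship explicit. A minimal sketch (the 1,024 W figure below is an illustrative assumption chosen to reproduce one table row, not a measured value from this document):

```python
def perf_per_watt(throughput: float, platform_watts: float) -> float:
    """Performance per watt: units of work per second, divided by platform power."""
    if platform_watts <= 0:
        raise ValueError("platform power must be positive")
    return throughput / platform_watts

# Hypothetical example: 10,215.7 img/s at an assumed 1,024 W platform draw
print(round(perf_per_watt(10_215.7, 1_024.0), 2))  # ≈ 9.98 img/s per watt
```

Note that the same throughput at lower platform power yields a higher Perf/Watt, which is why int8 and bf16 rows dominate this column: they raise the numerator without changing the power envelope much.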

 

| Framework Version | Model/Dataset | Usage | Precision | Sockets / Nodes | Total Time to Train (min) | Throughput | Batch Size |
|---|---|---|---|---|---|---|---|
| MLPerf Training v3.1 | BERT-Large (seq len=512) / Wikipedia 2020/01/01 | Natural Language Processing | bf16 | 32 Sockets / 16 Nodes | 47.93 | | 3,072 |
| MLPerf Training v3.1 | DLRM-v2 / Criteo 4TB multi-hot | Recommendation | bf16 | 8 Sockets / 4 Nodes | 227.14 | | 65,536 |
| MLPerf Training v3.1 | ResNet-50 / ImageNet | Image Classification | bf16 | 32 Sockets / 16 Nodes | 88.56 | 8,584.5 img/s | 3,264 |
| MLPerf Training v3.1 | RetinaNet / Open Images | Object Detection | bf16 | 32 Sockets / 16 Nodes | 232.4 | 351.2 img/s | 256 |
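Throughput and time-to-train can be tied together as a sanity check: at a sustained rate, total samples processed is throughput × time, and dividing by the dataset size gives an approximate epoch count. A rough sketch (assumes the quoted throughput is sustained end to end, which overstates things slightly since time-to-train also includes evaluation; the ImageNet-1k training-set size of 1,281,167 images is the standard ILSVRC2012 figure, not from this document):

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # standard ILSVRC2012 training-set size

def approx_epochs(throughput_per_s: float, ttt_minutes: float, dataset_size: int) -> float:
    """Approximate epochs completed, assuming throughput is sustained for the whole run."""
    total_samples = throughput_per_s * ttt_minutes * 60.0
    return total_samples / dataset_size

# ResNet-50 row above: 8,584.5 img/s sustained for 88.56 minutes
print(round(approx_epochs(8_584.5, 88.56, IMAGENET_TRAIN_IMAGES), 1))  # ≈ 35.6 epochs
```

A figure in the mid-30s is consistent with typical MLPerf ResNet-50 convergence runs, which suggests the throughput and time-to-train columns are mutually coherent.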

 

| Framework Version | Model | Usage | Precision | TTT (minutes) | Accuracy | Batch Size | Ranks |
|---|---|---|---|---|---|---|---|
| Transformers 4.31, Intel Extension for PyTorch 2.0.1, PEFT 0.4.0 | GPT-J 6B (GLUE MNLI dataset) | Fine-tuning, text generation task | bf16 | 230.40 | 81.6 | 8 | 1 |
| Transformers 4.34.1, Intel PyTorch 2.1.0, PEFT 0.5.0, Intel® oneCCL v2.1.0 | BioGPT, 1.5 billion parameters (PubMedQA dataset) | Fine-tuning, response generation | bf16 | 48.70 | 79.4 | 8 | 8 |
| Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (colorectal histology dataset) | Fine-tuning, colorectal cancer detection | fp32 | 8.83 | 94.3 | 32 | 64 |
| Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (colorectal histology dataset) | Fine-tuning, colorectal cancer detection | bf16 | 4.65 | 94.3 | 32 | 64 |
| Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (colorectal histology dataset) | Fine-tuning, colorectal cancer detection | fp32 | 6.04 | 93.8 | 32 | 128 |
| Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (colorectal histology dataset) | Fine-tuning, colorectal cancer detection | bf16 | 4.02 | 94.6 | 32 | 128 |
| Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (IMDb dataset) | Fine-tuning, sentiment analysis | fp32 | 61.72 | 93.59 | 64 | 4 |
| Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (IMDb dataset) | Fine-tuning, sentiment analysis | bf16 | 18.86 | 93.88 | 64 | 4 |
| Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (GLUE SST2 dataset) | Fine-tuning, sentiment analysis | fp32 | 14.06 | 92.2 | 256 | 4 |
| Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (GLUE SST2 dataset) | Fine-tuning, sentiment analysis | bf16 | 3.68 | 92.09 | 256 | 4 |
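The paired fp32 and bf16 rows above differ only in precision, so their time-to-train ratio is a direct measure of the bf16 speedup for each fine-tuning job, at essentially unchanged accuracy. A minimal sketch using values taken from the BERTLarge Uncased rows:

```python
def speedup(fp32_minutes: float, bf16_minutes: float) -> float:
    """How many times faster the bf16 run finished than the matching fp32 run."""
    return fp32_minutes / bf16_minutes

# BERTLarge Uncased, IMDb: 61.72 min (fp32) vs. 18.86 min (bf16)
print(round(speedup(61.72, 18.86), 2))  # ≈ 3.27x
# BERTLarge Uncased, GLUE SST2: 14.06 min (fp32) vs. 3.68 min (bf16)
print(round(speedup(14.06, 3.68), 2))  # ≈ 3.82x
```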

 

| Framework Version | Model/Dataset | Usage | Precision | Throughput | Perf/Watt | Batch size |
|---|---|---|---|---|---|---|
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 129.97 img/s | 0.161257103 | 128 |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 327.96 img/s | 0.420294498 | 128 |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 146.18 img/s | 0.180068983 | 128 |
| Intel TensorFlow 2.14 | ResNet50 v1.5, ImageNet (224x224) | Image Recognition | fp32 | 137.36 img/s | 0.163106335 | 1,024 |
| Intel TensorFlow 2.14 | ResNet50 v1.5, ImageNet (224x224) | Image Recognition | bf16 | 317.83 img/s | 0.377479275 | 1,024 |
| Intel TensorFlow 2.14 | ResNet50 v1.5, ImageNet (224x224) | Image Recognition | bf32 | 152 img/s | 0.180806014 | 1,024 |
| Intel PyTorch 2.1 | DLRM, Criteo Terabyte | Recommender | fp32 | 265,503.91 rec/s | 323.9907136 | 32,768 |
| Intel PyTorch 2.1 | DLRM, Criteo Terabyte | Recommender | bf16 | 783,058.09 rec/s | 980.366693 | 32,768 |
| Intel PyTorch 2.1 | DLRM, Criteo Terabyte | Recommender | bf32 | 369,848.15 rec/s | 447.8448004 | 32,768 |
| Intel TensorFlow 2.14 | SSD-ResNet34, COCO 2017 (1200x1200) | Object Detection | fp32 | 52.49 img/s | 0.069833963 | 896 |
| Intel TensorFlow 2.14 | SSD-ResNet34, COCO 2017 (1200x1200) | Object Detection | bf16 | 190.53 img/s | 0.251641022 | 896 |
| Intel TensorFlow 2.14 | SSD-ResNet34, COCO 2017 (1200x1200) | Object Detection | bf32 | 68.08 img/s | 0.089474168 | 896 |
| Intel PyTorch 2.1 | RNNT, LibriSpeech | Speech Recognition | fp32 | 3.38 fps | 0.00431651 | 32 |
| Intel PyTorch 2.1 | RNNT, LibriSpeech | Speech Recognition | bf16 | 27.32 fps | 0.032853123 | 64 |
| Intel PyTorch 2.1 | RNNT, LibriSpeech | Speech Recognition | bf32 | 11.05 fps | 0.013210908 | 32 |
| Intel PyTorch 2.1 | MaskR-CNN, COCO 2017 | Object Detection | fp32 | 3.76 img/s | 0.004518796 | 112 |
| Intel PyTorch 2.1 | MaskR-CNN, COCO 2017 | Object Detection | bf16 | 10.04 img/s | 0.011990064 | 112 |
| Intel PyTorch 2.1 | MaskR-CNN, COCO 2017 | Object Detection | bf32 | 3.94 img/s | 0.004759719 | 112 |
| Intel PyTorch 2.1 | BERTLarge, Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | fp32 | 3.76 sent/s | 0.004518796 | 28 |
| Intel PyTorch 1.13 | BERTLarge, Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | bf16 | 10.04 sent/s | 0.011990064 | 56 |
| Intel PyTorch 1.13 | BERTLarge, Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | bf32 | 3.94 sent/s | 0.004759719 | 56 |
| Intel TensorFlow 2.14 | BERTLarge, Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | fp32 | 4.28 sent/s | 0.005172581 | 28 |
| Intel TensorFlow 2.14 | BERTLarge, Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | bf16 | 9.75 sent/s | 0.011582599 | 128 |
| Intel TensorFlow 2.14 | BERTLarge, Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | bf32 | 4.79 sent/s | 0.005754722 | 128 |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 12,072.19 sent/s | 11.53 | 42,000 |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 28,757.83 sent/s | 28.89 | 42,000 |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 11,995.37 sent/s | 11.78 | 42,000 |
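The reciprocal of Perf/Watt is the energy spent per processed sample (joules per image, sentence, or record), which makes the precision comparison concrete in physical terms. A small sketch using the Intel PyTorch 2.1 ResNet50 v1.5 training rows above (the unit conversion holds because (img/s)/W inverted is W·s/img, i.e. J/img):

```python
def joules_per_sample(perf_per_watt: float) -> float:
    """Energy per processed sample: invert (samples/s)/W to get J/sample."""
    return 1.0 / perf_per_watt

# Intel PyTorch 2.1 ResNet50 v1.5 training rows
print(round(joules_per_sample(0.161257103), 2))  # fp32: ≈ 6.20 J/img
print(round(joules_per_sample(0.420294498), 2))  # bf16: ≈ 2.38 J/img
```

By this measure, moving from fp32 to bf16 training cuts the energy cost of each image by roughly 2.6x on this workload.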

Hardware and software configuration (measured October 24, 2023):

Deep learning configuration:

  • Hardware configuration for Intel® Xeon® Platinum 8480+ processor (formerly code-named Sapphire Rapids): 2 sockets for inference, 1 socket for training, 56 cores per socket, 350 watts, 1,024 GB (16 x 64 GB) DDR5 4800 MT/s memory, operating system: CentOS* Stream 8. Using Intel® Advanced Matrix Extensions (Intel® AMX) int8 and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) optimized kernels integrated into Intel® Extension for PyTorch*, Intel® Extension for TensorFlow*, and Intel® Distribution of OpenVINO™ toolkit. Measurements may vary. If the dataset is not listed, a synthetic dataset was used to measure performance.
  • Accuracy (if listed) was validated with the specified dataset.

Transfer learning configuration:

  • Hardware configuration for Intel® Xeon® Platinum 8480+ processor (formerly code-named Sapphire Rapids): single-node DLSA fine-tuning and single-node vision transfer learning, 56 cores, 350 watts, 16 x 64 GB DDR5 4800 MT/s memory, BIOS version EGSDREL1.SYS.8612.P03.2208120629, operating system: Ubuntu 22.04.1 LTS. Using Intel® Advanced Matrix Extensions (Intel® AMX) int8 and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) v2.6 optimized kernels integrated into Intel® Extension for PyTorch* v1.12, and Intel® oneAPI Collective Communications Library v2021.5.2. Measurements and some software configurations may vary.

MLPerf* configuration:

  • Hardware configuration for MLPerf* Inference v3.1 measurements on Intel® Xeon® Platinum 8480+ processor (formerly code-named Sapphire Rapids): 2 sockets for inference, 56 cores per socket, 350 watts, 1,024 GB (16 x 64 GB) DDR5-4800 MT/s memory, operating system: CentOS* Stream 8. Using Intel® Advanced Matrix Extensions (Intel® AMX) int4, int8, and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) optimized kernels integrated into Intel® Extension for PyTorch*. Measurements may vary. The model specifications and datasets used for MLPerf workloads are specified by MLCommons and viewable in the MLPerf Inference: Datacenter Benchmark Suite Results.