Framework Version | Model | Usage | Precision | Throughput | Perf/Watt | Latency (ms) | Batch size
Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 |  |  | 35 | 1
Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | 173 tokens/s |  | 92.5 | 8
Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 |  |  | 52.5 | 1
Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 |  |  | 98.5 | 8
Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 |  |  | 41.5 | 1
Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | 149.5 tokens/s |  | 107 | 8
Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 |  |  | 59.5 | 1
Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | 142.2 tokens/s |  | 112.5 | 8
OpenVINO 2023.2 | LLaMA2-7b Token size 32/512 | GenAI_chat | int4 | 11.3 tokens/s |  | 88.44 | 1
OpenVINO 2023.2 | LLaMA2-7b Token size 32/512 | GenAI_chat | int8 | 13.5 tokens/s |  | 73.74 | 1
OpenVINO 2023.2 | LLaMA2-7b Token size 32/512 | GenAI_chat | fp32 | 11.3 tokens/s |  | 88.39 | 1
OpenVINO 2023.2 | LLaMA2-7b Token size 80/512 | GenAI_chat | int4 | 11.4 tokens/s |  | 87.17 | 1
OpenVINO 2023.2 | LLaMA2-7b Token size 80/512 | GenAI_chat | int8 | 13.6 tokens/s |  | 73.09 | 1
OpenVINO 2023.2 | LLaMA2-7b Token size 80/512 | GenAI_chat | fp32 | 11.2 tokens/s |  | 89.00 | 1
OpenVINO 2023.2 | LLaMA2-7b Token size 142/512 | GenAI_chat | int4 | 11.5 tokens/s |  | 86.63 | 1
OpenVINO 2023.2 | LLaMA2-7b Token size 142/512 | GenAI_chat | int8 | 13.3 tokens/s |  | 75.15 | 1
OpenVINO 2023.2 | LLaMA2-7b Token size 142/512 | GenAI_chat | fp32 | 11.1 tokens/s |  | 89.73 | 1
OpenVINO 2023.2 | Stable Diffusion 2.1, 20 Steps, 64 Prompts | GenAI_text_image | int8 | 0.24 img/s |  | 4,160 | 1
OpenVINO 2023.2 | Stable Diffusion 2.1, 20 Steps, 64 Prompts | GenAI_text_image | fp32 | 0.24 img/s |  | 4,080 | 1
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 12,862.56 img/s | 13.23 |  | 1
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 19,386.47 img/s | 19.21 |  | 64
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 8,211.8 img/s | 8.13 |  | 1
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 10,187.87 img/s | 10.82 |  | 64
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,773.68 img/s | 1.74 |  | 1
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,703.77 img/s | 1.57 |  | 64
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 2,431.26 img/s | 2.40 |  | 1
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 2,686.97 img/s | 2.67 |  | 64
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 9,726.18 img/s | 9.67 |  | 1
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 16,036.8 img/s | 17.01 |  | 32
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 6,782.09 img/s | 7.04 |  | 1
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 9,312.72 img/s | 9.40 |  | 32
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,560.99 img/s | 1.45 |  | 1
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,663.44 img/s | 1.57 |  | 32
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 2,013.88 img/s | 1.84 |  | 1
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 2,874.29 img/s | 2.73 |  | 32
OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | int8 | 18674.37 img/s | 26.68 |  | 1
OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | bf16 | 11537.06 img/s | 16.48 |  | 1
OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | fp32 | 1721.58 img/s | 2.46 |  | 1
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 411.14 sent/s | 0.42 |  | 1
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 455.33 sent/s | 0.45 |  | 16
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 243.89 sent/s | 0.24 |  | 1
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 278.00 sent/s | 0.25 |  | 44
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 44.56 sent/s | 0.04 |  | 1
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 50.49 sent/s | 0.05 |  | 16
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 98.49 sent/s | 0.09 |  | 1
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 96.98 sent/s | 0.09 |  | 16
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 323.58 sent/s | 0.32 |  | 1
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 324.56 sent/s | 0.33 |  | 12
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 224.04 sent/s | 0.22 |  | 1
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 231.37 sent/s | 0.23 |  | 28
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 55.34 sent/s | 0.05 |  | 1
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 48.46 sent/s | 0.05 |  | 12
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 101.93 sent/s | 0.10 |  | 1
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 98.81 sent/s | 0.10 |  | 12
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 373.6867 sent/s | 0.37 |  | 1
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 388.25 sent/s | 0.39 |  | 32
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 244.25 sent/s | 0.24 |  | 1
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 281.79 sent/s | 0.27 |  | 40
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 57.16667 sent/s | 0.06 |  | 1
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 55.67 sent/s | 0.05 |  | 16
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | int8 | 23,444,587 rec/s | 23611.92 |  | 128
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf16 | 13,223,343 rec/s | 12742.32 |  | 128
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | fp32 | 2,742,037 rec/s | 2615.42 |  | 128
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf32 | 6,760,005 rec/s | 6699.18 |  | 128
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 6,380.26 sent/s | 6.80 |  | 1
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 10,701.44 sent/s | 11.39 |  | 104
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 4,651.69 sent/s | 4.97 |  | 1
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 6,864.75 sent/s | 7.23 |  | 88
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 1,121.45 sent/s | 1.12 |  | 1
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 1,205.86 sent/s | 1.27 |  | 32
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 2,161.93 sent/s | 2.15 |  | 1
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 2,584.98 sent/s | 2.63 |  | 56
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 77.94 sent/s | 0.07 |  | 1
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 334.65 sent/s | 0.31 |  | 448
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 52 sent/s | 0.05 |  | 1
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 367.07 sent/s | 0.35 |  | 448
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 1,099.6 sent/s | 26.53 |  | 1
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 137.37 sent/s | 0.12 |  | 448
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 24.86 sent/s | 0.02 |  | 1
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 155.04 sent/s | 0.14 |  | 448
OpenVINO 2023.2 | 3D-Unet | Image Segmentation | int8 | 30.31 samples/s | 0.03 |  | 1
OpenVINO 2023.2 | 3D-Unet | Image Segmentation | int8 | 27.18333 samples/s | 0.02 |  | 6
OpenVINO 2023.2 | 3D-Unet | Image Segmentation | bf16 | 15.67667 samples/s | 0.01 |  | 1
OpenVINO 2023.2 | 3D-Unet | Image Segmentation | bf16 | 3.18 samples/s | 0.00 |  | 7
OpenVINO 2023.2 | 3D-Unet | Image Segmentation | fp32 | 3.49 samples/s | 0.00 |  | 1
OpenVINO 2023.2 | 3D-Unet | Image Segmentation | fp32 | 14.40 samples/s | 0.01 |  | 3
OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | int8 | 590.2267 img/s | 0.57 |  | 1
OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | bf16 | 297.79 img/s | 0.28 |  | 1
OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | fp32 | 36.92 img/s | 0.04 |  | 1
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | int8 | 1,679.87 fps | 1.73 |  | 1
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | int8 | 2,481.66 fps | 2.56 |  | 58
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf16 | 802.44 fps | 0.80 |  | 1
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf16 | 1,175.18 fps | 1.10 |  | 72
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | fp32 | 186.33 fps | 0.19 |  | 1
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | fp32 | 202.33 fps | 0.19 |  | 40
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf32 | 279.07 fps | 0.28 |  | 1
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf32 | 320.62 fps | 0.29 |  | 58
OpenVINO 2023.2 | Yolo-v8n | Object Detection | int8 | 3,513.54 img/s |  |  | 1
OpenVINO 2023.2 | Yolo-v8n | Object Detection | bf16 | 3,632.55 img/s |  |  | 1
OpenVINO 2023.2 | Yolo-v8n | Object Detection | fp32 | 1,249.91 img/s |  |  | 1
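
The table above does not state how the Latency (ms) column relates to Throughput. For the batch-size-1 OpenVINO LLaMA2-7b rows, the two columns are numerically consistent with latency being the time per generated token, so that tokens/s is roughly 1000 / latency. The sketch below checks that reading against three rows (token size 32/512). The per-token interpretation and the helper name tokens_per_second are assumptions made for illustration; the source does not confirm them.

```python
def tokens_per_second(latency_ms, batch_size=1):
    """Tokens/s implied by a per-token latency in milliseconds.

    Assumes (not stated in the source) that latency is the time to emit one
    token and that all sequences in the batch emit tokens in lockstep.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return batch_size * 1000.0 / latency_ms


# (precision, reported tokens/s, reported latency in ms), copied from the
# OpenVINO 2023.2, LLaMA2-7b, token size 32/512, batch size 1 rows.
rows = [
    ("int4", 11.3, 88.44),
    ("int8", 13.5, 73.74),
    ("fp32", 11.3, 88.39),
]

for precision, reported, latency_ms in rows:
    implied = tokens_per_second(latency_ms)
    print(f"{precision}: reported {reported} tokens/s, implied {implied:.2f} tokens/s")
```

If the implied and reported values diverge for other rows (for example, rows with larger batch sizes or multiple instances), the simple relation above does not apply to them as written.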

 

Framework Version | Model | Usage | Precision | TTT (minutes) | Accuracy | Batch Size | Ranks
Transformers 4.31, Intel Extension for PyTorch 2.0.1, PEFT 0.4.0 | GPT-J 6B (Glue MNLI dataset) | Fine-tuning, Text-generation | bf16 | 184.20 | 82.2 | 8 | 1
Transformers 4.34.1, Intel PyTorch 2.1.0, PEFT 0.5.0, Intel® oneCCL v2.1.0 | BioGPT 1.5B (PubMedQA dataset) | Response generation | bf16 | 39.80 | 79.4 | 8 | 8
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (Colorectal histology dataset) | Colorectal cancer detection | fp32 | 6.98 | 94.1 | 32 | 64
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (Colorectal histology dataset) | Colorectal cancer detection | bf16 | 4.08 | 94.9 | 32 | 64
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (Colorectal histology dataset) | Colorectal cancer detection | fp32 | 5.34 | 94.1 | 32 | 128
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.50 (Colorectal histology dataset) | Colorectal cancer detection | bf16 | 2.90 | 94.9 | 32 | 128
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (IMDb dataset) | Sentiment Analysis | fp32 | 47.95 | 93.84 | 64 | 4
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (IMDb dataset) | Sentiment Analysis | bf16 | 15.96 | 93.8 | 64 | 4
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (GLUE SST2 dataset) | Sentiment Analysis | fp32 | 10.48 | 92.2 | 256 | 4
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (GLUE SST2 dataset) | Sentiment Analysis | bf16 | 2.93 | 92.09 | 256 | 4

 

Framework Version | Model/Dataset | Usage | Precision | Throughput | Perf/Watt | Batch size
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 175.29 img/s | 0.22 | 128
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 396.24 img/s | 0.52 | 256
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 197.14 img/s | 0.25 | 128
Intel TensorFlow 2.14 | ResNet50 v1.5 ImageNet (224 x 224) | Image Recognition | fp32 | 145.93 img/s | 0.19 | 512
Intel TensorFlow 2.14 | ResNet50 v1.5 ImageNet (224 x 224) | Image Recognition | bf16 | 354.45 img/s | 0.46 | 512
Intel TensorFlow 2.14 | ResNet50 v1.5 ImageNet (224 x 224) | Image Recognition | bf32 | 166.37 img/s | 0.21 | 512
Intel PyTorch 2.1 | DLRM Criteo Terabyte, QUAD Mode | Recommender | fp32 | 290,772.24 rec/s | 359.83 | 32,768
Intel PyTorch 2.1 | DLRM Criteo Terabyte, QUAD Mode | Recommender | bf16 | 862,286.46 rec/s | 1,055.35 | 32,768
Intel PyTorch 2.1 | DLRM Criteo Terabyte, QUAD Mode | Recommender | bf32 | 417,584.33 rec/s | 504.29 | 32,768
Intel TensorFlow 2.14 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | fp32 | 61.25 img/s | 0.09 | 448
Intel TensorFlow 2.14 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | bf16 | 219.77 img/s | 0.31 | 448
Intel TensorFlow 2.14 | SSD-ResNet34 COCO 2017 (1200 x 1200) | Object Detection | bf32 | 83.44 img/s | 0.11 | 448
Intel PyTorch 2.1 | RNNT LibriSpeech | Speech Recognition | fp32 | 4.35 fps | 0.01 | 64
Intel PyTorch 2.1 | RNNT LibriSpeech | Speech Recognition | bf16 | 35.13 fps | 0.04 | 64
Intel PyTorch 2.1 | RNNT LibriSpeech | Speech Recognition | bf32 | 13.65 fps | 0.02 | 32
Intel PyTorch 2.1 | MaskR-CNN COCO 2017 | Object Detection | fp32 | 4.8 img/s | 0.01 | 128
Intel PyTorch 2.1 | MaskR-CNN COCO 2017 | Object Detection | bf16 | 16.43 img/s | 0.02 | 128
Intel PyTorch 2.1 | MaskR-CNN COCO 2017 | Object Detection | bf32 | 5.37 img/s | 0.01 | 96
Intel PyTorch 2.1 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | fp32 | 4.41 sent/s | 0.01 | 64
Intel PyTorch 2.1 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | bf16 | 12.53 sent/s | 0.02 | 28
Intel PyTorch 2.1 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | bf32 | 5.52 sent/s | 0.01 | 56
Intel TensorFlow 2.14 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | fp32 | 5.38 sent/s | 0.01 | 64
Intel TensorFlow 2.14 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | bf16 | 11.74 sent/s | 0.02 | 64
Intel TensorFlow 2.14 | BERTLarge Wikipedia 2020/01/01 seq len=512 | Natural Language Processing | bf32 | 6.07 sent/s | 0.01 | 64
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 15,671.55 sent/s | 16.95 | 42,000
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 40,653.1 sent/s | 43.77 | 42,000
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 15,316.08 sent/s | 15.44 | 42,000
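
One common way to read a table like this is to express each precision's throughput as a multiple of the fp32 baseline for the same framework, model, and batch size. The following is a minimal sketch of that arithmetic using the three Intel TensorFlow 2.14 ResNet50 v1.5 ImageNet rows above, which all use batch size 512. The function name speedup_over_fp32 is hypothetical, and the ratio is only a convenience for comparison, not a metric the source reports.

```python
# Training throughput (img/s) copied from the Intel TensorFlow 2.14,
# ResNet50 v1.5 ImageNet (224 x 224), batch size 512 rows above.
throughput_img_per_s = {
    "fp32": 145.93,
    "bf16": 354.45,
    "bf32": 166.37,
}


def speedup_over_fp32(throughput_by_precision):
    """Return each precision's throughput divided by the fp32 throughput."""
    baseline = throughput_by_precision["fp32"]
    if baseline <= 0:
        raise ValueError("fp32 baseline must be positive")
    return {
        precision: value / baseline
        for precision, value in throughput_by_precision.items()
    }


speedups = speedup_over_fp32(throughput_img_per_s)
for precision, ratio in sorted(speedups.items(), key=lambda item: item[1], reverse=True):
    print(f"{precision}: {ratio:.2f}x fp32")
```

Rows whose batch sizes differ between precisions (for example, the Intel PyTorch 2.1 ResNet50 v1.5 rows) mix two variables, so a ratio computed from them reflects both the precision and the batch-size change.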

Hardware and software configuration (measured October 24, 2023):

Deep Learning configuration:

  • Hardware configuration for Intel® Xeon® Platinum 8592+ processor (code named Emerald Rapids): 2 sockets for inference, 1 socket for training, 64 cores, 350 watts, 1024 GB (16 x 64 GB) DDR5 5600 MT/s memory, operating system CentOS* Stream 9. Uses Intel® Advanced Matrix Extensions (Intel® AMX) int8 and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) optimized kernels integrated into Intel® Extension for PyTorch*, Intel® Extension for TensorFlow*, and Intel® Distribution of OpenVINO™ toolkit. Measurements may vary. If the dataset is not listed, a synthetic dataset was used to measure performance.
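
The int8 and bf16 results above depend on Intel AMX being available on the host. On Linux, the kernel reports AMX support through the amx_tile, amx_int8, and amx_bf16 entries on the flags line of /proc/cpuinfo. The sketch below is one way to check for them before trying to reproduce the measurements. It is not part of the source's benchmark procedure; the function names amx_support and read_cpuinfo are hypothetical, and the approach is Linux-specific.

```python
AMX_FLAGS = ("amx_tile", "amx_int8", "amx_bf16")


def amx_support(cpuinfo_text):
    """Map each AMX flag name to True if it appears on a 'flags' line.

    cpuinfo_text is the text of /proc/cpuinfo (Linux); other platforms
    expose CPU features differently and are not handled here.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return {name: name in flags for name in AMX_FLAGS}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Return the contents of /proc/cpuinfo, or an empty string if unreadable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    for flag, present in amx_support(read_cpuinfo()).items():
        print(f"{flag}: {'yes' if present else 'no'}")
```

A flag being present only shows that the CPU and kernel expose the feature; whether a given framework build dispatches to AMX kernels is a separate question that this check does not answer.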

Transfer Learning configuration:

  • Hardware configuration for Intel® Xeon® Platinum 8592+ processor (code named Emerald Rapids): 2 sockets, 64 cores, 350 watts, 16 x 64 GB DDR5 5600 MT/s memory, BIOS version 3B05.TEL4P1, operating system CentOS* Stream 8. Uses Intel® Advanced Matrix Extensions (Intel® AMX) int8 and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) v2.6.0 optimized kernels integrated into Intel® Extension for PyTorch* v2.0.1 and Intel® Extension for TensorFlow* v2.14; Intel® oneAPI Data Analytics Library (oneDAL) 2023.1 optimized kernels integrated into Intel® Extension for Scikit-learn* v2023.1; Intel® Distribution of Modin* v2.1.1; and Intel® oneAPI Math Kernel Library (oneMKL) v2023.1. Measurements may vary.