Performance Data for Intel® AI Data Center Products
Find the latest AI benchmark performance data for Intel Data Center products, including detailed hardware and software configurations.
Pretrained models, sample scripts, best practices, and tutorials:
- Intel® Developer Cloud
- Intel® AI Reference Models and Jupyter Notebooks*
- AI-Optimized CPU Containers from Intel
- AI-Optimized GPU Containers from Intel
- Open Model Zoo for OpenVINO™ toolkit
- Jupyter Notebook tutorials for OpenVINO™
- AI Performance Debugging on Intel® CPUs
Measurements were taken using:
- PyTorch* Optimizations from Intel (see the usage sketch after this list)
- TensorFlow* Optimizations from Intel
- Intel® Distribution of OpenVINO™ Toolkit
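For the PyTorch results, optimizations are applied through Intel® Extension for PyTorch*. The following is a minimal sketch of the typical bf16 inference setup; the model and input shape are placeholders, not the benchmarked configurations:

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

# Placeholder model; the tables below cover many more workloads.
model = models.resnet50(weights=None)
model.eval()

# Fuses ops and prepares weights for the bf16 (Intel AMX) inference path.
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    output = model(torch.randn(1, 3, 224, 224))
```

TensorFlow and OpenVINO pick up the corresponding oneDNN-optimized paths through Intel® Extension for TensorFlow* and the OpenVINO runtime, respectively.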
4th Generation Intel® Xeon® Scalable Processors
Intel® Xeon® Platinum 8480+ Processor (56 Cores)
Deep Learning Inference
Framework Version | Model | Usage | Precision | Throughput | Perf/Watt | Latency (ms) | Batch Size | Config* |
---|---|---|---|---|---|---|---|---|
Intel PyTorch 2.1 DeepSpeed | GPT-J 6B, token size 1024/128 | Text Generation, Beam Search, Width=4 | int8 |  |  | 40 | 1 | 1 instance per socket |
Intel PyTorch 2.1 DeepSpeed | GPT-J 6B, token size 1024/128 | Text Generation, Beam Search, Width=4 | int8 | 130.4 tokens/s |  | 92 | 6 | 1 instance per socket |
Intel PyTorch 2.1 DeepSpeed | GPT-J 6B, token size 1024/128 | Text Generation, Beam Search, Width=4 | bf16 |  |  | 59.5 | 1 | 1 instance per socket |
Intel PyTorch 2.1 DeepSpeed | GPT-J 6B, token size 1024/128 | Text Generation, Beam Search, Width=4 | bf16 | 125 tokens/s |  | 96 | 6 | 1 instance per socket |
MLPerf Inference v3.1 | GPT-J (offline, 99.0% acc) | Large Language Model | int8 | 2.05 samp/s |  |  | 7 | 4 cores per instance |
Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B, token size 1024/128 | Text Generation, Beam Search, Width=4 | int8 |  |  | 47 | 1 | 1 instance per socket |
Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B, token size 1024/128 | Text Generation, Beam Search, Width=4 | int8 | 111.6 tokens/s |  | 107.5 | 6 | 1 instance per socket |
Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B, token size 1024/128 | Text Generation, Beam Search, Width=4 | bf16 |  |  | 68 | 1 | 1 instance per socket |
Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B, token size 1024/128 | Text Generation, Beam Search, Width=4 | bf16 | 109.1 tokens/s |  | 110 | 6 | 1 instance per socket |
MLPerf Inference v3.1 | ResNet50 v1.5 (offline) | Image Recognition | int8 | 20,565.5 samp/s |  |  | 256 | 1 core per instance |
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 10,215.7 img/s | 9.98 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 13,862.96 img/s | 14.09 |  | 116 | 1 instance per socket |
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 6,210.69 img/s | 6.13 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 7,295.63 img/s | 7.33 |  | 116 | 1 instance per socket |
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,319.52 img/s | 1.27 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,360.05 img/s | 1.28 |  | 116 | 1 instance per socket |
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 1,659.37 img/s | 1.65 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 1,985.26 img/s | 2.02 |  | 116 | 1 instance per socket |
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 7,440.61 img/s | 7.70 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 12,345.54 img/s | 11.80 |  | 116 | 1 instance per socket |
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 5,053.76 img/s | 5.01 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 6,704.17 img/s | 6.34 |  | 116 | 1 instance per socket |
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,282.77 img/s | 1.17 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,342.91 img/s | 1.27 |  | 116 | 1 instance per socket |
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 1,529.49 img/s | 1.41 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 2,017.54 img/s | 1.89 |  | 116 | 1 instance per socket |
OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | int8 | 8,819.66 img/s | 8.81 |  | 1 | 4 cores per instance |
OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | bf16 | 5,915.79 img/s | 5.82 |  | 1 | 4 cores per instance |
OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | fp32 | 1,281.34 img/s | 1.25 |  | 1 | 4 cores per instance |
MLPerf Inference v3.1 | BERT-Large (offline, 99.0% acc) | Natural Language Processing | int8 | 1,357.33 samp/s |  |  | 1,300 | 4 cores per instance |
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 335.1 sent/s | 0.35 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 378.73 sent/s | 0.36 |  | 56 | 1 instance per socket |
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 204.52 sent/s | 0.21 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 201.44 sent/s | 0.21 |  | 16 | 1 instance per socket |
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 35.25 sent/s | 0.03 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 41.05 sent/s | 0.04 |  | 56 | 1 instance per socket |
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 72.42 sent/s | 0.07 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 71.63 sent/s | 0.07 |  | 16 | 1 instance per socket |
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 253.27 sent/s | 0.24 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 239.89 sent/s | 0.25 |  | 16 | 1 instance per socket |
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 181.02 sent/s | 0.18 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 184.06 sent/s | 0.17 |  | 128 | 1 instance per socket |
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 44.73 sent/s | 0.04 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 38.58 sent/s | 0.04 |  | 16 | 1 instance per socket |
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 72.78 sent/s | 0.07 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 71.77 sent/s | 0.07 |  | 16 | 1 instance per socket |
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 298.44 sent/s | 0.30 |  | 1 | 4 cores per instance |
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 285.68 sent/s | 0.28 |  | 48 | 1 instance per socket |
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 202.48 sent/s | 0.20 |  | 1 | 4 cores per instance |
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 191.25 sent/s | 0.19 |  | 32 | 1 instance per socket |
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 47.34 sent/s | 0.05 |  | 1 | 4 cores per instance |
OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 44.23 sent/s | 0.04 |  | 48 | 1 instance per socket |
MLPerf Inference v3.1 | DLRM-v2 (offline, 99.0% acc) | Recommender | int8 | 5,367.77 samp/s |  |  | 300 | 1 core per instance |
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | int8 | 23,444,587 rec/s | 23,611.92 |  | 128 | 1 instance per socket |
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf16 | 10,646,560 rec/s | 10,238.88 |  | 128 | 1 instance per socket |
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | fp32 | 2,278,228 rec/s | 2,220.37 |  | 128 | 1 instance per socket |
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf32 | 4,530,200 rec/s | 4,427.38 |  | 128 | 1 instance per socket |
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 4,726.15 sent/s | 4.94 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 7,759.25 sent/s | 8.42 |  | 168 | 1 instance per socket |
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 3,306.46 sent/s | 3.35 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 5,057.47 sent/s | 5.50 |  | 120 | 1 instance per socket |
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 900.58 sent/s | 0.85 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 1,007.05 sent/s | 1.04 |  | 56 | 1 instance per socket |
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 1,513.66 sent/s | 1.49 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 1,926.1 sent/s | 1.77 |  | 288 | 1 instance per socket |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 61.03 sent/s | 0.06 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 245.66 sent/s | 0.24 |  | 448 | 1 instance per socket |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 41.44 sent/s | 0.04 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 278.81 sent/s | 0.28 |  | 448 | 1 instance per socket |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 20.27 sent/s | 0.02 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 102.48 sent/s | 0.10 |  | 448 | 1 instance per socket |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 20.28 sent/s | 0.02 |  | 1 | 4 cores per instance |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 114.08 sent/s | 0.11 |  | 448 | 1 instance per socket |
OpenVINO 2023.2 | 3D-UNet | Image Segmentation | int8 | 24.68 samp/s | 0.02 |  | 1 | 4 cores per instance |
OpenVINO 2023.2 | 3D-UNet | Image Segmentation | int8 | 21.86 samp/s | 0.02 |  | 6 | 1 instance per socket |
OpenVINO 2023.2 | 3D-UNet | Image Segmentation | bf16 | 13.05 samp/s | 0.01 |  | 1 | 4 cores per instance |
OpenVINO 2023.2 | 3D-UNet | Image Segmentation | bf16 | 11.87 samp/s | 0.01 |  | 6 | 1 instance per socket |
OpenVINO 2023.2 | 3D-UNet | Image Segmentation | fp32 | 2.88 samp/s | 0.00 |  | 1 | 4 cores per instance |
OpenVINO 2023.2 | 3D-UNet | Image Segmentation | fp32 | 2.62 samp/s | 0.00 |  | 6 | 1 instance per socket |
OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | int8 | 459.36 img/s | 0.44 |  | 1 | 4 cores per instance |
OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | bf16 | 218.41 img/s | 0.20 |  | 1 | 4 cores per instance |
OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | fp32 | 31.17 img/s | 0.03 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | int8 | 1,289.95 fps | 1.35 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | int8 | 1,923.77 fps | 1.83 |  | 116 | 1 instance per socket |
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf16 | 648.58 fps | 0.66 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf16 | 867.05 fps | 0.87 |  | 64 | 1 instance per socket |
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | fp32 | 151.29 fps | 0.14 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | fp32 | 160.93 fps | 0.15 |  | 64 | 1 instance per socket |
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf32 | 215.11 fps | 0.21 |  | 1 | 4 cores per instance |
Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf32 | 241.98 fps | 0.22 |  | 116 | 1 instance per socket |
MLPerf Inference v3.1 | RetinaNet (offline) | Object Detection | int8 | 284.75 samp/s |  |  | 2 | 4 cores per instance |
MLPerf Inference v3.1 | RNN-T (offline) | Speech-to-Text | int8+bf16 | 5,782.18 samp/s |  |  | 256 | 4 cores per instance |
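The throughput and latency columns reflect steady-state numbers from repeated inference over a fixed batch. As a rough illustration only (not Intel's measurement harness), the per-instance arithmetic can be reproduced with a simple timed loop; the `measure` helper and its warm-up/iteration counts below are illustrative choices:

```python
import time
import torch
import torchvision.models as models

def measure(model, batch, n_warmup=10, n_iters=100):
    """Time repeated forward passes; return (samples/s, mean latency in ms)."""
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):          # let threads and caches settle
            model(batch)
        start = time.perf_counter()
        for _ in range(n_iters):
            model(batch)
        elapsed = time.perf_counter() - start
    return n_iters * batch.shape[0] / elapsed, elapsed / n_iters * 1000

# Example: a batch-1 run, as in the "4 cores per instance" rows.
throughput, latency_ms = measure(models.resnet50(weights=None),
                                 torch.randn(1, 3, 224, 224))
print(f"{throughput:.1f} img/s, {latency_ms:.2f} ms")
```

Multi-instance rows pin each instance to a subset of cores (for example with numactl) and sum per-instance throughput; Perf/Watt divides aggregate throughput by measured socket power.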
Training
MLPerf* Training
Framework Version | Model/Dataset | Usage | Precision | Sockets / Nodes | Total Time to Train (min) | Throughput | Batch Size |
---|---|---|---|---|---|---|---|
MLPerf Training v3.1 | BERT-Large (seq len=512) / Wikipedia 2020/01/01 | Natural Language Processing | bf16 | 32 Sockets / 16 Nodes | 47.93 |  | 3,072 |
MLPerf Training v3.1 | DLRM-v2 / Criteo 4TB multi-hot | Recommendation | bf16 | 8 Sockets / 4 Nodes | 227.14 |  | 65,536 |
MLPerf Training v3.1 | ResNet-50 / ImageNet | Image Classification | bf16 | 32 Sockets / 16 Nodes | 88.56 | 8,584.5 img/s | 3,264 |
MLPerf Training v3.1 | RetinaNet / Open Images | Object Detection | bf16 | 32 Sockets / 16 Nodes | 232.4 | 351.2 img/s | 256 |
Transfer Learning / Fine Tuning
Framework Version | Model | Usage | Precision | Time to Train (min) | Accuracy (%) | Batch Size | Ranks |
---|---|---|---|---|---|---|---|
Transformers 4.31, Intel Extension for PyTorch 2.0.1, PEFT 0.4.0 | GPT-J 6B (GLUE MNLI dataset) | Fine-tuning, Text Generation | bf16 | 230.40 | 81.6 | 8 | 1 |
Transformers 4.34.1, Intel PyTorch 2.1.0, PEFT 0.5.0, Intel® oneCCL v2.1.0 | BioGPT (1.5 billion parameters) (PubMedQA dataset) | Fine-tuning, Response Generation | bf16 | 48.70 | 79.4 | 8 | 8 |
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.5 (Colorectal histology dataset) | Fine-tuning, Colorectal Cancer Detection | fp32 | 8.83 | 94.3 | 32 | 64 |
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.5 (Colorectal histology dataset) | Fine-tuning, Colorectal Cancer Detection | bf16 | 4.65 | 94.3 | 32 | 64 |
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.5 (Colorectal histology dataset) | Fine-tuning, Colorectal Cancer Detection | fp32 | 6.04 | 93.8 | 32 | 128 |
Intel® TensorFlow 2.14, Horovod 0.28, Open MPI 4.1.2, Python 3.10.0 | ResNet50 v1.5 (Colorectal histology dataset) | Fine-tuning, Colorectal Cancer Detection | bf16 | 4.02 | 94.6 | 32 | 128 |
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (IMDb dataset) | Fine-tuning, Sentiment Analysis | fp32 | 61.72 | 93.59 | 64 | 4 |
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (IMDb dataset) | Fine-tuning, Sentiment Analysis | bf16 | 18.86 | 93.88 | 64 | 4 |
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (GLUE SST2 dataset) | Fine-tuning, Sentiment Analysis | fp32 | 14.06 | 92.2 | 256 | 4 |
Transformers 4.35.0, Intel PyTorch 2.0.100, Intel® oneCCL 2.0.100 | BERTLarge Uncased (GLUE SST2 dataset) | Fine-tuning, Sentiment Analysis | bf16 | 3.68 | 92.09 | 256 | 4 |
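The GPT-J 6B and BioGPT rows use parameter-efficient fine-tuning via the PEFT library. Below is a minimal sketch of a PEFT/LoRA setup with the listed libraries; the LoRA hyperparameters are illustrative, not the values behind the table:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# bf16 weights keep the memory footprint of a 6B-parameter model manageable.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b", torch_dtype=torch.bfloat16
)

# Illustrative LoRA settings; low-rank adapters train while base weights stay frozen.
lora = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # adapters are a tiny fraction of the 6B params
```

Training then proceeds with a standard Transformers training loop; the multi-rank rows distribute data-parallel workers across sockets via Intel® oneCCL.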
Training Throughput
Framework Version | Model/Dataset | Usage | Precision | Throughput | Perf/Watt | Batch Size |
---|---|---|---|---|---|---|
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 129.97 img/s | 0.16 | 128 |
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 327.96 img/s | 0.42 | 128 |
Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 146.18 img/s | 0.18 | 128 |
Intel TensorFlow 2.14 | ResNet50 v1.5 ImageNet (224x224) | Image Recognition | fp32 | 137.36 img/s | 0.16 | 1,024 |
Intel TensorFlow 2.14 | ResNet50 v1.5 ImageNet (224x224) | Image Recognition | bf16 | 317.83 img/s | 0.38 | 1,024 |
Intel TensorFlow 2.14 | ResNet50 v1.5 ImageNet (224x224) | Image Recognition | bf32 | 152 img/s | 0.18 | 1,024 |
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | fp32 | 265,503.91 rec/s | 323.99 | 32,768 |
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf16 | 783,058.09 rec/s | 980.37 | 32,768 |
Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf32 | 369,848.15 rec/s | 447.84 | 32,768 |
Intel TensorFlow 2.14 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | fp32 | 52.49 img/s | 0.0698 | 896 |
Intel TensorFlow 2.14 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | bf16 | 190.53 img/s | 0.2516 | 896 |
Intel TensorFlow 2.14 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | bf32 | 68.08 img/s | 0.0895 | 896 |
Intel PyTorch 2.1 | RNN-T LibriSpeech | Speech Recognition | fp32 | 3.38 fps | 0.0043 | 32 |
Intel PyTorch 2.1 | RNN-T LibriSpeech | Speech Recognition | bf16 | 27.32 fps | 0.0329 | 64 |
Intel PyTorch 2.1 | RNN-T LibriSpeech | Speech Recognition | bf32 | 11.05 fps | 0.0132 | 32 |
Intel PyTorch 2.1 | Mask R-CNN COCO 2017 | Object Detection | fp32 | 3.76 img/s | 0.0045 | 112 |
Intel PyTorch 2.1 | Mask R-CNN COCO 2017 | Object Detection | bf16 | 10.04 img/s | 0.0120 | 112 |
Intel PyTorch 2.1 | Mask R-CNN COCO 2017 | Object Detection | bf32 | 3.94 img/s | 0.0048 | 112 |
Intel PyTorch 2.1 | BERTLarge Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | fp32 | 3.76 sent/s | 0.0045 | 28 |
Intel PyTorch 1.13 | BERTLarge Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | bf16 | 10.04 sent/s | 0.0120 | 56 |
Intel PyTorch 1.13 | BERTLarge Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | bf32 | 3.94 sent/s | 0.0048 | 56 |
Intel TensorFlow 2.14 | BERTLarge Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | fp32 | 4.28 sent/s | 0.0052 | 128 |
Intel TensorFlow 2.14 | BERTLarge Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | bf16 | 9.75 sent/s | 0.0116 | 128 |
Intel TensorFlow 2.14 | BERTLarge Wikipedia 2020/01/01, seq len=512 | Natural Language Processing | bf32 | 4.79 sent/s | 0.0058 | 128 |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 12,072.19 sent/s | 11.53 | 42,000 |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 28,757.83 sent/s | 28.89 | 42,000 |
Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 11,995.37 sent/s | 11.78 | 42,000 |
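The bf16 rows above use Intel AMX through the frameworks' mixed-precision paths. The following is a generic sketch of a bf16 CPU training step via PyTorch autocast; the model, shapes, and optimizer settings are illustrative, not the workloads in the table:

```python
import torch
import torch.nn as nn

# Toy model standing in for the real workloads; shapes are illustrative.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 512)
y = torch.randint(0, 10, (128,))

optimizer.zero_grad()
# Autocast runs matmul-heavy ops in bf16 (AMX-accelerated on 4th Gen Xeon)
# while keeping numerically sensitive ops in fp32.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = loss_fn(model(x), y)
loss.backward()   # gradients flow outside the autocast region
optimizer.step()
```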
Hardware and software configuration (measured October 24, 2023):
Deep learning configuration:
- Hardware configuration for Intel® Xeon® Platinum 8480+ processor (formerly code named Sapphire Rapids): 2 sockets for inference, 1 socket for training, 56 cores, 350 watts, 1024 GB (16 x 64 GB) DDR5 4800 MT/s memory, operating system: CentOS* Stream 8. Using Intel® Advanced Matrix Extensions (Intel® AMX) int8 and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) optimized kernels integrated into Intel® Extension for PyTorch*, Intel® Extension for TensorFlow*, and Intel® Distribution of OpenVINO™ toolkit (an OpenVINO configuration sketch follows these notes). Measurements may vary.
- If the dataset is not listed, a synthetic dataset was used to measure performance. Accuracy (if listed) was validated with the specified dataset.
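The per-socket and per-4-core inference configurations above correspond to throughput- and latency-oriented deployments. With OpenVINO, that trade-off is typically expressed through performance hints rather than manual core pinning; a minimal sketch, with a placeholder model path:

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("resnet50-v1.5.xml")  # placeholder IR path

# The THROUGHPUT hint spreads many parallel inference streams across the
# socket (like the "1 instance per socket" rows); a LATENCY hint instead
# favors few streams, closer to the "4 cores per instance" rows.
compiled = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})

request = compiled.create_infer_request()
result = request.infer({0: np.random.rand(1, 3, 224, 224).astype(np.float32)})
```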
Transfer learning configuration:
- Hardware configuration for Intel® Xeon® Platinum 8480+ processor (formerly code named Sapphire Rapids): DLSA single-node fine-tuning and vision transfer learning on a single node, 56 cores, 350 watts, 16 x 64 GB DDR5 4800 memory, BIOS version EGSDREL1.SYS.8612.P03.2208120629, operating system: Ubuntu 22.04.1 LTS. Using Intel® Advanced Matrix Extensions (Intel® AMX) int8 and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) v2.6 optimized kernels integrated into Intel® Extension for PyTorch* v1.12 and Intel® oneAPI Collective Communications Library v2021.5.2. Measurements and some software configurations may vary.
MLPerf* configuration:
- Hardware configuration for MLPerf* Inference v3.1 measurements on Intel® Xeon® Platinum 8480+ processor (formerly code named Sapphire Rapids): 2 sockets for inference, 56 cores, 350 watts, 1024 GB 16 x 64 GB DDR5-4800 MT/s memory, operating system: CentOS* Stream 8. Using Intel® Advanced Matrix Extensions (Intel® AMX) int4, int8, and bf16 with Intel® oneAPI Deep Neural Network Library (oneDNN) optimized kernels integrated into Intel® Extension for PyTorch*. Measurements may vary. The model specifications and datasets used for MLPerf workloads are specified by MLCommons and viewable at MLPerf Inference: Datacenter Benchmark Suite Results.