Product and Performance Information
1Up to 1.93x higher AI training performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor using FP32, on ResNet-50 throughput for image classification.
New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MHz), 800 GB Intel SSD, ResNet-50 v1.5, ucode 0x700001b, Intel Hyper-Threading Technology (Intel HT Technology) on, Intel Turbo Boost Technology on, and running Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic. Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642769358b388d8f615ded9c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.4, BF16, BS=512, tested by Intel on 5/18/2020.
Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MHz), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, ResNet-50 v1.5. Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.4, FP32, BS=512, tested by Intel on 5/18/2020.
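The training configurations above can be sketched as a reproduction recipe. The repository URLs, branches, and batch size come from the disclosure; the `launch_benchmark.py` flag names are assumptions based on the layout of the Intel Model Zoo around v1.6.1 and may differ, and the run additionally requires the ImageNet dataset and comparable hardware.

```shell
# Hypothetical sketch of the ResNet-50 v1.5 BF16 training throughput run.
# Repos/branches are from the disclosure; flags are assumed, not verified.
git clone -b bf16/base https://github.com/Intel-tensorflow/tensorflow
git clone -b v1.6.1 https://github.com/IntelAI/models
cd models/benchmarks
python launch_benchmark.py \
  --model-name resnet50v1_5 \
  --precision bfloat16 \
  --mode training \
  --framework tensorflow \
  --batch-size 512 \
  --data-location /path/to/imagenet   # TFRecords of the ImageNet dataset
```

The baseline run would differ only in the precision flag (FP32) and the host processor.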
2Up to 1.87x higher AI inference performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor using FP32, on ResNet-50 throughput for image classification.
New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MHz), 800 GB Intel SSD, ucode 0x700001b, Intel HT Technology on, Intel Turbo Boost Technology on with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, ResNet-50 v1.5. Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.4, BF16, BS=56, 5 instances, 28 cores/instance, tested by Intel on 5/18/2020.
Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MHz), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, ResNet-50 v1.5. Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.5, FP32, BS=56, 4 instances, 28 cores/instance, tested by Intel on 5/18/2020.
3Up to 1.7x higher AI training performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor using FP32, on BERT throughput for natural language processing.
New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MHz), 800 GB Intel SSD, ucode 0x700001b, Intel HT Technology on, Intel Turbo Boost Technology on with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, Squad 1.1 dataset, oneDNN 1.4, BF16, BS=12, tested by Intel on 5/18/2020.
Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MHz), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, Squad 1.1 dataset, oneDNN 1.5, FP32, BS=12, tested by Intel on 5/18/2020.
4Up to 1.9x higher AI inference performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor using FP32, on BERT throughput for natural language processing.
New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MHz), 800 GB Intel SSD, ucode 0x700001b, Intel HT Technology on, Intel Turbo Boost Technology on with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, Squad 1.1 dataset, oneDNN 1.4, BF16, BS=32, 4 instances, 28 cores/instance, tested by Intel on 5/18/2020.
Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MHz), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, Squad 1.1 dataset, oneDNN 1.5, FP32, BS=32, 4 instances, 28 cores/instance, tested by Intel on 5/18/2020.
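The multi-instance inference configurations above (e.g. 4 instances at 28 cores/instance) can be sketched as follows. The repository, branch, model, and batch size come from the disclosure; the `launch_benchmark.py` flags and the `numactl` pinning scheme are assumptions about how the per-socket instances were launched, and the run requires the SQuAD 1.1 dataset and a fine-tuned BERT-Large checkpoint.

```shell
# Hypothetical sketch of the 4-instance BERT-Large (QA) BF16 inference run,
# one instance pinned per socket/NUMA node. Flags are assumed, not verified.
git clone -b v1.6.1 https://github.com/IntelAI/models
cd models/benchmarks
for node in 0 1 2 3; do
  numactl --cpunodebind=$node --membind=$node \
    python launch_benchmark.py \
      --model-name bert_large \
      --precision bfloat16 \
      --mode inference \
      --framework tensorflow \
      --batch-size 32 \
      --data-location /path/to/squad-1.1 &
done
wait  # aggregate throughput is the sum across the 4 instances
```

The FP32 baseline would use the same launch pattern with the precision flag changed, on the Xeon Platinum 8280 host.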