Product and Performance Information
1Up to 1.93x higher AI training performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor using FP32, on ResNet-50 throughput for image classification.
New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MHz), 800 GB Intel SSD, ResNet-50 v1.5, ucode 0x700001b, Intel Hyper-Threading Technology (Intel HT Technology) on, Intel Turbo Boost Technology on, and running Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic. Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642769358b388d8f615ded9c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.4, BF16, BS=512, tested by Intel on 5/18/2020.
Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MHz), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, ResNet-50 v1.5. Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.4, FP32, BS=512, tested by Intel on 5/18/2020.
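The training configurations above can be sketched as a reproduction recipe. The repository URLs, branches, and batch size come from the disclosure; the `launch_benchmark.py` flag names are assumptions based on the layout of the Intel Model Zoo around v1.6.1 and may differ, and the run additionally requires the ImageNet dataset and comparable hardware.

```shell
# Hypothetical sketch of the ResNet-50 v1.5 BF16 training throughput run.
# Repos/branches are from the disclosure; flags are assumed, not verified.
git clone -b bf16/base https://github.com/Intel-tensorflow/tensorflow
git clone -b v1.6.1 https://github.com/IntelAI/models
cd models/benchmarks
python launch_benchmark.py \
  --model-name resnet50v1_5 \
  --precision bfloat16 \
  --mode training \
  --framework tensorflow \
  --batch-size 512 \
  --data-location /path/to/imagenet   # TFRecords of the ImageNet dataset
```

The baseline run would differ only in the precision flag (FP32) and the host processor.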
2Up to 1.87x higher AI inference performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor using FP32, on ResNet-50 throughput for image classification.
New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MHz), 800 GB Intel SSD, ucode 0x700001b, Intel HT Technology on, Intel Turbo Boost Technology on with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, ResNet-50 v1.5. Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.4, BF16, BS=56, 5 instances, 28 cores/instance, tested by Intel on 5/18/2020.
Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MHz), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, ResNet-50 v1.5. Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.5, FP32, BS=56, 4 instances, 28 cores/instance, tested by Intel on 5/18/2020.
3Up to 1.7x higher AI training performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor using FP32, on BERT throughput for natural language processing.
New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MHz), 800 GB Intel SSD, ucode 0x700001b, Intel HT Technology on, Intel Turbo Boost Technology on with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, Squad 1.1 dataset, oneDNN 1.4, BF16, BS=12, tested by Intel on 5/18/2020.
Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MHz), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, Squad 1.1 dataset, oneDNN 1.5, FP32, BS=12, tested by Intel on 5/18/2020.
4Up to 1.9x higher AI inference performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor using FP32, on BERT throughput for natural language processing.
New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MHz), 800 GB Intel SSD, ucode 0x700001b, Intel HT Technology on, Intel Turbo Boost Technology on with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, Squad 1.1 dataset, oneDNN 1.4, BF16, BS=32, 4 instances, 28 cores/instance, tested by Intel on 5/18/2020.
Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MHz), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput:
https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo:
https://github.com/IntelAI/models -b v1.6.1, Squad 1.1 dataset, oneDNN 1.5, FP32, BS=32, 4 instances, 28 cores/instance, tested by Intel on 5/18/2020.
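The multi-instance inference configurations above (e.g. 4 instances at 28 cores/instance) can be sketched as follows. The repository, branch, model, and batch size come from the disclosure; the `launch_benchmark.py` flags and the `numactl` pinning scheme are assumptions about how the per-socket instances were launched, and the run requires the SQuAD 1.1 dataset and a fine-tuned BERT-Large checkpoint.

```shell
# Hypothetical sketch of the 4-instance BERT-Large (QA) BF16 inference run,
# one instance pinned per socket/NUMA node. Flags are assumed, not verified.
git clone -b v1.6.1 https://github.com/IntelAI/models
cd models/benchmarks
for node in 0 1 2 3; do
  numactl --cpunodebind=$node --membind=$node \
    python launch_benchmark.py \
      --model-name bert_large \
      --precision bfloat16 \
      --mode inference \
      --framework tensorflow \
      --batch-size 32 \
      --data-location /path/to/squad-1.1 &
done
wait  # aggregate throughput is the sum across the 4 instances
```

The FP32 baseline would use the same launch pattern with the precision flag changed, on the Xeon Platinum 8280 host.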