Alibaba Group Speeds Transformer Model Performance

Alibaba and Intel accelerate Transformer model performance with 3rd Gen Intel® Xeon® Scalable processors and Intel® DL Boost

At a glance:

  • Alibaba Group provides the technology infrastructure and marketing reach to help merchants, brands, retailers, and other businesses leverage new technology to engage with their users and customers and operate more efficiently.

  • Alibaba and Intel worked together to explore 3rd Gen Intel® Xeon® Scalable processor capabilities for AI applications, particularly when used with Intel® Deep Learning Boost. They also explored the Intel® Neural Compressor, which helps customers rapidly develop and deploy their AI INT8 models on Intel® Xeon® Scalable processor-based platforms.

Executive Summary

3rd Gen Intel® Xeon® Scalable processors are based on Intel’s 10nm+ process technology. They offer more CPU cores, higher memory capacity, and higher frequency than the previous generation. Technologists from Alibaba Group and Intel worked together to explore what these capabilities mean for AI applications, particularly when used with Intel® Deep Learning Boost (Intel® DL Boost). We also explored Intel® Neural Compressor (formerly known as the Intel® Low Precision Optimization Tool), which helps customers rapidly develop and deploy their AI INT8 models on Intel Xeon Scalable processor-based platforms. We optimized the Alibaba Transformer model on 3rd Gen Intel Xeon Scalable processors and demonstrated 1.36x (FP32) and 1.42x (INT8)1 inference performance improvements over Intel’s previous-generation processors.

Transformer is a key model used in the Alibaba Machine Learning Platform for AI (PAI). It is widely used in real-world natural language processing (NLP) tasks, serving millions of users through Alibaba’s online services. Low latency and high throughput are key to Transformer’s success, and 8-bit low-precision inference is a promising technique to meet these requirements.

Intel DL Boost offers powerful capabilities for 8-bit low-precision inference on AI workloads. With the support of Intel Neural Compressor, we can optimize 8-bit inference performance while keeping accuracy loss low. These capabilities demonstrate leadership in AI inference and show the power of Intel DL Boost and 3rd Gen Intel Xeon Scalable processors.
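To make the INT8 idea concrete, the sketch below shows the affine quantization arithmetic that underlies 8-bit inference: floating-point values are mapped to 8-bit integers via a scale and zero point derived from the observed value range, computed on, then dequantized back. This is a minimal, self-contained illustration of the technique in plain Python; the function names are ours and it does not use the Intel Neural Compressor or PyTorch APIs.

```python
# Minimal sketch of signed 8-bit affine quantization, the arithmetic
# behind INT8 inference. Function names are illustrative only.

def quant_params(vmin, vmax, qmin=-128, qmax=127):
    """Derive scale and zero point from the observed float range."""
    scale = (vmax - vmin) / (qmax - qmin)
    zero_point = round(qmin - vmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to the nearest representable INT8 value, clamped."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover an approximate float from an INT8 value."""
    return (q - zero_point) * scale

# Example: activations observed in [-1.0, 1.0]
scale, zp = quant_params(-1.0, 1.0)
q = quantize(0.5, scale, zp)           # 8-bit integer representation
approx = dequantize(q, scale, zp)      # close to 0.5, within one scale step
```

In a real deployment, tools like Intel Neural Compressor automate this calibration (choosing ranges per tensor) and lower the integer arithmetic onto Intel DL Boost instructions, so the model runs on INT8 while accuracy loss stays within a user-defined budget.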

Product and Performance Information

1. Configuration details: Alibaba PAI NLP Transformer model on PyTorch 1.7.1, throughput performance on 3rd Gen Intel® Xeon® Scalable processors.

Baseline configuration: Tested by Intel as of 03/19/2021. 2-node, 2x Intel® Xeon® Platinum 8269C processor, 26 cores, HT on, Turbo on; total memory 192 GB (12 slots/16 GB/2933 MHz); BIOS: SE5C620.86B.02.01.0013.121520200651 (0x4003003); CentOS 8.3, kernel 4.18.0-240.1.1.el8_3.x86_64; gcc 8.3.1 compiler; Transformer model; deep learning framework: PyTorch 1.7.1 (https://download.pytorch.org/whl/cpu/torch-1.7.1%2Bcpu-cp36-cp36m-linux_x86_64.whl); BS=1; customer data; 26 instances/2 sockets; datatype: FP32/INT8.

New configuration: Tested by Intel as of 03/19/2021. 2-node, 2x Intel® Xeon® Platinum 8369B processor, 32 cores, HT on, Turbo on; total memory 512 GB (16 slots/32 GB/3200 MHz); BIOS: WLYDCRB1.SYS.0020.P92.2103170501 (0xd000260); CentOS 8.3, kernel 4.18.0-240.1.1.el8_3.x86_64; gcc 8.3.1 compiler; Transformer model; deep learning framework: PyTorch 1.7.1 (https://download.pytorch.org/whl/cpu/torch-1.7.1%2Bcpu-cp36-cp36m-linux_x86_64.whl); BS=1; customer data; 32 instances/2 sockets; datatype: FP32/INT8.

All performance data was collected in a lab environment.