Creating Speed and Security Leverage Points for Deep Learning with Baidu PaddlePaddle and 3rd Generation Intel® Xeon® Scalable Platforms

This article introduces how Baidu’s deep learning platform PaddlePaddle is supported by Intel® VNNI instructions and Intel® Software Guard Extensions (Intel® SGX) technology on 3rd Generation Intel® Xeon® Scalable processors. With this support, PaddlePaddle helps developers deploy deep learning models faster and more easily through model quantization and acceleration, and provides confidential computing capabilities so that more businesses can utilize multi-source data in their deep learning models in a safe and reliable manner.

Deep learning technology is a crucial driving force for a new era powered by AI. By analyzing texts, images, sounds, and multimedia data to make predictions, deep learning is pushing more industries to become smarter. However, for many businesses, the sheer amount of data needed and complexity of modeling skills make deep learning unattainable. Developers are eager to find ways of facilitating the real-world application of deep learning.

For instance, most deep learning models are built using 32-bit floating-point precision (FP32) with high computational complexity and a large number of model parameters. These characteristics make it difficult to apply deep learning models in certain scenarios and devices, especially mobile and embedded devices. If we can create a speed leverage point by making these models “slimmer”, with reduced storage space and accelerated inference, then the application of deep learning becomes much more feasible. 
In addition, model performance depends on data quality. If businesses can break through data silos and aggregate high-quality data from multiple sources and adopt confidential computing to help ensure data security, then they will have a new security leverage point to realize untapped data potential.

Baidu’s open-source deep learning platform, PaddlePaddle, accelerated on 3rd Generation Intel® Xeon® Scalable processors, offers solutions that address the critical leverage points of speed and security. By creating these leverage points for deep learning, the combination can help industries apply deep learning capabilities to real-world challenges and opportunities.

Baidu PaddlePaddle

PaddlePaddle is based on Baidu's years of deep learning research and expertise in real-world business applications. It is China's first industry-level, open-source deep learning platform with advanced technology and comprehensive functions. It integrates deep learning core training and inference frameworks, a basic model library, end-to-end development kits, and a rich collection of toolkits. PaddlePaddle has benefited more than 2.65 million developers and 100,000 companies, and facilitated the development of 340,000 models.1 It can help developers quickly realize their ideas in AI and launch relevant applications, thereby empowering more companies to implement intelligent business upgrades with AI capabilities.

PaddlePaddle 2.0, launched in January 2021, makes the innovation and application of deep learning technology even easier through its newly upgraded API system. This new version delivers a more satisfying coding experience with a mature and complete dynamic graph mode. It improves ease of use and flexibility with more powerful distributed training capabilities. It has also established a robust hardware ecosystem, offering deep optimizations through the integration of software and hardware.

A New Speed Leverage Point for Applying PaddlePaddle Models: Quantization and Acceleration

PaddlePaddle addresses the speed leverage point through two steps: quantization of PaddlePaddle models with PaddleSlim, and deployment and acceleration on Intel® CPUs with Paddle Inference. Figure 1 provides an overview of this process.

Figure 1. Create Quantized Models and Deploy Them on Intel® CPUs

Model Quantization

To help users make their models “slimmer” easily and quickly, Baidu PaddlePaddle launched PaddleSlim, a deep learning model compression tool. PaddleSlim contains a collection of compression strategies, such as pruning, fixed-point quantization, knowledge distillation, hyperparameter searching, and neural architecture search (NAS). The latest PaddleSlim 2.0 release supports both dynamic and static graphs.

Quantization replaces FP32 values with lower-bit representations, most commonly the INT8 data type. The advantages of model quantization include less storage space, faster inference, and lower energy consumption. PaddleSlim supports both quantization-aware training and post-training static quantization, covering computer vision (CV) and natural language processing (NLP) models. Both methods produce accurate quantization scales and adopt a symmetric quantization approach.
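As an illustration of the symmetric approach described above, the following sketch quantizes a list of FP32 weights to INT8 using a single scale. The function names are hypothetical, not the PaddleSlim API:

```python
# Minimal sketch of symmetric INT8 quantization (the general technique behind
# PaddleSlim's quantization methods); names here are illustrative only.

def symmetric_quantize(values, num_bits=8):
    """Map FP32 values to signed integers using one symmetric scale."""
    qmax = 2 ** (num_bits - 1) - 1                  # 127 for INT8
    scale = max(abs(v) for v in values) / qmax      # one scale per tensor
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate FP32 values from the integer codes."""
    return [q * scale for q in quantized]

weights = [0.91, -1.27, 0.05, 0.4]
q, scale = symmetric_quantize(weights)   # q = [91, -127, 5, 40]
approx = dequantize(q, scale)            # close to the original weights
```

Storing `q` instead of `weights` needs a quarter of the space, and integer multiply-accumulate is where the hardware speedup below comes from.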

Quantized Model Deployment and Acceleration with Intel® Technologies

When deploying, quantized PaddlePaddle models leverage the built-in AI acceleration of 3rd Generation Intel® Xeon® Scalable processors, as well as Intel® oneAPI Toolkits.

VNNI Instructions

The 3rd Generation Intel® Xeon® Scalable processors feature Intel® Deep Learning Boost (Intel® DL Boost) technology, which includes the new Vector Neural Network Instructions (VNNI). Built on Intel® Advanced Vector Extensions 512 (Intel® AVX-512), VNNI improves inference performance by up to a factor of four over FP32 and reduces memory requirements by up to four times (see Figure 2). The reduced memory footprint and higher frequency help accelerate low-precision calculations, translating to much faster AI and deep learning inference. Examples of targeted applications include image classification, speech recognition, language translation, object detection, and more.

Figure 2. The Intel® DL Boost AVX512_VNNI VPDPBUSD instruction performs 8-bit multiplies with 32-bit accumulation in a single instruction (u8×s8→s32). This provides a theoretical peak compute gain of 4x for INT8 OPS over FP32 OPS.
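The arithmetic that Figure 2 describes can be emulated in plain Python for a single 32-bit lane; this is an illustration of the u8×s8→s32 operation, not real SIMD code:

```python
# Toy emulation of one 32-bit lane of the AVX512_VNNI VPDPBUSD operation:
# four unsigned 8-bit values are multiplied by four signed 8-bit values and
# the products are accumulated into a signed 32-bit accumulator.

def vpdpbusd_lane(acc, u8x4, s8x4):
    """acc += sum(u8[i] * s8[i] for i in range(4)), wrapped to signed 32-bit."""
    assert all(0 <= u <= 255 for u in u8x4)      # unsigned 8-bit inputs
    assert all(-128 <= s <= 127 for s in s8x4)   # signed 8-bit inputs
    total = (acc + sum(u * s for u, s in zip(u8x4, s8x4))) & 0xFFFFFFFF
    # interpret the low 32 bits as a signed value, as the hardware would
    return total - 0x100000000 if total >= 0x80000000 else total

acc = vpdpbusd_lane(0, [1, 2, 3, 4], [10, -20, 30, -40])
# 1*10 + 2*(-20) + 3*30 + 4*(-40) = -100
```

A real AVX-512 register holds sixteen such lanes, which is why one VPDPBUSD replaces what would otherwise take several FP32 instructions.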

oneAPI

To activate the acceleration capabilities of VNNI instructions, PaddlePaddle uses Intel® oneAPI Toolkits as part of its model quantization solution. Intel® oneAPI is a unified, simplified programming model that integrates a just-in-time (JIT) compiled library of vectorized math operators across platforms. It lets developers target different architectures (CPU, GPU, and FPGA) through a common operator interface, without having to worry about platform incompatibility issues.
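As a sketch of what this looks like in deployment code, a Paddle Inference CPU configuration can enable the oneDNN execution path. The model file paths below are placeholders, and the API shown follows `paddle.inference` as of PaddlePaddle 2.x; treat this as a configuration fragment rather than a complete program:

```python
# Hedged sketch: enabling the oneDNN-accelerated CPU path in Paddle Inference.
# "inference.pdmodel"/"inference.pdiparams" are placeholder file names.
from paddle.inference import Config, create_predictor

config = Config("inference.pdmodel", "inference.pdiparams")
config.disable_gpu()                        # run on the Xeon CPU
config.enable_mkldnn()                      # use oneDNN JIT-generated kernels
config.set_cpu_math_library_num_threads(4)  # threads per inference instance

predictor = create_predictor(config)        # ready to run FP32 or INT8 models
```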

Integration with PaddlePaddle and Quantization

With the support of VNNI instructions and oneAPI’s unified programming interface, the simulated INT8 models created through PaddleSlim can be transformed into real INT8 models. These models can then be deployed on 3rd Generation Intel® Xeon® Scalable processors. The core steps include:

  1. Collect and fine-tune the scales and other data obtained from the simulation model.
  2. Fuse model operators, such as conv + relu, and simplify the graph.
  3. Based on the list of oneDNN-supported INT8 operators, and using the scales obtained from simulation training, insert quantize/dequantize operators.
  4. Save the quantized INT8 model for further deployment and inference.
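Steps 2 and 3 above can be sketched as a toy graph pass; the op names, scale value, and data structures are illustrative, not PaddlePaddle's actual graph IR:

```python
# Toy sketch of operator fusion and quantize/dequantize insertion. A model is
# represented as a flat list of op names; real graph passes work on a DAG.

INT8_OPS = {"conv2d", "conv2d_relu"}   # ops with oneDNN INT8 kernels (toy list)

def fuse_conv_relu(ops):
    """Step 2: merge adjacent conv2d + relu pairs into one fused op."""
    fused, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i] == "conv2d" and ops[i + 1] == "relu":
            fused.append("conv2d_relu")
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused

def insert_quant_dequant(ops, scales):
    """Step 3: bracket INT8-capable ops with quantize/dequantize + scale."""
    out = []
    for op in ops:
        if op in INT8_OPS:
            out += [("quantize", scales[op]), op, ("dequantize", scales[op])]
        else:
            out.append(op)
    return out

graph = fuse_conv_relu(["conv2d", "relu", "pool2d"])   # ["conv2d_relu", "pool2d"]
graph = insert_quant_dequant(graph, {"conv2d_relu": 0.043})
```

A later cleanup pass in a real pipeline would also cancel adjacent dequantize/quantize pairs so tensors stay in INT8 between consecutive INT8 operators.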
     

Application Examples

Currently, the Baidu PaddlePaddle model quantization and acceleration solution is widely applied across many Baidu services, such as Baidu OCR. Baidu’s commercialized OCR offering covers high-precision text detection and recognition across various scenarios and languages. It is used for remote identity authentication, financial and tax reimbursement, document digitization, and other areas, helping businesses reduce costs and increase efficiency. Baidu OCR provides stable and easy-to-use online APIs, offline SDKs, software deployment packages, and other services. For model quantization and acceleration, Baidu has created a set of Slim OCR tools that deliver significant performance improvements. Because Baidu OCR models use image classification models (such as ResNet50) as the backbone, Figure 3 showcases the tested performance gains, citing ResNet50 as an example.

Figure 3. Single-Core Performance on Intel® Xeon® Platinum 8358 Before and After INT8 Quantization of an Image Classification Model

The test results show that on the full ImageNet validation data set, the inference throughput of ResNet50 INT8 is 3.56 times that of FP32 (Figure 3).2 This demonstrates that the PaddlePaddle quantization solution, integrated with 3rd Generation Intel® Xeon® Scalable processors, can significantly improve the inference speed of multiple deep learning models, helping developers improve efficiency in their deep learning applications.

A New Security Leverage Point for Applying PaddlePaddle Models: Confidential Computing

Bringing Confidential Computing Capabilities to PaddlePaddle with Intel® SGX

Aggregating multi-source data for deep learning can present significant security challenges, particularly when sensitive data is involved. To address these challenges, Baidu PaddlePaddle is now integrated with MesaTEE, Baidu’s framework for confidential computing. MesaTEE, which stands for memory-safe trusted execution environments, leverages two key technologies: the memory-safe Rust language and Intel® Software Guard Extensions (Intel® SGX), a hardware-based security solution. Intel SGX removes the system’s operating system (OS) and virtual machine (VM) software layers from the trust boundary, partitioning sensitive application code and data into hardened enclaves that are better protected from disclosure or modification. Using Intel SGX, MesaTEE offers a robust approach to implementing confidential deep learning and protecting sensitive data. The commercial version of MesaTEE builds on Apache Teaclave, a universal secure computing platform written in Rust and currently in incubation. With this version, Baidu offers customized commercial solutions to corporate users, and has created a first-of-its-kind collaborative confidential computing engine with distributed TEE cluster computing, helping protect data in large-scale data models and training environments.

When integrated with PaddlePaddle, MesaTEE acts as a collaboration platform and task scheduler. It uses PaddlePaddle as a task execution environment through Executor plug-ins and delivers specific deep learning tasks to PaddlePaddle instances running in TEEs. In addition, MesaTEE supports TEE products from different vendors by adapting its remote attestation process and integrating it into the communication protocol, which helps ensure that the security of its operating environment is measurable. The solution supports remote file storage protocols such as S3 for file access, and all file contents are encrypted. During training, and without affecting the usage mode, all data accessed by PaddlePaddle in TEEs remains ciphertext outside the enclave. This design helps resist malicious attacks by preventing access to any computing content from outside the TEE. Figure 4 depicts the interactions between PaddlePaddle and MesaTEE.
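The guarantee that plaintext exists only inside the enclave can be illustrated with a toy sketch; the XOR “cipher” is a stand-in for real authenticated encryption (never use it in practice), and all names here are hypothetical:

```python
# Toy data-flow illustration: everything outside the simulated enclave is
# ciphertext; decryption happens only behind the enclave boundary.
import secrets

KEY = secrets.token_bytes(16)   # in reality, provisioned after remote attestation

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for real encryption; XOR is its own inverse."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def enclave_task(ciphertext: bytes) -> bytes:
    """Inside the TEE: decrypt, run the task, re-encrypt before anything leaves."""
    plaintext = xor_cipher(ciphertext, KEY)
    result = plaintext.upper()              # placeholder for the training step
    return xor_cipher(result, KEY)

record = b"loan applicant features"
stored = xor_cipher(record, KEY)            # data at rest: ciphertext only
output = enclave_task(stored)               # output leaves the enclave encrypted
```

Only a party holding the enclave-provisioned key can read `output`; storage and transport layers see ciphertext throughout.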

Figure 4. The Interaction between PaddlePaddle and MesaTEE

Potential Application and Use Case

By combining deep learning with confidential computing, PaddlePaddle is responding flexibly to industry trends and positioning itself to continue playing a critical role in many industries. Take the finance industry as an example. For a long time, financial institutions have relied on machine learning technology to model and analyze simple, two-dimensional numeric data for establishing borrower profiles. The rise of deep learning has opened up new possibilities: deep learning models break through the limitations of traditional machine learning models by incorporating voice, images, and other multimedia and streaming data. Such data can include payment information (from third-party payment service providers), purchase history (from retailers and e-commerce platforms), credit data (from third-party credit institutions), and social contacts (from social media platforms). It can also include employment history (from recruitment platforms), travel information (from travel platforms), and other data types.

However, from the perspective of data providers, user privacy data is highly sensitive and often cannot legally be shared. As a result, data silos are commonplace in the financial industry: data providers, model developers, model evaluators, and data users are all separated from each other. Through the PaddlePaddle deep learning platform running in TEEs, financial institutions like Baidu’s Du Xiaoman will be able to work with data providers in a more secure manner to create more robust credit risk assessment models while respecting privacy requirements.

Conclusion

Intel® Deep Learning Boost VNNI and Intel® Software Guard Extensions (SGX) technology are core features of the latest 3rd Generation Intel® Xeon® Scalable processors. They are also critical enabling technologies for more secure, high-performance AI applications. The 3rd Generation Intel Xeon Scalable processors provide a balanced architecture with built-in acceleration and advanced security capabilities, designed through decades of innovation for the most demanding workload requirements. Optimized for cloud, enterprise, AI, HPC, network, security, and Internet of Things (IoT) workloads, these processors come with 8 to 40 powerful cores and a wide range of frequency, feature, and power levels.

3rd Generation Intel® Xeon® Scalable processors are the only data center CPUs with built-in AI acceleration, support for end-to-end data science tools, and an ecosystem of smart solutions. This powerful combination can help unlock valuable insights in applications from edge to cloud.

Baidu’s quantization solution, deployment tools, and further optimization and integration with Intel hardware offer compelling value as PaddlePaddle offerings are increasingly deployed on Intel platforms. Baidu’s rich pool of models and increasingly mature application development kits (such as PaddleOCR and PaddleDetection) add further value and convenience. Developers will experience the benefits of optimized model acceleration enabled by the “model + hardware” solution.

In addition, Baidu's deployment tools, such as Paddle Inference and Paddle Lite, will also be natively integrated with Intel inference acceleration libraries such as the Intel® Distribution of OpenVINO™ Toolkit. These continuing efforts will make it even easier for users of the PaddlePaddle deep learning framework to enjoy acceleration on Intel hardware.

Learn More

The Intel and PaddlePaddle teams welcome developers to follow their projects on GitHub and leave feedback.

https://github.com/PaddlePaddle/Paddle

https://github.com/PaddlePaddle/Paddle-Lite


Notes

1 The data was provided by Baidu. Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

2 Baidu: Platinum 8350C: Test by Intel as of 3/19/2021. 2-node, 2x Intel® Xeon® Platinum 8350C CPU @ 2.60GHz, 32 cores, HT on, Turbo off. Total memory 256 GB (4 slots/64 GB/2933 MHz), BIOS: WLYDCRB1.SYS.0020.P92.2103170501 (ucode: 0x8d000270), Ubuntu 18.04.3 LTS, kernel 4.15.0-55-generic, gcc 9.3.0 compiler, ResNet50 model, deep learning framework: PaddlePaddle 2.0 (download at https://paddle-inference-lib.bj.bcebos.com/2.0.0-cpu-avx-mkl/paddle_inference.tgz), BS=1, ImageNet Val, 1 instance, datatype: FP32/INT8


Notices & Disclaimers

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.  See backup for configuration details.  No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software or service activation.
© Intel Corporation.  Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.  Other names and brands may be claimed as the property of others.