Co-authored with: Zhewei Pang, Tao Zhang, Jiang Liu, Xiangmin Li of Tencent* Corporation
Introduction
Protecting privacy when data is put to use is becoming increasingly challenging as the number of privacy regulations worldwide continues to grow.
The Tencent Midas-TEEPot* team has collaborated with Intel to build a trusted computing platform with BigDL PPML (Privacy Preserving Machine Learning) on Tencent Cloud ShuLianTong* products to address these challenges. By combining Intel® Software Guard Extensions (Intel® SGX) with several other security technologies, BigDL PPML provides a trusted cluster environment for standard, distributed Big Data & AI applications, so that unmodified Big Data analysis and ML/DL programs can run securely on a private or public cloud.
BigDL PPML
Figure 1. BigDL PPML Trusted Big Data & AI
BigDL is a Big Data AI project open-sourced by Intel. The latest release, BigDL 2.0, combines the original Analytics Zoo and BigDL projects, making it easier for data scientists and data engineers to develop distributed AI applications.
BigDL PPML combines various low-level hardware and software security technologies (e.g., Intel® Software Guard Extensions (Intel® SGX), a library operating system (LibOS), federated learning, an attestation service, and key management) to protect the end-to-end Big Data AI pipeline (from data ingestion and data analysis all the way to machine learning and deep learning) in a trusted execution environment, on either a single node or a cluster. Customers can continue to run unmodified Big Data analysis and ML/DL programs (such as Apache Spark, Apache Flink, TensorFlow, and PyTorch) in this environment, with the following protections:
- Computation and memory protected by Intel® SGX Enclaves
- Network communication protected by remote attestation and Transport Layer Security (TLS)
- Storage (e.g., data and model) protected by encryption
- Optional Federated Learning support
Figure 2. BigDL PPML End-to-End Workflow
The BigDL PPML end-to-end workflow proceeds as follows:
- User submits job to K8s (using BigDL PPML CLI), which creates the driver node
- Client attests the driver node
- Driver creates more worker nodes
- Driver attests worker nodes
- Driver and workers request keys from KMS
- Workers read and decrypt input data
- Workers run distributed Big Data, ML, and DL programs
- Workers encrypt and write output data
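The workflow above can be sketched as a small, self-contained simulation. This is only an illustration under stated assumptions, not BigDL PPML code: `ToyKMS`, `toy_cipher`, `run_workflow`, and `EXPECTED_MEASUREMENT` are hypothetical names, attestation is reduced to comparing a measurement string, the attestation service and KMS are folded into one class, and the SHA-256 keystream cipher is a toy stand-in that must not be used for real data. Job submission and node creation on K8s are omitted.

```python
import hashlib
import hmac
import secrets

# Hypothetical "known good" enclave measurement; real SGX quotes carry far more.
EXPECTED_MEASUREMENT = hashlib.sha256(b"approved-enclave-image").hexdigest()


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream (symmetric; illustration only)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        stream = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ s for b, s in zip(data[offset:offset + 32], stream))
    return bytes(out)


class ToyKMS:
    """Hypothetical stand-in that folds attestation and key management together."""

    def __init__(self, data_key: bytes):
        self._data_key = data_key
        self._attested = set()

    def attest(self, node_id: str, measurement: str) -> bool:
        # Record the node as trusted only if its measurement matches.
        ok = hmac.compare_digest(measurement, EXPECTED_MEASUREMENT)
        if ok:
            self._attested.add(node_id)
        return ok

    def request_key(self, node_id: str) -> bytes:
        # Keys are released only to nodes that passed attestation.
        if node_id not in self._attested:
            raise PermissionError(f"{node_id} has not passed attestation")
        return self._data_key


def run_workflow(encrypted_records, kms, measurements):
    """Attest nodes, fetch keys, then decrypt, compute, and re-encrypt records."""
    # Client attests the driver; the driver then attests each worker.
    if not kms.attest("driver", measurements["driver"]):
        raise PermissionError("driver failed attestation")
    workers = [node for node, m in measurements.items()
               if node != "driver" and kms.attest(node, m)]
    if not workers:
        raise RuntimeError("no worker passed attestation")

    results = []
    for i, (nonce, blob) in enumerate(encrypted_records):
        worker = workers[i % len(workers)]
        key = kms.request_key(worker)                       # request key from KMS
        value = int(toy_cipher(key, nonce, blob).decode())  # read and decrypt input
        doubled = value * 2                                 # stand-in for the real job
        out_nonce = nonce + b"-out"                         # fresh nonce for the output
        results.append((out_nonce, toy_cipher(key, out_nonce, str(doubled).encode())))
    return results


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    kms = ToyKMS(key)
    records = [(b"r0", toy_cipher(key, b"r0", b"21")),
               (b"r1", toy_cipher(key, b"r1", b"4"))]
    nodes = {"driver": EXPECTED_MEASUREMENT,
             "worker-1": EXPECTED_MEASUREMENT,
             "worker-2": "tampered"}  # fails attestation and never receives a key
    output = run_workflow(records, kms, nodes)
    print([int(toy_cipher(key, n, b).decode()) for n, b in output])
```

The point of the sketch is the ordering: key release is gated on attestation, so a node whose measurement does not match (here `worker-2`) is never handed the data key and never sees plaintext.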
Tencent Midas-TEEPot Trusted Computing E2E Distributed Solution with BigDL PPML
The Tencent Midas-TEEPot team cooperated with the Intel BigDL team to build a trusted end-to-end Big Data analytics solution on Tencent Cloud ShuLianTong products. The related products are already in use in China.
Figure 3. Tencent Midas-TEEPot Trusted Computing Platform Architecture
The Tencent Midas-TEEPot Trusted Computing Platform is a key functional module of Tencent Cloud ShuLianTong products (see Figure 3), with BigDL PPML as its core module supporting secure Spark SQL and ML functions.
In this solution, data from each source is converted and transported through the data source adapter, then distributed to the data management module and the trusted computing platform. At the same time, the Tencent Cloud Midas-TEEPot Trusted Computing Management Platform provides key management, remote attestation, and other operations to ensure confidentiality and authentication. The blockchain node is responsible for data cataloging and authorization, scheduling, and storage of the final execution results. At present, the Tencent Midas-TEEPot Trusted Computing Platform has been applied to scenarios such as multi-party data sharing among government departments, financial credit investigation, and risk control.
With the Tencent Midas-TEEPot Trusted Computing Platform, developers only need to focus on their business logic and can rely on the platform to ensure the end-to-end security and privacy of their applications. As a result, users can significantly improve efficiency and shorten the development life cycle of trusted computing solutions.
Summary and Outlook
The Tencent Midas-TEEPot team worked with the Intel BigDL team to create this trusted computing solution. It enables multi-party data sharing and collaborative computing while ensuring data security and privacy protection, and it accelerates the development of privacy-preserving applications.
Tencent and Intel will continue to work closely to further strengthen innovation and practice in end-to-end privacy protection. We will leverage Intel® Trust Domain Extensions (Intel® TDX), which will be supported on the upcoming next-generation Intel® Xeon® platform. Intel TDX will greatly extend the range of applications supported by confidential computing and its scope. We will also explore other security technologies, such as homomorphic encryption, differential privacy, and hybrid solutions, to help users achieve more secure data integration and accelerate data value mining without compromising privacy.