Protect End-to-End Data Pipelines with BigDL Privacy-Preserving Machine Learning (PPML)

Protecting privacy and confidentiality is critical for large-scale data analysis and machine learning. BigDL PPML provides a trusted cluster environment for secure big data and artificial intelligence (AI) applications, even in untrusted cloud environments.

Based on Intel® Software Guard Extensions (Intel® SGX), Intel has built BigDL Privacy-Preserving Machine Learning (PPML) to secure the end-to-end big data and AI pipeline.

Intel BigDL PPML

BigDL, a unified open source AI solution platform from Intel, aims to make it easier for data scientists and data engineers to build end-to-end, distributed AI applications. By using Intel® SGX, Intel’s Trusted Execution Environment (TEE), and integrating with other hardware and software security measures, BigDL has built a distributed PPML platform that protects end-to-end distributed AI pipelines, from data ingestion and data analysis all the way to machine learning and deep learning.

Figure 1. Intel BigDL PPML software stack
All graphics created by the authors

PPML protects data at rest, in transit, and in use. Compute and memory are protected by SGX enclaves. Storage (for example, data and models) is protected by encryption. Network communication is protected by both remote attestation and Transport Layer Security (TLS). Federated learning support is optional.

With BigDL PPML, users can run big data and AI applications in a secure and trusted fashion, including trusted Spark* data analysis (such as Spark SQL*, DataFrame, and MLlib*), trusted deep learning (such as BigDL, Orca*, Nano*, and DLlib*), and trusted federated learning with private set intersection (PSI).
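
To make the idea of private set intersection concrete, here is a minimal sketch of a Diffie-Hellman-style PSI between two parties, using only the Python standard library. It is not BigDL's API or necessarily the protocol BigDL uses. The function names (hash_to_group, blind, psi), the modulus P, and the sample identifiers are all assumptions chosen for illustration, and the parameters are toy values that should not be used to protect real data.

```python
import hashlib
import secrets

# Toy group: integers modulo the prime 2**255 - 19.
# Assumption for illustration only; not a vetted parameter choice for PSI.
P = 2**255 - 19


def hash_to_group(item: str) -> int:
    """Map an item to a number modulo P using SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


def blind(values, secret):
    """Raise every value to a party's secret exponent modulo P."""
    return [pow(v, secret, P) for v in values]


def psi(alice_items, bob_items):
    """Return the items of Alice that Bob also holds.

    Both parties run in one process here; in a real deployment each
    blinded list would be sent over the network instead.
    """
    alice_items = list(alice_items)
    a = secrets.randbelow(P - 3) + 2  # Alice's secret exponent
    b = secrets.randbelow(P - 3) + 2  # Bob's secret exponent

    # Alice -> Bob: H(x)^a for each of Alice's items.
    alice_blinded = blind([hash_to_group(x) for x in alice_items], a)
    # Bob -> Alice: H(y)^b for his items, plus (H(x)^a)^b in Alice's order.
    bob_blinded = blind([hash_to_group(y) for y in bob_items], b)
    alice_double = blind(alice_blinded, b)
    # Alice computes (H(y)^b)^a; exponentiation commutes, so shared
    # items produce the same doubly blinded value.
    bob_double = set(blind(bob_blinded, a))

    return {x for x, d in zip(alice_items, alice_double) if d in bob_double}


if __name__ == "__main__":
    shared = psi(
        ["alice@example.com", "bob@example.com", "carol@example.com"],
        ["bob@example.com", "carol@example.com", "dave@example.com"],
    )
    print(sorted(shared))
```

Neither party sees the other's raw items, only blinded values. In a federated learning setting, a step like this lets participants find the records they have in common before training.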

End-to-End Workflow

Figure 2. BigDL PPML based end-to-end secure computing workflow

Here’s a step-by-step breakdown of the end-to-end secure computing workflow:

1. The user submits a job to Kubernetes* (via the BigDL PPML command line interface), which creates the driver node.
2. The BigDL PPML client attests the driver node.
3. The driver creates the worker nodes.
4. The driver attests the worker nodes.
5. The driver and workers request keys from the key management service (KMS).
6. The workers read and decrypt the input data.
7. The workers run distributed big data, ML, and DL programs.
8. The workers encrypt and write the output data.
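
The steps above can be sketched as a small, self-contained simulation. This is not BigDL PPML code. Every name in it (Node, KMS, attest, toy_cipher, run_job, EXPECTED_MEASUREMENT) is hypothetical. Attestation is simulated by comparing a hash rather than verifying a real SGX quote, and the cipher is a toy XOR keystream standing in for real authenticated encryption. The point is only to show the ordering: nodes are attested before the KMS releases keys, and data stays encrypted outside the workers.

```python
import hashlib
import hmac
import secrets

# Hypothetical expected enclave measurement (a stand-in for an SGX quote check).
EXPECTED_MEASUREMENT = hashlib.sha256(b"trusted-ppml-image").hexdigest()


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR keystream; applying it twice decrypts. Illustration only, not secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


class Node:
    def __init__(self, name, image=b"trusted-ppml-image"):
        self.name = name
        self.measurement = hashlib.sha256(image).hexdigest()
        self.attested = False
        self.key = None


def attest(node):
    """Simulated remote attestation: compare the reported measurement."""
    node.attested = hmac.compare_digest(node.measurement, EXPECTED_MEASUREMENT)
    if not node.attested:
        raise PermissionError(f"attestation failed for {node.name}")


class KMS:
    def __init__(self):
        self._key = secrets.token_bytes(32)

    def owner_key(self):
        """Key handed to the data owner to encrypt input ahead of time."""
        return self._key

    def request_key(self, node):
        """Release the data key only to nodes that passed attestation."""
        if not node.attested:
            raise PermissionError(f"{node.name} is not attested")
        return self._key


def run_job(kms, encrypted_parts):
    driver = Node("driver")                      # step 1: job creates the driver
    attest(driver)                               # step 2: client attests the driver
    workers = [Node(f"worker-{i}") for i in range(len(encrypted_parts))]  # step 3
    for worker in workers:
        attest(worker)                           # step 4: driver attests workers
    for node in [driver] + workers:
        node.key = kms.request_key(node)         # step 5: request keys from KMS
    outputs = []
    for worker, part in zip(workers, encrypted_parts):
        plaintext = toy_cipher(worker.key, part)                   # step 6: decrypt
        total = sum(int(x) for x in plaintext.split(b","))         # step 7: compute
        outputs.append(toy_cipher(worker.key, str(total).encode()))  # step 8: encrypt
    return outputs


if __name__ == "__main__":
    kms = KMS()
    key = kms.owner_key()
    parts = [toy_cipher(key, b"1,2,3"), toy_cipher(key, b"4,5")]
    results = [int(toy_cipher(key, out)) for out in run_job(kms, parts)]
    print(results)
```

A node built from a different image fails the simulated attestation and never receives a key, which is the property the real workflow relies on.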

Using the preconfigured workflow in Figure 2, developers can focus on business logic and rely on BigDL PPML to help ensure the end-to-end security and privacy of their applications. This can significantly improve development efficiency for private computing applications and shorten the time needed to build private computing solutions.

The BigDL PPML solution has been deployed on the Alibaba Cloud* DataTrust* platform, at ByteDance*, and elsewhere.

You can see how it works with this 10-minute demo presented at KubeCon* North America 2022 and check out the BigDL GitHub repo for more information.

Photo by Alina Grubnyak on Unsplash
