Secured AI Model Inferencing at the Edge with Intel® Developer Cloud for Edge Workloads

ID 750971
Updated 10/18/2022
Version Latest

Introduction

Intel® Developer Cloud for Edge Workloads is a sandbox for prototyping and experimenting with workloads on Intel® architecture, such as CPUs (central processing units), GPUs (graphics processing units), and VPUs (vision processing units). Here we illustrate how you can bring your own machine learning model, explore the OpenVINO™ toolkit, and run inference in a confidential computing environment with license-checking support.

Motivation

An AI model consists of a network, or graph, with a multitude of parameters per node, often trained with a huge amount of data and computing resources. A production model is typically high-value IP (intellectual property) that the owner wants to protect from being stolen or reverse engineered. This is even more of an issue at the edge.

Threat model analysis

For a customer’s high-value AI models, there are three aspects of security risk:
•    Data in transit
•    Data at rest
•    Data in use

Mature technologies such as TLS (transport layer security) and encrypted storage can be used to protect data in transit and data at rest, respectively. We focus here on protecting data while it is in use, in memory.
If the model is not encrypted in advance, it is exposed to malicious software as soon as it is loaded into a model serving framework such as PyTorch, TensorFlow, or the OpenVINO™ model server. An encrypted AI model must be decrypted before inference can run. During inference, the model itself and the intermediate data and output files it generates reside in the host’s memory or file system, where malicious software can still attack through techniques such as memory snooping. Furthermore, to reduce the attack surface and protect against privileged software such as the operating system, or even against the cloud service provider, inference should ideally run in a trusted execution environment (TEE).
A TEE is a tamper-resistant processing environment. It guarantees the confidentiality and integrity of the executed code, data, and runtime state (e.g., CPU registers, memory, and sensitive I/O). In addition, it provides remote attestation, which proves its authenticity and trustworthiness to third parties. The content of a TEE is not static; it can be securely updated. A TEE resists all software attacks as well as physical attacks on the system’s main memory, and attacks that exploit backdoor security flaws are not possible.

Secure AI model in use

This article focuses on protecting “data in use”, in particular the AI models. Per our threat model, the infrastructure provider (here, the Intel Developer Cloud for Edge Workloads operator) is outside the trusted computing base (TCB). Using a TEE, we can guarantee protection during:

•    Model retrieval
•    Model decryption
•    Model inferencing

End-to-end security solution

To protect models in use on untrusted infrastructure or on generic Infrastructure as a Service (IaaS) platforms, we should use a hardware-based TEE. It provides workload confidentiality and integrity, protecting the workload from other software running on the system, including the host operating system and any hypervisor. We can download the encrypted model, decrypt it, and run inference inside a TEE such as an Intel® Software Guard Extensions (Intel® SGX) enclave.
To use Intel SGX without modifying the application, we can leverage a lightweight library OS designed to run a single application, such as the open source Gramine. Gramine, with its minimal set of system calls, brings most of the benefits of a full operating system without significantly increasing the application’s trusted computing base.
Docker containers are widely used to deploy applications in the cloud. Gramine Shielded Containers (GSC) provide the infrastructure to deploy Docker containers protected by Intel SGX enclaves using the Gramine Library OS.
The GSC tool transforms a Docker image into a new image that includes the Gramine Library OS, manifest files, and Intel SGX-related information. When launched, the new image executes the application inside an Intel SGX enclave using the Gramine Library OS. GSC follows the common Docker approach of first building an image and subsequently running that image inside a container: build the graminized image with the “gsc build” command, sign it with the “gsc sign-image” command, and then run it with “docker run”.
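For illustration, the end-to-end GSC flow looks roughly like the sketch below. The image name, manifest, and key file are placeholders, and the exact arguments and SGX device path depend on your GSC version and SGX driver:

    # Fetch the GSC tooling (assumes Docker and an SGX-capable host)
    git clone https://github.com/gramineproject/gsc.git
    cd gsc

    # Graminize an existing application image ("my-ovms" and the manifest name are placeholders)
    ./gsc build my-ovms my-ovms.manifest

    # Sign the graminized image with an enclave signing key (placeholder key file)
    ./gsc sign-image my-ovms enclave-key.pem

    # Run the protected image like any other container, exposing the SGX device to it
    docker run --device=/dev/sgx_enclave gsc-my-ovms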

OpenVINO and OpenVINO Security Add-on

The OpenVINO model server is a high-performance system for serving machine learning models. It is written in C++ for high scalability and is optimized for Intel® architecture, letting you take advantage of the full power of Intel® Xeon® processors and their AI acceleration and expose it over a network interface. The OpenVINO model server uses the same architecture and API as TensorFlow Serving while using OpenVINO for inference execution. The inference service is provided via gRPC or a REST API, making it easy to deploy new algorithms and AI experiments.
Model repositories may reside on a locally accessible file system (e.g., NFS), or on online storage compatible with Google Cloud Storage (GCS), Amazon S3, or Azure Blob Storage.
The OpenVINO Security Add-on helps control access to the model inference service via license checks in addition to hosting it in a TEE.
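For example, a plain (non-access-controlled) model from a local or cloud repository can be served with a single docker run command; the model name, paths, and ports below are illustrative:

    # Serve a model from a local repository over gRPC (port 9000) and REST (port 8000)
    docker run -d --rm -v /opt/models:/models -p 9000:9000 -p 8000:8000 \
      openvino/model_server:latest \
      --model_name face_detection --model_path /models/face_detection \
      --port 9000 --rest_port 8000

    # A cloud repository can be referenced directly, e.g. --model_path s3://my-bucket/face_detection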

Figure 1 OpenVINO™ Security Add-on components and architecture

 

Figure 1 shows where the OpenVINO Security Add-on fits into model development and deployment. There are three personas/roles in this solution:

  • Model developer: The entity that created the model and will encrypt it and sign it.
  • Model controller: The entity that will control access to a model. The model controller could also be the model developer or an independent software vendor.
  • Model customer/user: The entity that is building an application that leverages the model.

The OpenVINO Security Add-on consists of three components:

1.    OpenVINO Security Add-on tool

The model developer/owner uses the OpenVINO Security Add-on tool in a TEE to generate an access-controlled model and master license.

  • The tool uses the model's intermediate representation (IR) files to create an access-controlled output archive that is distributed to model users. The developer can also place the archive in long-term storage or back it up without additional security measures.
  • The model developer uses the OpenVINO Security Add-on tool to generate and manage cryptographic keys and related collateral for the access-controlled models. Cryptographic material is only made available inside a TEE. The OpenVINO Security Add-on key management system lets the model developer obtain certificates from external certificate authorities and add them to a key store.
  • The model developer generates user-specific licenses, as JSON files, for the access-controlled model. The model developer can define global or user-specific licenses and attach licensing policies to them; for example, a license expiration time and/or a limit on how often the model can be used in a given period.
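The snippet below only illustrates the kind of policy such a license can carry; the field names are hypothetical and do not reflect the actual OpenVINO Security Add-on license schema, which the tool generates and manages for you:

    # Hypothetical example only -- not the real OpenVINO Security Add-on license format.
    # It just shows the kind of per-customer policy (expiration, usage limit) described above.
    cat > customer_license_policy.json <<'EOF'
    {
      "model_name": "face_detection",
      "customer": "customer-a",
      "license_type": "time-and-usage-limited",
      "expires": "2025-12-31",
      "max_inferences_per_day": 10000
    }
    EOF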

2.    OpenVINO Security Add-on runtime

Model users install and use the OpenVINO Security Add-on runtime inside a TEE. The TEE provides confidentiality and integrity for the workload, safeguarding model IP.

Outside of the OpenVINO Security Add-on, the user adds the access-controlled model to the OpenVINO model server startup configuration file. The OpenVINO model server attempts to load the model into memory; at that point, the OpenVINO Security Add-on runtime component reaches out to the license service to validate the user's license. Once the license is successfully validated, the OpenVINO model server loads the model and serves inference requests.
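For reference, the model server's startup configuration is a JSON file along the lines of the minimal sketch below; the names and paths are placeholders, and the custom-loader settings that the OpenVINO Security Add-on registers for access-controlled models are omitted here (they are described in its documentation):

    # Minimal sketch of an OpenVINO model server configuration file (names and paths are placeholders).
    # The Security Add-on's custom-loader settings for access-controlled models are omitted.
    cat > config.json <<'EOF'
    {
      "model_config_list": [
        {
          "config": {
            "name": "face_detection",
            "base_path": "/models/face_detection"
          }
        }
      ]
    }
    EOF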

3.    OpenVINO Security Add-on license service

Use the OpenVINO Security Add-on license service to verify a model user's license.

  • The model owner, controller, or an independent software vendor hosts the OpenVINO Security Add-on license service, which responds to license validation requests when a user attempts to load an access-controlled model in a model server. The licenses are registered with the license service.
  • When a model user loads the model or seeks to serve the model, the OpenVINO Security Add-on runtime contacts the license service to make sure the license is valid and within usage limits. The model user must be able to reach the designated license service over the internet.
Figure 2 OpenVINO™ Security Add-on roles interaction and workflow

 

Figure 2 describes the interactions between the model developer, independent software vendor, and user. The interaction workflow follows five phases:

Phase 1: Model developer - publish model

Step 1: Set up the key store and artifacts directory
Step 2: Create a key store and add a certificate to it
Step 3: Download the model
Step 4: Define access control for the model and create a master license for it
Step 5: Create a runtime reference TCB (trusted computing base)
Step 6: Publish the access-controlled model and runtime reference TCB

Phase 2: Model controller – set up license service

Phase 3: Model customer/user - request access-controlled model

Step 1: Create a key store and add a certificate to it
Step 2: Request an access-controlled model from the model developer

Phase 4: Model developer - provision model

Step 1: Receive a customer model purchase/access request
Step 2: Create customer license configuration
Step 3: Create the customer license
Step 4: Update the license server database with the license
Step 5: Share the access-controlled model with the model user/customer

Phase 5: Model user/customer - load the access-controlled model into the OpenVINO model server

Step 1: Load the access-controlled model into the OpenVINO model server
Step 2: Start the NGINX model server
Step 3: Develop AI/ML inference applications
Step 4: Run inference (see the example below)
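As an example of step 4, once the server is running, an inference request can be sent to its TensorFlow Serving-compatible REST endpoint; the host, port, model name, and input payload below are placeholders:

    # Send a REST inference request to the model server (placeholder host/port/model/input)
    curl -X POST http://edge-node:8000/v1/models/face_detection:predict \
      -H "Content-Type: application/json" \
      -d '{"instances": [[0.0, 0.1, 0.2, 0.3]]}'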

Intel Developer Cloud for Edge Workloads in practice

Secured Model Inferencing

The secured AI model inferencing option in Intel Developer Cloud for Edge Workloads is a JupyterLab-embedded proof of concept (PoC). The choice of Jupyter Notebook reflects its popularity and ease of use among AI/ML scientists and practitioners. Figure 3 shows the workflow of Jupyter Notebook-based AI inferencing. The edge compute nodes are bare-metal Intel hardware platforms, and a high-performance resource and queue management framework called TORQUE is deployed on the development servers. Users develop their AI applications in Jupyter Notebook and use scripts to submit inference jobs to TORQUE, which queues those jobs and schedules them onto available edge compute nodes to run AI inference. The inference results are sent back to the development servers and are visible in the Jupyter Notebook, where users can check results, run benchmark comparisons, and tune performance. The Jupyter Notebook environment serves as a good vehicle for learning and tuning AI model inferencing.

Figure 3 JupyterLab architecture and workflow

 

In the previous section, we introduced the OpenVINO Security Add-on solution for end-to-end AI model protection. Since Intel Developer Cloud for Edge Workloads uses the OpenVINO toolkit to run AI model inferencing, it was natural to adopt the OpenVINO Security Add-on for securing the model.
The OpenVINO Security Add-on depends on Intel SGX and on Gramine Shielded Containers (GSC), which are Docker based. We first need to build and install the Intel SGX software packages, the Gramine packages, and the Intel® Software Guard Extensions Data Center Attestation Primitives (Intel® SGX DCAP) for Linux, which provides software modules that help Intel applications perform attestation. The OpenVINO Security Add-on itself has three major components: the OpenVINO Security Add-on tool, the OpenVINO Security Add-on runtime, and the OpenVINO Security Add-on license service. The OpenVINO Security Add-on packages, binaries, and GSC Docker image should be deployed on the target edge compute nodes. Using Intel Developer Cloud for Edge Workloads removes the need for any of this setup and lets you rapidly familiarize yourself with the technology and become productive.
After the edge compute node setup jobs finish, the Jupyter Notebook-based workflow steps are as follows:

  1. Create a bash script job file in Jupyter Notebook.
  2. Run the TORQUE command line tool (qsub) to submit the job (see the sketch after this list).
  3. TORQUE schedules the job to a target edge compute node, and the job's bash script is executed on that node.
  4. The job script includes snippets that cover all the phases of operation described in Figure 2. A GSC container, which bundles NGINX, the OpenVINO model server, the OpenVINO Security Add-on, and Gramine together, is launched and starts serving model inference requests.
  5. The bash script's execution results are sent back to the Jupyter Notebook as files named “<model_name>_o<job_id>.txt” (output logs) and “<model_name>_e<job_id>.txt” (error logs).
  6. Python snippets in the Jupyter Notebook check the results and provide a live demonstration.
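A minimal sketch of this flow, with hypothetical file names, node property, and helper script:

    # Contents of a hypothetical job file (secure_infer.sh) written from a notebook cell.
    # It executes on whichever edge compute node TORQUE assigns to the job.
    cat > secure_infer.sh <<'EOF'
    #!/bin/bash
    cd $PBS_O_WORKDIR            # start in the directory the job was submitted from
    ./run_secure_ovms_demo.sh    # placeholder: launches the GSC/Security Add-on container and runs inference
    EOF

    # Submit the job to TORQUE; the node property ("sgx") is illustrative
    qsub -l nodes=1:sgx secure_infer.sh

    # Monitor the queue; when the job completes, the output and error log files
    # described above appear alongside the notebook
    qstat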

Since the model encryption, decryption, and inference all happen in an Intel SGX enclave, the whole AI model inference life cycle is secured.
Our sample tutorials walk you through each step: uploading a model, importing data, developing your AI/ML inference application, and comparing performance. You can experiment with your own models and custom processing; the scripts clearly document where and what to edit to use your custom models.

Summary

This article introduced Intel Developer Cloud for Edge Workloads and its typical use case scenarios, especially data-in-use protection for high-value AI model inferencing.
We introduced the OpenVINO Security Add-on and how it can be used in Intel Developer Cloud for Edge Workloads for secure AI model inferencing. Visit Intel® Developer Cloud to get a free account, start exploring Intel Developer Cloud for Edge Workloads, and develop AI/ML applications with secure model inferencing.
Using TEEs for containerized workloads will get simpler. Watch the open source Confidential Containers project, which is underway to ease deployment of unmodified containerized workloads in TEEs in Kubernetes environments.

References

Use these links for more information:

•    Intel® DevCloud: Edge Workloads
•    Trusted Execution Environment: What It Is, and What It Is Not (IEEE Xplore)
•    Intel® Software Guard Extensions (Intel® SGX)
•    Intel® Software Guard Extensions Data Center Attestation Primitives (Intel® SGX DCAP)
•    Gramine SGX
•    OpenVINO™ Toolkit
•    OpenVINO™ Model Server Quick Start Guide
•    OpenVINO™ Security Add-on for Gramine-SGX
•    Confidential Containers