# How to Share Your Crypto Resource with the Data Plane Development Kit Virtio Crypto Poll Mode Driver

Published: 01/28/2020

Last Updated: 01/28/2020

## Encryption Technology and Virtio Crypto

The rapid expansion of the internet in recent years, and its access through mobile devices, has increased user demand for data security. Many network flows are now encrypted to guard against threats such as data leaks, phishing, and replay attacks. Google has implemented full-site HTTPS encryption and reported in July 2018 that over 70 percent of Chrome* traffic was encrypted.[1]

Encryption and security, including Internet Protocol Security (IPsec), demand extensive computing resources. Even basic AES block encryption requires many rounds of table lookups, shifts, linear transformations, and other operations. Today's dramatic rise in network throughput has caused an order-of-magnitude increase in encryption workload density. This has stimulated the development and evolution of encryption accelerator technology. For example, the Intel® C620 series chipset includes integrated Intel® QuickAssist Technology (Intel® QAT) to achieve an IPsec data plane processing capability of over 130 Gbps.

Further, as cloud computing adoption continues to grow, cloud users' demand for network security also increases. Many cloud users benefit from encryption acceleration technology through standard encryption accelerators on the market today. This includes Intel QAT, which uses single-root input/output virtualization (SR-IOV) virtual function (VF) technology to enable multiple virtual machines (VMs) and containers to share an Intel QAT resource with minimal performance loss, as shown in Figure 1.

Figure 1. Single-root input/output virtualization (SR-IOV) of Intel® QAT used in virtual machines

SR-IOV technology has a few disadvantages:

• The VM needs the corresponding VF driver, which is strongly coupled with the physical hardware.
• The number of VFs that a PCIe physical function (PF) can expose is limited, and this maximum caps the number of VMs or containers that can share the PF.
• The throughput of a VF is bounded by the maximum throughput of its PF; therefore, users cannot obtain more performance than a single PF provides.
• A VF has neither a rate-limit function nor Quality of Service (QoS) support. Because the maximum throughput of a PF is fixed, increasing the throughput of one VF reduces what is left for the other VFs. This restriction prevents the VF technology of the encryption accelerator from being widely used in public clouds.

Because of these disadvantages, the encryption accelerator's virtualization technology should be independent of the physical hardware, and it needs to scale. At the same time, the back end (the physical machine) should be able to aggregate multiple accelerator resources and handle rate limiting and QoS. Virtio crypto provides a solution to the problem.

## What is Virtio Crypto?

Virtio-crypto is a virtual device that is supported by the VIRTIO (virtual I/O device) standard and consists of a front-end driver and a back-end device. The front end is generally the driver for the virtio-crypto device that VMs and containers can access. The back end is a Vhost crypto device emulated by a program, such as QEMU, on the physical server. When a VM or container application performs encryption or authentication operations through a virtio-crypto device, the front-end virtio-crypto driver transfers the task to the back end, which sends it to the crypto resource of the physical server for processing. In the existing scheme, the front-end virtio-crypto driver is integrated into the Linux* kernel, while the back end is the software virtio-crypto device implemented in QEMU.

This scheme has two drawbacks:

• The front-end Linux Kernel Crypto Framework (LKCF) can slow down performance. Because the front-end virtio-crypto driver is located in the LKCF of the kernel, each encryption requires two copies of data and several callback calls, and it does not support burst operations. These issues have been shown to slow down front-end performance significantly.
• The virtio-crypto back-end device in QEMU does not support hardware acceleration.

The current Virtio crypto scheme is inefficient. To solve this problem, Intel proposes a complete scheme based on the Data Plane Development Kit (DPDK) for the front and back ends of Virtio crypto.

## Back-End Solution for DPDK-Based Vhost User Crypto

QEMU began supporting a vhost-user crypto proxy in early 2018 so that external vhost-user applications could accelerate crypto computing. In May 2018, the DPDK-based vhost-user crypto was launched, becoming the first vhost-user acceleration scheme to support crypto devices.

Vhost-user crypto, an extension of the DPDK Vhost library, provides several new APIs and attempts to hide many details of the Vhost and Cryptodev implementation. The following example illustrates how they work.

First, you initialize the host's Cryptodev resources, including devices and queues, and create the session mempools. The host should use a look-aside device with strong encryption performance, such as Intel QAT. If you use a software Cryptodev, such as AESNI-MB or OpenSSL, the overall performance will be lower than running the software crypto engine directly in the VM, because of the cost of virtio data transfer. (For instructions on how to initialize host Cryptodev resources, see the Cryptodev Programmer's Guide.)

Second, as when configuring other Vhost applications, you create a vhost_device_ops instance to register the callback functions that are invoked when a socket connection is created and destroyed, which are new_device and destroy_device, respectively.

In the new_device() function, you need to do the following:

    static int
    new_device(int vid) {
        ...

1. Call the rte_vhost_crypto_create() function to initialize the Vhost crypto instance. vid is the device number of the virtio device, cryptodev_id is the device number of the initialized DPDK Cryptodev, and crypto_sess_pool and crypto_sess_priv_pool are the mempools used to create and initialize sessions for the VM's encryption and decryption tasks.

In standard Virtio crypto, a session is referred to by its session_id. In DPDK Cryptodev, a session is a concrete data object. Therefore, each Vhost crypto instance maintains a table that maps each virtio-crypto session_id to a DPDK Cryptodev session, and it automatically processes the requests sent by the VM to create and delete sessions. You do not need to intervene; simply provide the Cryptodev ID and the session mempools when creating the instance.

        rte_vhost_crypto_create(vid,
                                cryptodev_id,
                                crypto_sess_pool,
                                crypto_sess_priv_pool,
                                rte_lcore_to_socket_id(lcore_id));
        ...
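
The session mapping described in step 1 can be pictured with a small, self-contained sketch. This is not the DPDK implementation; the names `sess_table`, `sess_table_add`, `sess_table_lookup`, `sess_table_remove`, and the capacity `SESS_TABLE_SIZE` are hypothetical and exist only for illustration. The table hands out a `session_id` when the guest asks for a session and resolves that ID back to a host-side session object on the data path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SESS_TABLE_SIZE 64 /* hypothetical capacity, not a DPDK constant */

/* One slot maps a virtio-crypto session_id to a host-side session object.
 * In a real back end the pointer would refer to a DPDK Cryptodev session. */
struct sess_slot {
    uint64_t session_id;
    void *host_session;
    int in_use;
};

struct sess_table {
    struct sess_slot slots[SESS_TABLE_SIZE];
    uint64_t next_id; /* next session_id to hand to the guest */
};

void sess_table_init(struct sess_table *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < SESS_TABLE_SIZE; i++)
        t->slots[i].in_use = 0;
    t->next_id = 1; /* 0 is reserved to signal "no session" */
}

/* Guest "create session": store the host session and return its new
 * session_id, or 0 if the table is full. */
uint64_t sess_table_add(struct sess_table *t, void *host_session)
{
    assert(t != NULL && host_session != NULL);
    for (size_t i = 0; i < SESS_TABLE_SIZE; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].session_id = t->next_id++;
            t->slots[i].host_session = host_session;
            t->slots[i].in_use = 1;
            return t->slots[i].session_id;
        }
    }
    return 0;
}

/* Data path: resolve the session_id carried in a request, or NULL. */
void *sess_table_lookup(const struct sess_table *t, uint64_t session_id)
{
    assert(t != NULL);
    for (size_t i = 0; i < SESS_TABLE_SIZE; i++)
        if (t->slots[i].in_use && t->slots[i].session_id == session_id)
            return t->slots[i].host_session;
    return NULL;
}

/* Guest "destroy session": returns 0 on success, -1 if the ID is unknown. */
int sess_table_remove(struct sess_table *t, uint64_t session_id)
{
    assert(t != NULL);
    for (size_t i = 0; i < SESS_TABLE_SIZE; i++) {
        if (t->slots[i].in_use && t->slots[i].session_id == session_id) {
            t->slots[i].in_use = 0;
            return 0;
        }
    }
    return -1;
}
```

A linear scan keeps the sketch short; a production table would more likely use a hash or a direct index, but the contract (create, look up, destroy by `session_id`) is the part that matters here.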


2. Tell vhost-user crypto to turn on zero copy to improve processing efficiency. With zero copy, the library avoids copying the data in and writing it back when processing a virtio-crypto data request. It only translates the physical addresses in the request from the VM's address space to the host's.

Note: This method has certain limitations. The host system is required to turn on the I/O memory management unit (IOMMU), virtual IOMMU, etc. Also, the AESNI multibuffer crypto poll-mode driver (MB PMD) is not supported on the host side, and data buffers that are not physically contiguous are not supported. In the VM, LKCF runs a self-test after initializing each device, including a correctness test of scatter/gather lists. When zero copy is turned on, the VM's device driver fails this test, and device initialization fails. Therefore, the host side must wait for the LKCF test to complete before turning on zero copy.

        rte_vhost_crypto_set_zero_copy(vid, zero_copy);
        ...
    }
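
The address translation that zero copy relies on can be illustrated with a minimal sketch. It assumes the host knows a list of guest memory regions and where each is mapped in its own address space; the `mem_region` structure and the `gpa_to_hva` function are hypothetical names for illustration, not the vhost library's API. The function also reports how many contiguous bytes are available, which shows why a buffer that crosses a region boundary cannot be handled as a single zero-copy segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory region mapped into the host. */
struct mem_region {
    uint64_t guest_phys_addr; /* start of the region in guest physical space */
    uint64_t host_virt_addr;  /* where the same bytes are mapped in the host */
    uint64_t size;            /* region length in bytes */
};

/* Translate a guest physical address to a host virtual address.
 * On entry *len is the requested length; on return it is clamped to the
 * number of contiguous bytes available in the matching region.
 * Returns 0 and sets *len to 0 if no region contains gpa. */
uint64_t gpa_to_hva(const struct mem_region *regions, size_t nb_regions,
                    uint64_t gpa, uint64_t *len)
{
    assert(regions != NULL && len != NULL);
    for (size_t i = 0; i < nb_regions; i++) {
        const struct mem_region *r = &regions[i];
        if (gpa >= r->guest_phys_addr && gpa - r->guest_phys_addr < r->size) {
            uint64_t offset = gpa - r->guest_phys_addr;
            uint64_t avail = r->size - offset;
            if (*len > avail)
                *len = avail; /* the buffer runs past the end of the region */
            return r->host_virt_addr + offset;
        }
    }
    *len = 0;
    return 0;
}
```

A caller that sees `*len` come back smaller than requested knows the buffer is not contiguous in the host mapping and would have to fall back to copying for that request.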

The data plane processing for vhost-user crypto is also simple. The following code snippet is executed by a worker lcore.

    static int
    vhost_crypto_worker(void *arg) {
        uint16_t inflight = 0;
        int callfds[MAX_NB_VIRTIO_CRYPTO_DEVICES];
        struct rte_crypto_op *ops[burst_size];
        ...
        /* First, allocate a burst of DPDK crypto operations. */
        rte_crypto_op_bulk_alloc(cop_pool, RTE_CRYPTO_OP_TYPE_SYMMETRIC, ops, burst_size);

        /* Loop forever, handling crypto data requests from the VM. */
        for (;;) {
            uint16_t fetched;

            /* Compute how many operations can be completely enqueued to
               Cryptodev (see the tip below). */
            nb_ops = RTE_MIN(burst_size, NB_DESCRIPTORS - inflight);

            /* Fetch at most nb_ops virtio crypto requests from the virtio
               queue, convert them into DPDK crypto operations, write them
               into ops, and return the number of ops processed. */
            fetched = rte_vhost_crypto_fetch_requests(vid, virtio_queue_id, ops, nb_ops);

            /* Enqueue the converted ops to the Cryptodev queue. */
            inflight += rte_cryptodev_enqueue_burst(cryptodev_id, queue_id, ops, fetched);

            /* One tip: The queue of a DPDK Cryptodev is not infinite. If the
               queue is saturated, some of the fetched operations will not be
               enqueued. Operations that fail to enqueue must be saved
               properly so that the VM's crypto requests are not lost.
               Therefore, it is recommended that you record NB_DESCRIPTORS
               (i.e., the size of the queue) when you initialize the Cryptodev
               queue. Before rte_vhost_crypto_fetch_requests() is called,
               compare it with inflight to obtain an appropriate nb_ops, so
               that no more requests are fetched than can be enqueued.
               Recording inflight can also be used for more complex
               SLA-related operations, such as QoS. */

            /* Dequeue the crypto operations processed by DPDK Cryptodev. */
            fetched = rte_cryptodev_dequeue_burst(cryptodev_id, queue_id, ops_dequeue, RTE_MIN(burst_size, inflight));

            /* Update inflight. */
            inflight -= fetched;

            /* Because Cryptodev operation is asynchronous, the dequeued
               operations may not be the ones enqueued just now, nor may they
               belong to the same virtio device. Therefore, call the following
               function to write back the processed buffers (when zero copy is
               not turned on). The callfd of every virtio device that has
               requests in this burst is recorded in callfds. */
            rte_vhost_crypto_finalize_requests(ops_dequeue, fetched, callfds, &nb_callfds);

            /* Finally, notify every virtio device that encryption and
               decryption are completed. */
            for (i = 0; i < nb_callfds; i++)
                eventfd_write(callfds[i], (eventfd_t)1);
        }
    }
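
The inflight bookkeeping in the worker loop can be checked with a small simulation that needs no DPDK at all. This is a sketch under stated assumptions: `toy_queue`, `toy_enqueue`, `toy_dequeue`, `fetch_budget`, and `simulate` are hypothetical names, and the toy queue only counts entries. It shows that when the number of fetched requests is capped by the remaining queue room, the queue never refuses an operation, even when the device completes work more slowly than requests arrive.

```c
#include <assert.h>
#include <stdint.h>

/* Number of requests that may be fetched without overflowing a queue of
 * nb_descriptors entries, given the operations currently in flight. */
uint16_t fetch_budget(uint16_t burst_size, uint16_t nb_descriptors,
                      uint16_t inflight)
{
    assert(inflight <= nb_descriptors);
    uint16_t room = (uint16_t)(nb_descriptors - inflight);
    return burst_size < room ? burst_size : room;
}

/* Toy stand-in for a bounded device queue (not a DPDK type). */
struct toy_queue {
    uint16_t capacity;
    uint16_t used;
};

/* Accept up to n operations; returns how many were accepted. */
uint16_t toy_enqueue(struct toy_queue *q, uint16_t n)
{
    uint16_t room = (uint16_t)(q->capacity - q->used);
    uint16_t accepted = n < room ? n : room;
    q->used = (uint16_t)(q->used + accepted);
    return accepted;
}

/* Complete up to n operations; returns how many were completed. */
uint16_t toy_dequeue(struct toy_queue *q, uint16_t n)
{
    uint16_t done = n < q->used ? n : q->used;
    q->used = (uint16_t)(q->used - done);
    return done;
}

/* Run the fetch/enqueue/dequeue cycle and return how many fetched requests
 * the queue refused. With the budget applied, this stays at 0. */
unsigned simulate(uint16_t burst_size, uint16_t nb_descriptors,
                  uint16_t completions_per_round, unsigned iterations)
{
    struct toy_queue q = { nb_descriptors, 0 };
    uint16_t inflight = 0;
    unsigned refused = 0;

    for (unsigned i = 0; i < iterations; i++) {
        /* Pretend exactly nb_ops requests were fetched from the guest. */
        uint16_t nb_ops = fetch_budget(burst_size, nb_descriptors, inflight);
        uint16_t enq = toy_enqueue(&q, nb_ops);
        refused += (unsigned)(nb_ops - enq);
        inflight = (uint16_t)(inflight + enq);

        /* The device completes at most completions_per_round operations. */
        uint16_t want = completions_per_round < inflight
                            ? completions_per_round : inflight;
        inflight = (uint16_t)(inflight - toy_dequeue(&q, want));
    }
    return refused;
}
```

For instance, with a burst size of 32, a queue of 128 descriptors, and 100 operations in flight, `fetch_budget` allows only 28 new requests. The same counter could feed a per-device cap if the back end needs to enforce a service-level limit.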


The vhost-user crypto API is easy to use, decouples the front and back ends, and is highly extensible. Zero copy can greatly improve performance, and the API can be easily integrated into applications that provide complex virtual crypto resource services. However, due to the limitations of the Virtio crypto Linux driver and the QEMU proxy, only the AES-CBC and SHA1-HMAC algorithms are currently supported. Support for more algorithms in the future will fuel further development of vhost-user crypto.

## DPDK-Based Virtio-User Crypto Poll Mode Driver (PMD)

To solve the slow-performance problem of LKCF, we added the virtio-crypto PMD based on Virtio User to the Cryptodev Framework of DPDK. The PMD shares the same control plane and data plane API with other DPDK crypto PMDs, except that it works in a VM or container.

To use the PMD, you start the Vhost crypto application, which creates a UNIX socket file, and pass the path of that socket to QEMU when launching the VM. An example of the QEMU command is as follows:

    qemu-system-x86_64 \
    ... \
    -chardev socket,id=charcrypto0,path=/path/to/your/socket \
    -object cryptodev-vhost-user,id=cryptodev0,chardev=charcrypto0 \
    -device virtio-crypto-pci,id=crypto0,cryptodev=cryptodev0 \
    ...

In the VM, the Linux kernel binds the device to its own virtio driver by default. You need to rebind the virtio device passed in from QEMU to the uio_pci_generic driver. Assuming that the device has a PCI address of 0000:00:04.0, you rebind it with the following commands:

    modprobe uio_pci_generic
    echo -n 0000:00:04.0 > /sys/bus/pci/drivers/virtio-pci/unbind
    echo "1af4 1054" > /sys/bus/pci/drivers/uio_pci_generic/new_id

Then, you can use the device in DPDK like any other Cryptodev PMD and benefit from the crypto acceleration provided on the host side.

## References

Introduction to the Crypto API in the DPDK Vhost Library

A DPDK Vhost-Crypto Sample

Introduction to the DPDK Virtio-User Crypto PMD