Deflate Your Data with DPDK Compression API

Big data workloads generate massive quantities of information that need to be sent across the network and stored on data servers. Learn how data compression can be useful in your applications, and learn about the Data Plane Development Kit compression API and how it can help deflate your data!

Learn more about the Data Plane Development Kit


Hi. I’m Sujata from Intel. In this video, we tell you about data compression and how it can be useful in your applications. We then introduce the DPDK compression API and show how it can help deflate your data!

Big data workloads generate massive quantities of information that need to be sent across the network and stored on data servers, which makes compression critical. This gives rise to two use cases for compression that DPDK can serve: networking and storage.

In the networking use case, compressing data before it is transmitted reduces the load on the network, which frees up bandwidth.

The storage use case is concerned with saving data onto persistent storage devices such as hard drives. Compressing this data allows for greater file system efficiency, as well as the obvious saving of space.

DPDK provides an API called compressdev that addresses these use cases by enabling an application to compress and decompress data.

It was released in DPDK 18.05 and has a wide range of features. It supports the Deflate and LZS compression algorithms, as well as a choice of compression levels, depending on whether your use case requires high compression throughput or a high compression ratio.
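To make the throughput-versus-ratio choice concrete, here is a minimal sketch that compresses the same data at a low and a high Deflate level. It does not call compressdev; it assumes Python's standard zlib module as a stand-in Deflate engine, and the sample payload and the `ratio` helper are made up for illustration.

```python
import zlib

# Repetitive sample payload (hypothetical); real data will compress differently.
payload = b"DPDK compressdev moves data through a compression engine. " * 2000

fast = zlib.compress(payload, 1)   # level 1: favors speed
small = zlib.compress(payload, 9)  # level 9: favors compression ratio


def ratio(original: bytes, compressed: bytes) -> float:
    """Return original size divided by compressed size."""
    return len(original) / len(compressed)


print(f"level 1: {len(fast)} bytes, ratio {ratio(payload, fast):.1f}")
print(f"level 9: {len(small)} bytes, ratio {ratio(payload, small):.1f}")
```

Lower levels generally finish sooner, while higher levels spend more effort searching for matches. How large the difference is depends heavily on the data.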

It allows a compression application to interface with either a hardware or software compression engine to execute data compression.

Both stateless and stateful compression are supported by the API.

In stateless compression, each data chunk is independently compressed and decompressed. This facilitates high throughput as many operations can be carried out in parallel.

In stateful compression, one data chunk can be compressed better by using the data gathered during compression of the previous data chunk.
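The difference between the two modes can be sketched outside DPDK. The example below is an illustration only, assuming Python's standard zlib module in place of a compressdev engine and a hypothetical set of similar chunks. Stateless mode makes each chunk its own Deflate stream, while stateful mode keeps one compressor alive so later chunks can reference earlier ones.

```python
import zlib

# Hypothetical chunks that share most of their content.
chunks = [b"sensor=42;status=OK;location=rack-7;" * 20 + bytes([i]) for i in range(8)]

# Stateless: every chunk is an independent stream, so chunks can be
# compressed and decompressed on their own, in any order or in parallel.
stateless = [zlib.compress(c, 6) for c in chunks]

# Stateful: one compressor carries its history window across chunks.
# Z_SYNC_FLUSH emits the output for each chunk without resetting that history.
comp = zlib.compressobj(6)
stateful = [comp.compress(c) + comp.flush(zlib.Z_SYNC_FLUSH) for c in chunks]
stateful[-1] += comp.flush()  # finish the stream

# Stateless pieces decompress independently, here in reverse order.
independent = [zlib.decompress(s) for s in reversed(stateless)]

# Stateful pieces must be fed, in order, to a single decompressor.
decomp = zlib.decompressobj()
in_order = b"".join(decomp.decompress(s) for s in stateful)

print("stateless total:", sum(len(s) for s in stateless), "bytes")
print("stateful total: ", sum(len(s) for s in stateful), "bytes")
```

With chunks this similar, the stateful total should come out smaller, at the cost of having to process the chunks in sequence.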

Currently, the DPDK drivers implement only stateless compression. However, as mentioned, the API itself supports both modes for an application to call.

Compressdev also supports checksum and hash generation on the data being compressed.
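As a rough illustration of why a checksum or hash alongside compression is useful, the sketch below computes both over the uncompressed input and verifies them after a round trip. It assumes Python's standard zlib and hashlib modules rather than compressdev, and the specific algorithms (CRC32, Adler-32, SHA-256) and the payload are chosen only for the example.

```python
import hashlib
import zlib

# Hypothetical payload.
data = b"example payload headed for storage " * 100

compressed = zlib.compress(data)

# Checksums over the uncompressed input.
crc32 = zlib.crc32(data)
adler32 = zlib.adler32(data)
# A digest over the same input, for example for integrity checks.
sha256 = hashlib.sha256(data).hexdigest()

# The zlib container format stores the Adler-32 of the uncompressed data
# in its last four bytes, so the checksum is produced during compression.
trailer = int.from_bytes(compressed[-4:], "big")

# After decompression, the recomputed checksum should match the original.
restored = zlib.decompress(compressed)
print("crc32 matches:  ", zlib.crc32(restored) == crc32)
print("adler32 trailer:", trailer == adler32)
```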

For large amounts of data, or where the data originates in a set of separate buffers, chained mbufs can be used. This is a typical scenario for storage workloads.
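The idea of compressing data that lives in several separate buffers can be sketched as follows. This is only an analogy for chained mbufs: it assumes Python's standard zlib module, and the `segments` list is a hypothetical stand-in for a buffer chain. Each segment is consumed in turn, without first copying everything into one contiguous buffer.

```python
import zlib

# Hypothetical segments standing in for a chain of separate buffers.
segments = [
    b"header|",
    b"payload-part-1|" * 50,
    b"payload-part-2|" * 50,
    b"trailer",
]

comp = zlib.compressobj()
out = bytearray()
for seg in segments:
    out += comp.compress(seg)  # consume one segment at a time
out += comp.flush()            # finish the single output stream

# The result decompresses to the segments laid end to end.
restored = zlib.decompress(bytes(out))
print(len(restored), "bytes restored from", len(out), "compressed bytes")
```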

Poll Mode Drivers - or PMDs - sit below the API and implement the compression service using various compression engines. An application can use any of these engines through the same generic compressdev API.

There are currently four compression PMDs upstreamed in DPDK: two using software compression engines and two using hardware accelerators.

The Intel ISA-L poll mode driver uses the compression engine from Intel’s Intelligent Storage Acceleration Library.

This library can run on any platform but is optimized for best performance on Intel CPUs. This PMD can achieve best compression ratios when paired with Intel CPUs with the AVX512 or AVX2 instruction sets.

There is also a QAT poll mode driver, which utilizes Intel’s QuickAssist family of hardware accelerators.

By offloading the compression workload to QAT’s optimized acceleration engines, which can process many compression operations in parallel, an application can save CPU cycles and use them for other on-core workloads.

Cavium has contributed an OCTEON TX PMD for use with its hardware accelerator family, and also a PMD that uses the standard zlib software library.

There is a suite of compressdev unit tests available in DPDK which runs against the API and can be used to validate any of the drivers.

Thanks for watching. Follow the links to learn more about DPDK and the DPDK compression API. Don’t forget to like and subscribe!