Optimize Data Manipulation & Data Processing Functions Using Intel® QPL and Intel® DML

Unlock the advanced capabilities of Intel® IAA and Intel® DSA to achieve higher performance and obtain faster, secure data movement


The Power of Data Manipulation: An Introduction to Intel® QPL and Intel® DML

4th Gen Intel® Xeon® processors have several new hardware accelerator capabilities, including Intel® In-Memory Analytics Accelerator (Intel® IAA) and Intel® Data Streaming Accelerator (Intel® DSA).1

These accelerators, in combination with accompanying software libraries Intel® Query Processing Library (Intel QPL) and Intel® Data Mover Library (Intel DML), respectively, offer great performance benefits to developers who seek to optimize data manipulation/movement, data processing, and analytics, in addition to lowering TCO in datacenters by 50% to 60%.2

Here’s how:

  • Intel IAA provides very high-throughput compression and decompression, combined with analytic primitive functions commonly used for data filtering during analytic query processing. Intel QPL helps developers activate this accelerator’s advanced data analytical capabilities.
  • Intel DSA is a high-performance data copy and transformation accelerator that optimizes the streaming data movement and transformation operations common in high-performance storage, networking, persistent memory, and various data processing applications. Intel DML helps developers activate this accelerator’s advanced data movement capabilities.

Let’s look at these two open-source libraries in detail to unpack how they help developers move data faster and achieve higher performance.

Intel QPL

What Is Intel QPL?

Intel QPL gives applications access to Intel IAA, integrated accelerator IP that speeds up analytics primitives (scan, filter, etc.), CRC calculations, compression, decompression, and more. The library:

  • Supports up to four Intel IAA devices per socket.
  • Provides advanced offload optimizations, such as support for accelerator instructions, shared virtual memory (SVM), and scalable I/O virtualization (SIOV), for seamless sharing across application user processes, containers, and virtual machines.
  • Provides high-performance implementations of data-processing functions for existing hardware accelerators, plus software paths for cases where the hardware accelerator is not available.
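
The hardware-or-software behavior in the last bullet can be pictured as a simple dispatch pattern. The following is only a minimal sketch in Python, not the Intel QPL API: `hardware_compress`, `software_compress`, and `compress_auto` are hypothetical names, the "hardware" path is a placeholder that always reports no device, and the software path uses raw DEFLATE from the Python standard library.

```python
import zlib
from typing import Tuple


class AcceleratorUnavailable(RuntimeError):
    """Raised by the (hypothetical) hardware path when no device is present."""


def hardware_compress(data: bytes) -> bytes:
    # Placeholder for an accelerator-backed call. This sketch has no device,
    # so the hardware path always reports that it is unavailable.
    raise AcceleratorUnavailable("no accelerator configured")


def software_compress(data: bytes) -> bytes:
    # Software path: raw DEFLATE stream (negative wbits means no zlib header).
    compressor = zlib.compressobj(level=6, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def compress_auto(data: bytes) -> Tuple[bytes, str]:
    """Try the hardware path first, then fall back to the software path."""
    try:
        return hardware_compress(data), "hardware"
    except AcceleratorUnavailable:
        return software_compress(data), "software"


def decompress(data: bytes) -> bytes:
    # Inverse of software_compress: inflate a raw DEFLATE stream.
    return zlib.decompress(data, wbits=-15)


payload = b"intel-qpl " * 1000
compressed, path_used = compress_auto(payload)
print(path_used, len(payload), "->", len(compressed), "bytes")
```

The caller sees the same result whichever path runs, which is the property that lets one code base serve machines with and without the accelerator.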

Why Use It?

Use Intel QPL to improve performance of database, enterprise data, communications, and scientific/technical applications.


Code written with the library automatically takes advantage of available CPU capabilities, which can yield substantial development and maintenance savings.

Intel QPL helps increase query throughput for in-memory database and analytics workloads and decreases memory footprint for analytics workloads, including:

  • Commercial in-memory databases: AWS (MemoryDB*, ElastiCache*), GCP (MemoryStore*), Microsoft (AzureSQL*), and more
  • Open source in-memory databases and data stores: RocksDB*, Redis*, Cassandra*, MySQL*, PostgreSQL*, MongoDB*, Memcached*, and more
  • Columnar formats for big data analytics: Apache Parquet* and Apache ORC*
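
For readers new to analytics primitives, here is what scan and filter mean on a single column of values. This is a plain-Python illustration of the idea only; `scan_range` and `select` are hypothetical helper names, not Intel QPL functions, and an accelerator would run the equivalent work on packed data rather than Python lists.

```python
from typing import List, Sequence


def scan_range(column: Sequence[int], low: int, high: int) -> List[bool]:
    """Scan: produce one mask entry per element, True where low <= value <= high."""
    return [low <= value <= high for value in column]


def select(column: Sequence[int], mask: Sequence[bool]) -> List[int]:
    """Filter: keep only the elements whose mask entry is set."""
    return [value for value, keep in zip(column, mask) if keep]


# Hypothetical "age" column from a columnar table.
ages = [17, 42, 35, 8, 64, 29]

# Equivalent of: SELECT age FROM t WHERE age BETWEEN 18 AND 40
mask = scan_range(ages, 18, 40)
matching = select(ages, mask)
print(mask)      # [False, False, True, False, False, True]
print(matching)  # [35, 29]
```

Splitting the query into a mask-producing scan and a mask-consuming select is what makes the work easy to offload: each step is a uniform pass over the column.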

Use Cases

Current challenges with in-memory databases include parallelism, storage, communication (the latency gap between network and main memory), and concurrency. Developers also need to weigh other factors such as cost, data security, scalability, and compatibility.3

Intel QPL’s high compression and decompression capabilities are designed to help run database and analytics workloads faster. In addition, Intel QPL enables the Intel IAA accelerator to reduce the cost of computing, save memory bandwidth, and achieve higher query throughput for applications in big data and in-memory analytic databases, memory page compression, data integrity operations, and more.
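
To see why compression saves memory footprint and bandwidth for analytics data, consider a low-cardinality column. The sketch below uses Python's standard `zlib` module (software DEFLATE) as a stand-in; the column contents are made up for illustration, and the ratio it prints says nothing about Intel IAA throughput or about real workloads.

```python
import struct
import zlib

# Hypothetical column: 100,000 32-bit status codes drawn from a tiny value set.
values = [200, 200, 404, 200, 500] * 20_000

# Store the column as a contiguous little-endian array, as a columnar format might.
raw = struct.pack(f"<{len(values)}I", *values)

# Compress the column in software; an accelerator would do this step in hardware.
compressed = zlib.compress(raw, 6)

# Decompress and decode to confirm the round trip is lossless.
restored = list(struct.unpack(f"<{len(values)}I", zlib.decompress(compressed)))

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
print(f"ratio: {len(raw) / len(compressed):.1f}x")
```

Repetitive columns like this one compress very well, so fewer bytes have to sit in memory or cross the memory bus; the cost is the compress/decompress work, which is the part an accelerator is meant to absorb.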

With Intel IAA and Intel QPL, developers can obtain up to 3x higher RocksDB performance4 and up to 2x higher performance per watt.5

Solution to Challenges in Database Analytics

As shown in the image below, you can optimize query throughput and decrease memory footprint with Intel IAA and Intel QPL through capabilities such as compression, decompression, and scan/filter.

Intel DML

What Is Intel DML?

Intel DML gives applications access to Intel DSA, integrated accelerator IP that helps speed up common data-movement operations. The library:

  • Supports up to four Intel DSA instances per socket for up to 120GB/s (240GB/s bidirectional) bandwidth. This results in increased operations per second and reduced latencies, making Intel DSA-enabled workloads faster and more responsive.
  • Provides advanced offload optimizations, such as support for Intel® Accelerator Interfacing Architecture (Intel® AIA) instructions, shared virtual memory (SVM), and scalable I/O virtualization (SIOV), for seamless sharing across application user processes, containers, and virtual machines.
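
"Common data-movement operations" covers more than plain copies. One example of a transformation in this family is recording only the blocks that differ between two buffers and later replaying them onto a copy. The sketch below illustrates that idea in plain Python under stated assumptions: the 8-byte block size and the `(offset, bytes)` record layout are choices made for this example, and `create_delta` and `apply_delta` are hypothetical helpers, not Intel DML calls.

```python
from typing import List, Tuple

BLOCK = 8  # comparison granularity in bytes (an assumption of this sketch)

Delta = List[Tuple[int, bytes]]


def create_delta(old: bytes, new: bytes) -> Delta:
    """Record (offset, new_block) for every block that differs between old and new."""
    if len(old) != len(new) or len(old) % BLOCK:
        raise ValueError("buffers must have equal length, a multiple of BLOCK")
    return [
        (offset, new[offset:offset + BLOCK])
        for offset in range(0, len(old), BLOCK)
        if old[offset:offset + BLOCK] != new[offset:offset + BLOCK]
    ]


def apply_delta(target: bytearray, delta: Delta) -> None:
    """Overwrite the recorded blocks in place, turning a copy of old into new."""
    for offset, block in delta:
        target[offset:offset + BLOCK] = block


old = bytes(64)             # 64 zero bytes
modified = bytearray(old)
modified[10] = 0xFF         # falls in the block starting at offset 8
modified[40] = 0x01         # falls in the block starting at offset 40
new = bytes(modified)

delta = create_delta(old, new)
replica = bytearray(old)
apply_delta(replica, delta)
print([offset for offset, _ in delta])  # [8, 40]
```

Only two 8-byte blocks travel instead of the full 64-byte buffer, which is why delta-style operations suit replication of mostly unchanged data.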

Why Use It?

Use Intel DML to improve the performance of applications that rely on data movement by offloading data-movement operations from CPU cores to Intel DSA, freeing CPU cycles for higher-priority work. Target workloads and usages include:

  • Networking: vSwitch network virtualization
  • Storage: Fast replication across non-transparent bridge (NTB)
  • Application usage examples: Messaging, ERP, In-Memory Databases, Analytics
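
The offload idea above follows a submit, keep working, then wait pattern. The sketch below shows only that control flow: a single worker thread stands in for the accelerator, `copy_buffer` and `higher_priority_work` are hypothetical functions, and because of Python's global interpreter lock it does not demonstrate real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_buffer(source: bytes) -> bytearray:
    """The data-movement task that would be handed to the accelerator."""
    destination = bytearray(len(source))
    destination[:] = source
    return destination


def higher_priority_work(n: int) -> int:
    """Stand-in for the work the CPU core does while the copy is in flight."""
    return sum(i * i for i in range(n))


source = bytes(range(256)) * 4096  # 1 MiB buffer

with ThreadPoolExecutor(max_workers=1) as offload_engine:
    # 1. Submit the copy and get a handle back immediately.
    pending_copy = offload_engine.submit(copy_buffer, source)
    # 2. The submitting core continues with other work.
    result = higher_priority_work(10_000)
    # 3. Wait for completion only when the copied data is needed.
    destination = pending_copy.result()

print(len(destination), result)
```

The key design point is that step 2 does not depend on the copy, so the wait in step 3 can be deferred until the data is actually consumed.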

Use Cases

Current developer challenges in data movement and streaming include data quality, latency, governance, and volume: the amount of data keeps growing, and managing large data sets complicates data transformation. The result is organizations that are data-rich but information-poor. On top of that, developers need to consider governance and compliance when gathering real-time insights.

Intel DML helps reduce latency and increase memory-transfer performance by optimizing streaming data-movement and transformation operations commonly used in storage, networking, and various data-processing applications.

Using Intel DML with Intel DSA, developers can achieve up to 60% higher IOPS with NVMe-over-TCP and a 37% latency reduction for large-packet sequential reads.6 The solution also reduces overhead by offloading the most common data-movement tasks. For these tasks, developers can use the library’s system-solution capabilities to protect the communication path between a host and a storage device, where data is more prone to threats, helping ensure end-to-end integrity and security.
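
A common building block for end-to-end integrity is a checksum computed at the source and verified at the destination. The following minimal sketch uses CRC-32 from Python's standard `zlib` module; `send` and `receive` are hypothetical helpers for illustration, not part of Intel DML, and a CRC on its own detects accidental corruption rather than deliberate tampering.

```python
import zlib
from typing import Tuple


def send(payload: bytes) -> Tuple[bytes, int]:
    """Source side: attach a CRC-32 computed over the payload."""
    return payload, zlib.crc32(payload)


def receive(payload: bytes, expected_crc: int) -> bytes:
    """Destination side: recompute the CRC and reject mismatching data."""
    if zlib.crc32(payload) != expected_crc:
        raise ValueError("CRC mismatch: payload was altered in transit")
    return payload


block, checksum = send(b"storage block 0001")

# An intact transfer passes verification.
intact = receive(block, checksum)

# Flip a single bit to simulate corruption on the path to the storage device.
corrupted = bytearray(block)
corrupted[0] ^= 0x01
try:
    receive(bytes(corrupted), checksum)
    corruption_detected = False
except ValueError:
    corruption_detected = True

print(corruption_detected)  # True
```

Computing the check on every block costs CPU cycles when done in software, which is why integrity functions are natural candidates for offload alongside the copy itself.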

Solution to Challenges in Data Movement

As shown in the image below, Intel DSA and Intel DML deliver faster data movement with reduced latency while preserving the data integrity of workloads. Achieve higher IOPS for secure data movement using capabilities like data protection and integrity functions.

Get the Code Samples

Check out these code samples and engineering recipes to get started with these optimized libraries and take advantage of Intel IAA and Intel DSA on Intel Xeon processors:

Intel QPL Code Samples

Intel DML Code Samples