Using PMDK, Aerospike Optimized Their Codebase for Intel® Optane™ Persistent Memory

ID 658992
Updated 5/21/2020
Version Latest
Public

Nearly a decade ago, Intel and Aerospike began to work together to imagine a storage engine capable of storing terabytes of data with sub-millisecond access speeds. Milestones have been achieved along the way, with Aerospike adopting the latest Intel® technologies. The latest adoption of Intel® Optane™ persistent memory has enabled uncompromised scale, less downtime, and better cost models than ever before.

Intel Optane Persistent Memory

Intel Optane persistent memory is a new tier of memory that can act as memory and storage at the same time. In the latency pyramid, persistent memory sits between DRAM and block storage: it provides latencies similar to memory, measured in nanoseconds, while also providing persistence. Block storage provides persistence too, but with latencies that start in the microseconds and go up from there, depending on the technology.

Intel persistent memory addresses the high cost and limited capacity that constrain scaling with traditional DRAM. The modules are currently available in three capacities: 128 gigabytes (GB), 256 GB, and 512 GB. Unlike DRAM, data stored on these devices is persistent, so it is retained even after an unplanned shutdown.

Intel Optane persistent memory’s two modes of operation offer distinct benefits.

Memory Mode allows the system to use persistent memory capacity as the main memory. In this case, DRAM is used as a cache in front of the persistent memory, and the memory controller controls data placement between DRAM and persistent memory. Memory Mode is the ‘out of the box’ solution that doesn’t require code changes. In effect, the total amount of system memory can be expanded significantly, while the DRAM cache keeps performance close to that of DRAM for frequently accessed data.

In App Direct Mode, persistent memory is exposed as a special block device to the OS, and applications can use this device for data persistence. Although code changes are necessary to see the highest impact, in return, applications have full control of data placement, which can help improve performance.

Aerospike

Aerospike is a NoSQL database that has historically been constrained by the amount of DRAM available to hold the database index (the keys). Applications that run out of DRAM capacity are forced to either purchase larger capacity DIMMs (if available) or scale out to more nodes. Additionally, a system reboot erases the DRAM index, which must then be rebuilt by scanning all records in the database. This process is costly and takes many hours.

In Aerospike Enterprise Edition 4.5, developers solved this problem by using memory-mapped files to allocate index memory from persistent memory, rather than from a pool of shared DRAM. Because of Intel persistent memory’s large capacity, customers can store more records per node. The index is also persistent, so a full database restart no longer triggers an index rebuild.
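
The idea of an index that lives in a memory-mapped file and is found intact on the next start can be sketched with plain POSIX calls. This is a minimal illustration, not Aerospike's implementation: the `index_region` layout, the `INDEX_MAGIC` tag, and the `index_open`, `index_insert`, and `index_close` helpers are all hypothetical names introduced here. The sketch uses `msync()` so that it also works on an ordinary file; on persistent memory the file would typically sit on a DAX-mounted file system.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define INDEX_MAGIC 0x58444E49u /* hypothetical tag marking an initialized region */
#define INDEX_SLOTS 1024

/* Hypothetical layout for illustration; not Aerospike's index format. */
typedef struct {
    uint32_t magic;             /* equals INDEX_MAGIC once initialized */
    uint32_t count;             /* number of keys stored so far */
    uint64_t keys[INDEX_SLOTS];
} index_region;

/* Map the index file, creating and initializing it on first use.
 * Sets *recovered to 1 if an existing index was found, 0 if a new one
 * was created. Returns NULL on failure. */
index_region *index_open(const char *path, int *recovered)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(index_region)) != 0) {
        close(fd);
        return NULL;
    }
    void *addr = mmap(NULL, sizeof(index_region), PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return NULL;

    index_region *idx = addr;
    *recovered = (idx->magic == INDEX_MAGIC);
    if (!*recovered) {
        /* First start: initialize the region and make it durable. */
        memset(idx, 0, sizeof(*idx));
        idx->magic = INDEX_MAGIC;
        if (msync(idx, sizeof(*idx), MS_SYNC) != 0) {
            munmap(idx, sizeof(*idx));
            return NULL;
        }
    }
    return idx;
}

/* Append a key and flush the region. Returns 0 on success, -1 otherwise. */
int index_insert(index_region *idx, uint64_t key)
{
    if (idx->count >= INDEX_SLOTS)
        return -1;
    idx->keys[idx->count] = key;
    idx->count++;
    return msync(idx, sizeof(*idx), MS_SYNC);
}

/* Unmap the region; the contents stay in the backing file. */
void index_close(index_region *idx)
{
    munmap(idx, sizeof(*idx));
}
```

A process that calls `index_open()` a second time on the same path gets `*recovered == 1` and can read the earlier keys directly, with no scan of the data records. A real index would also need a crash-consistent update order (for example, persisting the key before the count), which this sketch leaves out.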

By using the Persistent Memory Development Kit (PMDK), developers have full control of memory placement. The PMDK is a collection of open-source vendor-neutral libraries for managing persistent memory. These libraries are based on the Storage Networking Industry Association (SNIA) persistent memory programming model, which has memory-mapped files at its core.

The core PMDK library is libpmemobj. This library solves many of the algorithmic and data structure problems commonly encountered when programming for persistent memory. Aerospike developers eventually decided against this library, though, since Aerospike’s core API already had capabilities to manage the index structure once memory was allocated.

Instead, developers turned to libpmem, a low-level C library that provides a necessary abstraction over the primitives exposed by the operating system. It supports the implementation of low-level persistent memory block management including recovery after restart.

Using libpmem, developers modified the existing software to allocate blocks from persistent memory and memory-map them. This let Aerospike use persistent memory synchronization primitives, such as user-space flushing, in place of msync(), reducing kernel overhead.
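
To make the difference between user-space flushing and msync() concrete, here is a rough sketch of what such a flush routine can look like. It is not libpmem's source and not Aerospike's code: `flush_range` and its `is_pmem` parameter are hypothetical, and the sketch assumes an x86-64 CPU for the cache-line path. In libpmem itself, `pmem_persist()` performs this job and picks the best available flush instruction at run time, while `pmem_msync()` is the fallback for mappings that are not real persistent memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_clflush, _mm_sfence */
#define HAVE_CLFLUSH 1
#endif

#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Make stores to [addr, addr + len) durable.
 * is_pmem != 0: flush cache lines from user space, no system call.
 * is_pmem == 0: fall back to msync(), which enters the kernel.
 * Returns 0 on success, -1 on failure. */
int flush_range(const void *addr, size_t len, int is_pmem)
{
    (void)is_pmem; /* unused when the cache-line path is compiled out */
#ifdef HAVE_CLFLUSH
    if (is_pmem) {
        /* Round down to a cache-line boundary, then flush each line. */
        uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
        uintptr_t end = (uintptr_t)addr + len;
        for (; p < end; p += CACHE_LINE)
            _mm_clflush((const void *)p);
        _mm_sfence(); /* order the flushes before any later stores */
        return 0;
    }
#endif
    /* msync() requires a page-aligned start address. */
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)page - 1);
    size_t span = (size_t)(((uintptr_t)addr + len) - start);
    return msync((void *)start, span, MS_SYNC);
}
```

The user-space path works at cache-line granularity and never leaves the process, whereas the msync() path works on whole pages and costs a system call per flush. The cache-line path only guarantees durability when the range is a direct (DAX) mapping of persistent memory; for a page-cache-backed file the msync() path is the one that is safe.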

In Aerospike Enterprise Edition 4.8, developers went one step further and allowed both the database index and the data to be stored in persistent memory. Compared to data stored on SSD, this delivers much higher performance. Compared to data stored in DRAM, it supports larger data sets at lower cost, without a significant performance penalty.

Summary

A quick refactor of the code with libpmem enabled Aerospike to improve its customers’ experience. Storing the index in persistent memory reduced the classic index-rebuild problem that in-memory databases face. Additionally, Intel persistent memory’s large capacity makes it easier for customers to scale their applications and retain DRAM-like speeds.

See performance results and find out how Aerospike achieved greater uptime in this solution brief: Accelerated Real-Time Computing at Petabyte Scale.