Code Sample: Implement Python* Bindings for pmemkv Key-Value Store

Published: 10/08/2019  

Last Updated: 10/08/2019

By Igor Chorazewicz, Szymon Romik

File(s)

GitHub*

License

BSD 3-Clause

Optimized for...  

OS

Linux* kernel version 4.3 or higher

Programming Language

Python* 3.6.x and later

Hardware

2nd generation Intel® Xeon® Scalable processor and Intel® Optane™ DC persistent memory

Emulated: See How to Emulate Persistent Memory Using Dynamic Random-access Memory (DRAM)

Software (programming language, tool, IDE, framework)

C++ Compiler, Persistent Memory Development Kit (PMDK) and persistent memory key-value (pmemkv) store libraries

Prerequisites

Familiarity with Python* 3 and C++

Introduction

This article describes how Intel implemented a new language binding for the pmemkv datastore so that it can be used from Python* 3 applications. The Python bindings are built on the pmemkv APIs and let Python applications use pmemkv with both persistent and volatile memory engines. Pmemkv is part of the Persistent Memory Development Kit (PMDK).

You can use the design and development process outlined here as a guide for creating additional language bindings for pmemkv. Python developers can install the pmemkv shared library and the pmemkv Python bindings, and then use the provided Database class to build applications.

A language binding is an application programming interface (API) that provides the glue code needed to use a library from a particular programming language. Bindings are wrapper libraries that bridge two programming languages, so that a library written in one language can be used from another. Creating bindings makes it possible to reuse existing software instead of reimplementing a library in several languages, and it avoids the difficulty of implementing some algorithms efficiently in certain high-level languages.

Python bindings are typically used when an existing C or C++ library needs to be called from Python. In this case, they allow the pmemkv C++ library to be used easily from Python applications.

This design implements the Python bindings and wrapper layer on top of the existing pmemkv datastore, for both persistent and volatile engines, without requiring any changes to the core API.

This article assumes that you have a basic understanding of persistent memory concepts and are familiar with some elementary features of the PMDK. For more information, see this introduction to PMDK.

Solution Summary

The Python bindings for pmemkv send requests to the pmemkv datastore to fetch data from, or store data in, persistent memory. Figure 1 shows how the Python bindings handle communication between the user application and persistent memory.

Figure 1. High-level architecture diagram for Python* bindings for pmemkv.

Intel modeled the Python bindings on the existing Ruby* bindings. The Python bindings consist of two parts: the Python Database class and the pmemkv native extensions interface. The Database class has the same API methods as its Ruby counterpart, with the same method names. The pmemkv native extensions interface is a library that acts as a bridge between the Database class and the pmemkv KVEngine.

The development environment is an Ubuntu* 18.04 virtual machine (VM) with emulated persistent memory. The basic functionality of the Python bindings was implemented using the existing Ruby bindings for pmemkv as a reference.

Environment Details

For development and testing, a VM running on VMware* Workstation was used, with the hardware and software configuration listed in Table 1.

Table 1. Development and test environment details

RAM: 8 GB
HDD: 100 GB
Processor cores: 4
Operating system: Ubuntu* 18.04
Kernel: Version 4.18
PMDK: Version 1.6
libpmemobj-cpp: Version 1.6
Memkind: Version 1.9.0
RapidJSON: Version 1.1.0
Intel® Threading Building Blocks: 2019 Update 5
pmemkv: Latest version
Python: 3.7.3

Python Bindings Functionality

Intel used the Ruby bindings as the starting point for the Python bindings design. The Python bindings use the extern C interfaces provided by the pmemkv shared library. Unlike the Ruby bindings, the Python bindings are not implemented with a foreign function interface (FFI), because FFI calls are known to add overhead to each call. Instead, the Python bindings are implemented as a native extension that makes the extern C calls directly, for better performance.

The Python bindings for pmemkv are implemented in the pmemkv.py source file, and the pmemkv native extension interface is implemented in the kvengine.cc source file. setup.py is the build script that compiles kvengine.cc and links it with the pmemkv shared library (libpmemkv.so). The result is the pmemkv native extension interface shared library, which contains the persistent memory key-value native interface (pmemkv_NI) module. The Database class imports the pmemkv_NI module and uses it to communicate with the pmemkv datastore, fetching key-value data from and storing it in persistent memory.

import pmemkv_NI

class Database:

    def __init__(self, engine, config):
        # Start the named engine with its JSON configuration.
        self.stopped = False
        pmemkv_NI.start(engine, config)

    def stop(self):
        # Stop the engine only once.
        if not self.stopped:
            self.stopped = True
            pmemkv_NI.stop()

    def get_keys(self, func):
        return pmemkv_NI.get_keys(func)

    def get_keys_above(self, key, func):
        return pmemkv_NI.get_keys_above(key, func)

    def get_keys_below(self, key, func):
        return pmemkv_NI.get_keys_below(key, func)

    def get_keys_between(self, key1, key2, func):
        return pmemkv_NI.get_keys_between(key1, key2, func)

    # The *_strings variants wrap the callback so each key is encoded first.
    def get_keys_strings(self, func, encoding = 'utf-8'):
        return pmemkv_NI.get_keys(lambda k: func(k.encode(encoding)))

    def get_keys_strings_above(self, key, func, encoding = 'utf-8'):
        return pmemkv_NI.get_keys_above(key, lambda k: func(k.encode(encoding)))

    def get_keys_strings_below(self, key, func, encoding = 'utf-8'):
        return pmemkv_NI.get_keys_below(key, lambda k: func(k.encode(encoding)))

    def get_keys_strings_between(self, key1, key2, func, encoding = 'utf-8'):
        return pmemkv_NI.get_keys_between(key1, key2, lambda k: func(k.encode(encoding)))

    def count_all(self):
        return pmemkv_NI.count_all()

    def count_above(self, key):
        return pmemkv_NI.count_above(key)

    def count_below(self, key):
        return pmemkv_NI.count_below(key)

    def count_between(self, key1, key2):
        return pmemkv_NI.count_between(key1, key2)

    def get_all(self, func):
        return pmemkv_NI.get_all(func)

    def get_above(self, key, func):
        return pmemkv_NI.get_above(key, func)

    def get_below(self, key, func):
        return pmemkv_NI.get_below(key, func)

    def get_between(self, key1, key2, func):
        return pmemkv_NI.get_between(key1, key2, func)

    # The string variants encode both the key and the value before the callback.
    def get_all_string(self, func, encoding = 'utf-8'):
        return pmemkv_NI.get_all(lambda k, v: func(k.encode(encoding), v.encode(encoding)))

    def get_string_above(self, key, func, encoding = 'utf-8'):
        return pmemkv_NI.get_above(key, lambda k, v: func(k.encode(encoding), v.encode(encoding)))

    def get_string_below(self, key, func, encoding = 'utf-8'):
        return pmemkv_NI.get_below(key, lambda k, v: func(k.encode(encoding), v.encode(encoding)))

    def get_string_between(self, key1, key2, func, encoding = 'utf-8'):
        return pmemkv_NI.get_between(key1, key2, lambda k, v: func(k.encode(encoding), v.encode(encoding)))

    def exists(self, key):
        return pmemkv_NI.exists(key)

    def get(self, key):
        return pmemkv_NI.get(key)

    def get_string(self, key, encoding = 'utf-8'):
        return pmemkv_NI.get(key).encode(encoding)

    def put(self, key, value):
        return pmemkv_NI.put(key, value)

    def remove(self, key):
        return pmemkv_NI.remove(key)

Code Sample 1. Import the pmemkv_NI module into the database class to communicate with the pmemkv datastore to fetch and store key-value data into persistent memory.

Configuration

The Python bindings take the name of the engine to start, and its configuration, from the user application. Write the configuration as a JavaScript Object Notation (JSON) document that specifies the path and size for the given engine.

The prototype below shows the constructor (__init__) of the Database class, which starts the engine when you create a Database instance. The parameters are engine, the name of the engine (for example, vsmap), and config, the JSON configuration document.

def __init__(self, engine, config):

The following user application code calls the Database constructor to start the given engine, passing the JSON document as its configuration.

db = Database('vsmap', '{"path": "/dev/shm/", "size": 1073741824}')
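Building the configuration string with Python's standard json module, rather than writing it by hand, helps keep it valid JSON. A minimal sketch, assuming only the path and size keys described above; make_config is a hypothetical helper, not part of the bindings:

```python
import json


def make_config(path: str, size: int) -> str:
    """Hypothetical helper: serialize an engine configuration to a JSON string."""
    return json.dumps({"path": path, "size": size})


# Same values as the vsmap example: /dev/shm/ and 1 GiB (1024**3 bytes).
config = make_config("/dev/shm/", 1024 ** 3)
print(config)  # {"path": "/dev/shm/", "size": 1073741824}

# With the pmemkv_NI extension installed, the engine would be started with:
# db = Database('vsmap', config)
```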

The Database stop method stops the engine through the pmemkv native extensions interface. A user application can call it as shown below; the stopped flag ensures that the native stop call is made only once.

def stop(self):
    if not self.stopped:
        self.stopped = True
        pmemkv_NI.stop()
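Because of the stopped flag, stop is safe to call more than once. One way to guarantee that the engine is stopped, even when an exception occurs, is to add context-manager support to the class. The sketch below is an assumption and not part of the published bindings: _FakeNI is a hypothetical stand-in for pmemkv_NI that only counts calls, so the code runs without persistent memory, and start is the assumed name of its start function.

```python
class _FakeNI:
    """Hypothetical stand-in for pmemkv_NI that only records calls."""

    def __init__(self):
        self.stop_calls = 0

    def start(self, engine, config):
        pass  # a real extension would start the named engine here

    def stop(self):
        self.stop_calls += 1


pmemkv_NI = _FakeNI()


class Database:
    def __init__(self, engine, config):
        self.stopped = False
        pmemkv_NI.start(engine, config)

    def stop(self):
        # The flag makes repeated calls harmless.
        if not self.stopped:
            self.stopped = True
            pmemkv_NI.stop()

    # Context-manager support: an illustrative addition, not in the article.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stop()
        return False  # do not suppress exceptions


with Database('vsmap', '{"path": "/dev/shm/", "size": 1073741824}') as db:
    pass  # use db here; stop() runs when the block exits

db.stop()  # already stopped, so no second native call
```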

APIs

Python 3 applications use the APIs provided by the Python bindings for pmemkv. Internally, the bindings use the pmemkv KVEngine, through the pmemkv native extensions interface, to store and retrieve key-value data. The Python bindings provide the following APIs, which let developers reuse the pmemkv C++ KVEngine from Python applications.

Get

Read the value for a key from the pmemkv datastore. The method takes the key from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. If the key is present in persistent memory, it fetches the value and returns it to the user application; if no value is found, it returns a not-found result. An exception is raised if pmemkv returns a failed status. The method is implemented as follows:

def get(self, key):
    return pmemkv_NI.get(key)
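Callers usually need to tell a stored value apart from a missing key. The article does not specify the exact not-found value, so this sketch assumes the native layer reports a missing key as None; _FakeNI is a hypothetical dict-backed stand-in for pmemkv_NI.

```python
class _FakeNI:
    """Hypothetical dict-backed stand-in for pmemkv_NI."""

    def __init__(self):
        self._data = {"key1": "value1"}

    def get(self, key):
        # Assumption: a missing key is reported as None.
        return self._data.get(key)


pmemkv_NI = _FakeNI()


class Database:
    def get(self, key):
        return pmemkv_NI.get(key)


db = Database()
value = db.get("key1")
missing = db.get("no-such-key")
if missing is None:
    print("no-such-key: not found")
```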
    

Get String

Read the value for a key from the pmemkv datastore and return it encoded. The method takes the key and a character encoding (utf-8 by default) from the user application, and then calls the pmemkv extern C API through the pmemkv native extensions interface. If the key is present in persistent memory, it fetches the value, encodes it with the given encoding, and returns the result to the user application. If no value is found, it returns a not-found result. An exception is raised if pmemkv returns a failed status. The method is implemented as follows:

def get_string(self, key, encoding = 'utf-8'):
    return pmemkv_NI.get(key).encode(encoding)
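To show the encoding step in isolation, this sketch assumes the native layer returns values as str, so get_string returns bytes in the requested encoding. _FakeNI is a hypothetical stand-in for pmemkv_NI, and the sketch does not handle a missing key.

```python
class _FakeNI:
    """Hypothetical stand-in for pmemkv_NI that returns stored values as str."""

    def __init__(self):
        self._data = {"drink": "café"}

    def get(self, key):
        return self._data.get(key)


pmemkv_NI = _FakeNI()


class Database:
    def get_string(self, key, encoding='utf-8'):
        # Encode the fetched value; a missing key (None) would raise AttributeError here.
        return pmemkv_NI.get(key).encode(encoding)


db = Database()
utf8_value = db.get_string("drink")               # b'caf\xc3\xa9'
latin1_value = db.get_string("drink", "latin-1")  # b'caf\xe9'
```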

Put

Write a key-value pair to the pmemkv datastore. The method takes the key and value from the user application and writes them to the pmemkv datastore through the pmemkv native extensions interface. It raises an exception if the key-value pair cannot be stored in the pmemkv datastore. The method is implemented as follows:

def put(self, key, value):
    return pmemkv_NI.put(key, value)
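The article does not name the exception class raised on failure. In this sketch, _FakeNI is a hypothetical stand-in for pmemkv_NI with a deliberately tiny capacity, and RuntimeError is an assumed exception type used to simulate a failed put:

```python
class _FakeNI:
    """Hypothetical stand-in for pmemkv_NI with a tiny fixed capacity."""

    def __init__(self, capacity=2):
        self._data = {}
        self._capacity = capacity

    def put(self, key, value):
        # Simulate a failed pmemkv status when a new key does not fit.
        if key not in self._data and len(self._data) >= self._capacity:
            raise RuntimeError("put failed: out of space")
        self._data[key] = value  # an existing key is overwritten

    def get(self, key):
        return self._data.get(key)


pmemkv_NI = _FakeNI()


class Database:
    def put(self, key, value):
        return pmemkv_NI.put(key, value)

    def get(self, key):
        return pmemkv_NI.get(key)


db = Database()
db.put("key1", "value1")
db.put("key2", "value2")
db.put("key1", "value1-updated")  # overwrite, no new space needed
try:
    db.put("key3", "value3")
    stored = True
except RuntimeError as err:
    stored = False
    print(f"could not store key3: {err}")
```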

Remove

Remove a key from the pmemkv datastore. The method takes the key to delete from the user application and removes it from the pmemkv datastore through the pmemkv native extensions interface. An exception is raised if pmemkv returns a failed status; otherwise, the method returns the removal status to the user application. The method is implemented as follows:

def remove(self, key):  
    return pmemkv_NI.remove(key)
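The article does not define the removal status value. This sketch assumes True when the key existed and was removed and False otherwise; _FakeNI is a hypothetical stand-in for pmemkv_NI.

```python
class _FakeNI:
    """Hypothetical dict-backed stand-in for pmemkv_NI."""

    def __init__(self):
        self._data = {"key1": "value1"}

    def remove(self, key):
        # Assumed status: True if the key was present and removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


pmemkv_NI = _FakeNI()


class Database:
    def remove(self, key):
        return pmemkv_NI.remove(key)


db = Database()
first = db.remove("key1")   # key existed
second = db.remove("key1")  # already removed
```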

Exists

Check whether a key exists in the pmemkv datastore. The method takes the key from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It searches for the key in the pmemkv datastore and returns the key's availability status to the user application. An exception is raised if pmemkv returns a failed status. The method definition is below:

def exists(self, key):  
    return pmemkv_NI.exists(key)
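exists is handy for checking a key before acting on it. A sketch with a hypothetical stand-in (_FakeNI) for pmemkv_NI that assumes the availability status is a bool:

```python
class _FakeNI:
    """Hypothetical dict-backed stand-in for pmemkv_NI."""

    def __init__(self):
        self._data = {"key1": "value1"}

    def exists(self, key):
        return key in self._data

    def put(self, key, value):
        self._data[key] = value


pmemkv_NI = _FakeNI()


class Database:
    def exists(self, key):
        return pmemkv_NI.exists(key)

    def put(self, key, value):
        return pmemkv_NI.put(key, value)


db = Database()
# Store a default only when the key is not already present.
if not db.exists("key2"):
    db.put("key2", "default")
has_key1 = db.exists("key1")
has_key2 = db.exists("key2")
```

A check followed by a put is two separate calls, so it is not atomic if other writers can update the same key concurrently.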

Get Keys

Fetch all keys from the pmemkv datastore. The method takes a callback function from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches all key-value pairs from the pmemkv datastore, extracts the keys, and passes them back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below:

def get_keys(self, func):   
    return pmemkv_NI.get_keys(func)
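Because results are delivered through a callback, a bound method such as list.append is enough to collect them. In this sketch, _FakeNI is a hypothetical stand-in for pmemkv_NI, and the sorted iteration order is an assumption that matches a sorted engine such as vsmap.

```python
class _FakeNI:
    """Hypothetical stand-in for pmemkv_NI."""

    def __init__(self):
        self._data = {"b": "2", "a": "1", "c": "3"}

    def get_keys(self, func):
        # Call the user's callback once per key, in sorted key order.
        for key in sorted(self._data):
            func(key)


pmemkv_NI = _FakeNI()


class Database:
    def get_keys(self, func):
        return pmemkv_NI.get_keys(func)


db = Database()
keys = []
db.get_keys(keys.append)  # list.append works as a one-argument callback
print(keys)  # ['a', 'b', 'c']
```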

Get Keys Above

Fetch the keys that are greater than a given key from the pmemkv datastore. The method takes the key and a callback function from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are greater than the given key, extracts the keys, and passes them back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_keys_above(self, key, func):
    return pmemkv_NI.get_keys_above(key, func)
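Assuming a sorted engine and pmemkv's documented semantics, the above queries cover keys strictly greater than the given key; the given key itself is excluded. The pure-Python reference model below (get_keys_above_model is a hypothetical name, not part of the bindings) reproduces that behavior on a sorted list with bisect, which can be handy for checking expected results in tests:

```python
from bisect import bisect_right


def get_keys_above_model(sorted_keys, key, func):
    """Hypothetical model: call func for every key strictly greater than key."""
    for k in sorted_keys[bisect_right(sorted_keys, key):]:
        func(k)


keys = ["apple", "banana", "cherry", "date"]

above_banana = []
get_keys_above_model(keys, "banana", above_banana.append)  # stored key is excluded

above_b = []
get_keys_above_model(keys, "b", above_b.append)  # key need not be stored
```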

Get Keys Below

Fetch the keys that are less than a given key from the pmemkv datastore. The method takes the key and a callback function from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are less than the given key, extracts the keys, and passes them back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_keys_below(self, key, func):
    return pmemkv_NI.get_keys_below(key, func)
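Assuming a sorted engine and pmemkv's documented semantics, the below queries cover keys strictly less than the given key. This hypothetical pure-Python model (get_keys_below_model, not part of the bindings) shows the boundary behavior:

```python
from bisect import bisect_left


def get_keys_below_model(sorted_keys, key, func):
    """Hypothetical model: call func for every key strictly less than key."""
    for k in sorted_keys[:bisect_left(sorted_keys, key)]:
        func(k)


keys = ["apple", "banana", "cherry", "date"]
below = []
get_keys_below_model(keys, "cherry", below.append)  # "cherry" itself is excluded
```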

Get Keys Between

Fetch the keys that fall between two given keys from the pmemkv datastore. The method takes key1, key2, and a callback function from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are greater than key1 and less than key2, extracts the keys, and passes them back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_keys_between(self, key1, key2, func):
    return pmemkv_NI.get_keys_between(key1, key2, func)
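Assuming a sorted engine and pmemkv's documented semantics, the between queries are exclusive at both ends: a key is included only if key1 < key < key2. This hypothetical pure-Python model (get_keys_between_model, not part of the bindings) shows both the normal case and an inverted range, which yields no keys:

```python
from bisect import bisect_left, bisect_right


def get_keys_between_model(sorted_keys, key1, key2, func):
    """Hypothetical model: call func for every key with key1 < key < key2."""
    start = bisect_right(sorted_keys, key1)  # first key > key1
    end = bisect_left(sorted_keys, key2)     # first key >= key2
    # If key1 >= key2, start >= end and the slice is empty.
    for k in sorted_keys[start:end]:
        func(k)


keys = ["apple", "banana", "cherry", "date"]

between = []
get_keys_between_model(keys, "apple", "date", between.append)

inverted = []
get_keys_between_model(keys, "date", "apple", inverted.append)
```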

Get Keys Strings

Fetch all keys from the pmemkv datastore and return them encoded. The method takes a callback function and a character encoding (utf-8 by default) from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches all key-value pairs from the pmemkv datastore, extracts the keys, and encodes them with the given encoding. The encoded keys are passed back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_keys_strings(self, func, encoding = 'utf-8'):
    return pmemkv_NI.get_keys(lambda k: func(k.encode(encoding)))
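The string variants reuse the plain native call and wrap the user's callback in a lambda that encodes each key first. This sketch assumes the native layer delivers keys as str; _FakeNI is a hypothetical stand-in for pmemkv_NI:

```python
class _FakeNI:
    """Hypothetical stand-in for pmemkv_NI that delivers keys as str."""

    def __init__(self):
        self._keys = ["café", "tea"]

    def get_keys(self, func):
        for k in self._keys:
            func(k)


pmemkv_NI = _FakeNI()


class Database:
    def get_keys_strings(self, func, encoding='utf-8'):
        # Wrap the callback so every key is encoded before it is delivered.
        return pmemkv_NI.get_keys(lambda k: func(k.encode(encoding)))


db = Database()
utf8_keys = []
db.get_keys_strings(utf8_keys.append)
latin1_keys = []
db.get_keys_strings(latin1_keys.append, encoding='latin-1')
```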

Get Keys Strings Above

Fetch the keys that are greater than a given key from the pmemkv datastore and return them encoded. The method takes the key, a callback function, and a character encoding (utf-8 by default) from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are greater than the given key, extracts the keys, and encodes them with the given encoding. The encoded keys are passed back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_keys_strings_above(self, key, func, encoding = 'utf-8'):
    return pmemkv_NI.get_keys_above(key, lambda k: func(k.encode(encoding)))

Get Keys Strings Below

Fetch the keys that are less than a given key from the pmemkv datastore and return them encoded. The method takes the key, a callback function, and a character encoding (utf-8 by default) from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are less than the given key, extracts the keys, and encodes them with the given encoding. The encoded keys are passed back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_keys_strings_below(self, key, func, encoding = 'utf-8'):
    return pmemkv_NI.get_keys_below(key, lambda k: func(k.encode(encoding)))

Get Keys Strings Between

Fetch the keys that fall between two given keys from the pmemkv datastore and return them encoded. The method takes key1, key2, a callback function, and a character encoding (utf-8 by default) from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are greater than key1 and less than key2, extracts the keys, and encodes them with the given encoding. The encoded keys are passed back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_keys_strings_between(self, key1, key2, func, encoding = 'utf-8'):
    return pmemkv_NI.get_keys_between(key1, key2, lambda k: func(k.encode(encoding)))

Get All

Fetch all key-value pairs from the pmemkv datastore. The method takes a callback function from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches all key-value pairs from the pmemkv datastore and passes them back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_all(self, func):
    return pmemkv_NI.get_all(func) 
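The get_all callback receives two arguments, the key and the value. A sketch with a hypothetical stand-in (_FakeNI) for pmemkv_NI that collects the pairs into a list, assuming sorted iteration as in a sorted engine:

```python
class _FakeNI:
    """Hypothetical stand-in for pmemkv_NI."""

    def __init__(self):
        self._data = {"key2": "value2", "key1": "value1"}

    def get_all(self, func):
        # Call the user's callback with (key, value), in sorted key order.
        for k in sorted(self._data):
            func(k, self._data[k])


pmemkv_NI = _FakeNI()


class Database:
    def get_all(self, func):
        return pmemkv_NI.get_all(func)


db = Database()
pairs = []
db.get_all(lambda k, v: pairs.append((k, v)))
```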

Get Above

Fetch the key-value pairs whose keys are greater than a given key from the pmemkv datastore. The method takes the key and a callback function from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are greater than the given key and passes them back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_above(self, key, func):
    return pmemkv_NI.get_above(key, func)

Get Below

Fetch the key-value pairs whose keys are less than a given key from the pmemkv datastore. The method takes the key and a callback function from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are less than the given key and passes them back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_below(self, key, func):
    return pmemkv_NI.get_below(key, func)

Get Between

Fetch the key-value pairs whose keys fall between two given keys from the pmemkv datastore. The method takes key1, key2, and a callback function from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are greater than key1 and less than key2 and passes them back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_between(self, key1, key2, func):
    return pmemkv_NI.get_between(key1, key2, func)

Get All Strings

Fetch all key-value pairs from the pmemkv datastore and return them encoded. The method takes a callback function and a character encoding (utf-8 by default) from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches all key-value pairs from the pmemkv datastore and encodes each key and value with the given encoding. The encoded key-value pairs are passed back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_all_strings(self, func, encoding = 'utf-8'):
    return pmemkv_NI.get_all(lambda k, v: func(k.encode(encoding), v.encode(encoding))) 
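The string variants of the key-value queries encode both the key and the value before calling the user's callback. A sketch assuming the native layer delivers str keys and values; _FakeNI and collect are hypothetical names:

```python
class _FakeNI:
    """Hypothetical stand-in for pmemkv_NI that delivers str keys and values."""

    def __init__(self):
        self._data = {"city": "Kraków", "drink": "café"}

    def get_all(self, func):
        for k in sorted(self._data):
            func(k, self._data[k])


pmemkv_NI = _FakeNI()


class Database:
    def get_all_strings(self, func, encoding='utf-8'):
        # Encode both the key and the value before handing them to the callback.
        return pmemkv_NI.get_all(lambda k, v: func(k.encode(encoding), v.encode(encoding)))


db = Database()
encoded = {}


def collect(key, value):
    encoded[key] = value


db.get_all_strings(collect)
```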

Get String Above

Fetch the key-value pairs whose keys are greater than a given key from the pmemkv datastore and return them encoded. The method takes the key, a callback function, and a character encoding (utf-8 by default) from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are greater than the given key and encodes each key and value with the given encoding. The encoded key-value pairs are passed back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_string_above(self, key, func, encoding = 'utf-8'):
    return pmemkv_NI.get_above(key, lambda k, v: func(k.encode(encoding), v.encode(encoding)))

Get String Below

Fetch the key-value pairs whose keys are less than a given key from the pmemkv datastore and return them encoded. The method takes the key, a callback function, and a character encoding (utf-8 by default) from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are less than the given key and encodes each key and value with the given encoding. The encoded key-value pairs are passed back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_string_below(self, key, func, encoding = 'utf-8'):
    return pmemkv_NI.get_below(key, lambda k, v: func(k.encode(encoding), v.encode(encoding)))

Get String Between

Fetch the key-value pairs whose keys fall between two given keys from the pmemkv datastore and return them encoded. The method takes key1, key2, a callback function, and a character encoding (utf-8 by default) from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface. It fetches the key-value pairs whose keys are greater than key1 and less than key2 and encodes each key and value with the given encoding. The encoded key-value pairs are passed back to the user application through the callback function. It raises an exception if pmemkv returns a failed status; otherwise, it returns success to the user application. The method definition is below.

def get_string_between(self, key1, key2, func, encoding = 'utf-8'):
    return pmemkv_NI.get_between(key1, key2, lambda k, v: func(k.encode(encoding), v.encode(encoding))) 

Count All

Get the number of keys in the pmemkv datastore. The method calls the pmemkv extern C API through the pmemkv native extensions interface to count the keys in the pmemkv datastore. It raises an exception if pmemkv returns a failed status; otherwise, it returns the key count to the user application. The method definition is below.

def count_all(self):  
    return pmemkv_NI.count_all() 

Count Above

Get the number of keys in the pmemkv datastore that are greater than a given key. The method takes the key from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface to count the keys greater than the given key. It raises an exception if pmemkv returns a failed status; otherwise, it returns the key count to the user application. The method definition is below.

def count_above(self, key): 
    return pmemkv_NI.count_above(key) 

Count Below

Get the number of keys in the pmemkv datastore that are less than a given key. The method takes the key from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface to count the keys less than the given key. It raises an exception if pmemkv returns a failed status; otherwise, it returns the key count to the user application. The method definition is below.

def count_below(self, key): 
    return pmemkv_NI.count_below(key) 

Count Between

Get the number of keys in the pmemkv datastore that fall between two given keys. The method takes key1 and key2 from the user application and calls the pmemkv extern C API through the pmemkv native extensions interface to count the keys greater than key1 and less than key2. It raises an exception if pmemkv returns a failed status; otherwise, it returns the key count to the user application. The method definition is below.

def count_between(self, key1, key2): 
    return pmemkv_NI.count_between(key1, key2) 
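Assuming a sorted engine and pmemkv's documented exclusive-range semantics, the count methods can be modeled with bisect. These hypothetical reference functions (not part of the bindings) are handy for checking expected counts in tests; for any key that is stored, count_above plus count_below plus one equals the total number of keys.

```python
from bisect import bisect_left, bisect_right


def count_above_model(sorted_keys, key):
    """Hypothetical model: number of keys strictly greater than key."""
    return len(sorted_keys) - bisect_right(sorted_keys, key)


def count_below_model(sorted_keys, key):
    """Hypothetical model: number of keys strictly less than key."""
    return bisect_left(sorted_keys, key)


def count_between_model(sorted_keys, key1, key2):
    """Hypothetical model: number of keys with key1 < key < key2 (0 if empty)."""
    return max(0, bisect_left(sorted_keys, key2) - bisect_right(sorted_keys, key1))


keys = ["apple", "banana", "cherry", "date"]
```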

Use the Python Bindings for pmemkv

The Python bindings for pmemkv are open source and available on GitHub*. To use them, follow these steps:

  1. Install the PMDK libraries.
  2. Install the pmemkv libraries.
  3. Clone the pmemkv-python repository, and then install the required dependencies as described in its README file.
  4. From the pmemkv-python directory, run the pmemkv_tests test application.
  5. Clone the pmemkv-tools repository, and then install the required dependencies as described in its README file.
  6. From the pmemkv-tools directory, run make baseline_python, make example_python, and make iteration_python.

Conclusion

Python bindings for pmemkv help Python developers reuse the existing pmemkv software without having to re-implement it in the Python language.

Reusing pmemkv through the Python bindings saves development time and resources, and it reduces redundancy by building on software assets that already exist instead of duplicating them.

Resources

A Guide to pmemkv Engines
The Persistent Memory Development Kit (PMDK)
How to Emulate Persistent Memory
Write a Storage Engine for pmemkv
Extend Python with C or C++
Ruby Bindings for pmemkv

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.