Open Programmable Acceleration Engine C API Programming Guide
The OPAE C library (libopae-c) is a
lightweight user-space library that provides abstraction for FPGA resources in a compute
environment. Built on top of the driver stack that
supports the FPGA device, the library abstracts away
hardware-specific details and exposes the underlying FPGA resources as a set of features accessible from within
software programs running on the host. These features include the acceleration logic
preconfigured on the device, as well as functions to manage and reconfigure the device. Hence,
the library enables applications to transparently and seamlessly
take advantage of FPGA-based acceleration.
Figure 1. Layered Architecture
By providing a unified C API, the library supports different kinds of FPGA
integration and deployment models, ranging from single-node systems with one or more FPGA
devices to large-scale FPGA deployments in a data center.
A simple use case, for example, is for an application
running on a system with an FPGA PCIe device to easily use the FPGA to accelerate certain
algorithms. At the other end of the spectrum, resource management and orchestration services
can use this API to discover and select FPGA resources and then
allocate them to be used
by workloads with acceleration needs.
The purpose of OPAE is to provide a common base layer for as wide a range of
use cases as possible without sacrificing performance or efficiency. It aims at
freeing developers of applications and frameworks from
having to understand the intricacies of the FPGA driver interfaces
and FPGA interconnect details by providing a thin abstraction to expose required details of
the platform.
To that end, OPAE abstracts access to the key components that frameworks and
abstractions need to deal with (for example, FPGA devices and accelerators). It then provides
means to interact with these components in the most efficient way possible. Essentially, it
tries to provide friendly and consistent interfaces to crucial components of the platform. At
the same time, OPAE tries not to constrain frameworks and applications by making optimizations
that do not translate to many use cases - and where it does provide convenience functions or
optimizations, these are optional.
For example, OPAE provides an interface to allocate physically contiguous
buffers in system memory that can be shared between user-space software and an accelerator.
This interface enables the most basic feature set of allocating and sharing a large page of
memory in one API call; however, it does not provide a malloc()-like
interface backed by a memory pool or slab allocator. These kinds of optimizations and added
functionality are left to higher layers of the software stack, which are
better suited to make domain-specific optimizations.
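
As an illustration of the kind of malloc()-like layer that is left to higher levels of the stack, the following is a minimal sketch of a bump allocator that carves small allocations out of one large, already allocated buffer. It is not part of OPAE: the names bump_pool, bump_pool_init, bump_pool_alloc, and bump_pool_reset are hypothetical, and the sketch assumes the caller has already obtained the large buffer by some other means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a bump allocator over one existing buffer.
   Not part of OPAE. */
typedef struct {
    uint8_t *base;  /* start of the large buffer */
    size_t   size;  /* total bytes available */
    size_t   used;  /* bytes handed out so far */
} bump_pool;

static void bump_pool_init(bump_pool *pool, void *base, size_t size)
{
    assert(pool != NULL && base != NULL);
    pool->base = base;
    pool->size = size;
    pool->used = 0;
}

/* Return len bytes aligned to align (a power of two), or NULL if the
   pool is exhausted. Alignment is relative to the start of the buffer. */
static void *bump_pool_alloc(bump_pool *pool, size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (pool->used + align - 1) & ~(align - 1);
    if (start > pool->size || len > pool->size - start)
        return NULL;
    pool->used = start + len;
    return pool->base + start;
}

/* Release every allocation at once; individual frees are not supported. */
static void bump_pool_reset(bump_pool *pool)
{
    pool->used = 0;
}
```

A framework could initialize such a pool once over a single large shared buffer and then hand out sub-buffers without further API calls. A bump allocator is the simplest possible choice; a real framework might prefer a slab or free-list design.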
Some Key Concepts
The following key concepts are essential for writing code using the OPAE C API. These concepts
are modeled with corresponding data structures and functions in the API
specification, as discussed in the "Object Model" section.
Field Programmable Gate Array (FPGA): A discrete or integrated peripheral device
connected to a host CPU via PCIe or another type of interconnect.
Accelerator Functional Unit (AFU):
Computation logic preconfigured on an FPGA for the purpose of accelerating
certain computations. It represents a resource discoverable and usable by
applications. The logic is designed in RTL and synthesized into a bitstream. A
tool (fpgaconf) is provided to reconfigure an FPGA using a bitstream.
Accelerator Function (AF): A bitstream for application-specific accelerator logic, for
example, compression, encryption, or mathematical operations.
Accelerator: An accelerated function implemented in an FPGA, closely related to an AFU. An accelerator tracks the ownership of an AFU (or part of it)
for a process that uses it. An accelerator can be shared by multiple processes.
Shared memory buffers: Memory buffers allocated in
process memory on the host to be shared with an accelerator on the FPGA. Shared
memory buffers are used for data transfers between
the process and the accelerator it owns.
Events: Events are an asynchronous notification
mechanism. The FPGA driver triggers
events to indicate error conditions. Accelerator logic can also define its
own events. Applications can choose to be notified when certain types of events occur
and respond accordingly.
Reconfiguration: An AFU can be replaced by another AFU by
an application that has the appropriate privilege.
Linking with this library is straightforward. Code using this library
should include the header file fpga.h. Taking the
GCC compiler on Linux as an example, the minimal compile and link line should
pass the header search path (-I), the library search path (-L), and the library itself (-lopae-c).
Note: Third-party library dependency. The
library internally uses libuuid and libjson-c; these
are not distributed as part of the library. Make sure you have these libraries
installed.
Use the Sample Code
The library source includes
two code samples. Use these samples to learn how to call functions in the library.
Build and run these samples as quick sanity checks to determine if your installation
and environment are set up properly.
For more details about using the sample code, refer to "Running the Hello FPGA
Example" chapter in
Intel® Acceleration Stack for
Xeon® CPU with FPGAs Getting Started Guide, (Board
When the library is successfully built and installed,
you can see the following directory structure. This discussion
uses installation on Unix/Linux systems as an example.
The situation is similar on Windows and macOS installations.
Directory & Files
Directory containing all header files
Top-level header for
Header file for accelerator acquire/release, MMIO, memory
management, event handling, etc.
Header file for bitstream manipulation functions
Header file for error reporting functions
Header file for AFU enumeration functions
Header file for FPGA management functions
Various type definitions
Directory containing shared library files
The shared dynamic library for applications
to link against
Directory containing API documentation
Directory for documentation in HTML format
Directory for documentation in LaTeX format
Directory for documentation in Unix man page format
Basic Application Flow
The figure below depicts the basic application flow from the viewpoint of a
user process. API components are discussed in the next section. The hello_fpga.c sample code shows the
flow in action.
Figure 2. Basic Flow
The API is designed around an object model that abstracts the physical FPGA device
and the functions available on the device. The object model is not tied to a particular
type of FPGA.
Instead, it is a generalized model and can be extended to describe any type of
FPGA.
An enum type to represent the type of an FPGA resource, which
is either FPGA_DEVICE or FPGA_ACCELERATOR. An FPGA_DEVICE object corresponds
to a physical FPGA device. Only FPGA_DEVICE objects can invoke management
functions. An FPGA_ACCELERATOR object
represents an instance of an AFU.
An opaque type to represent a resource
known to, but not necessarily owned by, the calling
process. The calling process must own a resource before it can
invoke functions of the resource.
An opaque type to represent a resource owned by the calling
process. API functions fpgaOpen() and fpgaClose() (see
"Functions" section) acquire and release
ownership of a resource represented by an fpga_handle.
An opaque type for a properties object.
Applications use these properties to query and search for
resources that suit their needs. The properties visible to
applications are documented in the
"FPGA Resource Properties" section.
An opaque handle used by the FPGA driver to notify
an application about an event, and used by the
application to wait for the notification of the event.
An enum type to represent kinds of events which can be
An enum type to represent the result of an API function. If
the function returns successfully, the result is FPGA_OK. Otherwise, the result is
one of the error codes. The function fpgaErrStr() can translate an error code into
a human-readable string.
These are the properties of a resource that can be queried by an
application, by substituting the property name for Prop
in the names of fpgaPropertiesGet[Prop]() and
fpgaPropertiesSet[Prop]().
fpga_token of the parent object
The type of the resource: either FPGA_DEVICE or FPGA_ACCELERATOR
The bus number
The PCI device number
The PCI function number
The socket ID
The device ID
Number of AFU slots available on an FPGA_DEVICE
The FPGA Interface Manager (FIM) ID of an FPGA_DEVICE
The FPGA Interface Manager (FIM) version of an FPGA_DEVICE
The vendor ID of an FPGA_DEVICE resource
The model of an FPGA_DEVICE resource
The local memory size of an FPGA_DEVICE
The capabilities of an FPGA_DEVICE
The GUID of an FPGA_DEVICE or FPGA_ACCELERATOR
The number of MMIO spaces of an FPGA_ACCELERATOR
The number of interrupts of an FPGA_ACCELERATOR
The state of an FPGA_ACCELERATOR resource:
either FPGA_ACCELERATOR_ASSIGNED or
OPAE C API Return Codes
The OPAE C library returns one of these codes for every public API
function exported. Usually, FPGA_OK denotes successful completion
of the requested operation, while any return code *other* than
FPGA_OK indicates an error or other deviation from the expected
behavior. When using
the OPAE C API, always check the API return codes, and do not use the output parameters of functions that failed.
Table 1. OPAE C API Return Codes
Operation completed successfully
Invalid parameter supplied
Resource is busy
An exception occurred
A required resource was not found
Not enough memory to complete operation
Requested operation is not supported
Driver is not loaded
FPGA Daemon (fpgad) is not running
Insufficient privileges or permissions
Error while reconfiguring FPGA
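
One way to make return-code checking systematic is to wrap each call in a macro that stops at the first failure. The sketch below shows that pattern under stated assumptions: so that it compiles without the library, it uses stand-in names (demo_result, DEMO_OK, demo_err_str, demo_step) in place of the real result type, FPGA_OK, fpgaErrStr(), and actual API calls. All of these demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in result codes so the sketch compiles without the library.
   In real code these would be the library's result codes such as FPGA_OK. */
typedef enum { DEMO_OK = 0, DEMO_BUSY, DEMO_NOT_FOUND } demo_result;

/* Stand-in for fpgaErrStr(): translate a code into readable text. */
static const char *demo_err_str(demo_result r)
{
    switch (r) {
    case DEMO_OK:        return "success";
    case DEMO_BUSY:      return "resource is busy";
    case DEMO_NOT_FOUND: return "resource not found";
    }
    return "unknown error";
}

/* Evaluate a call once, store its code in the local variable res, and on
   failure report the failing expression and jump to label. */
#define ON_ERR_GOTO(call, label)                                      \
    do {                                                              \
        res = (call);                                                 \
        if (res != DEMO_OK) {                                         \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    demo_err_str(res));                               \
            goto label;                                               \
        }                                                             \
    } while (0)

/* Hypothetical step standing in for an API call; it returns the code it
   is given so that a failure can be simulated. */
static demo_result demo_step(demo_result outcome, int *steps_run)
{
    assert(steps_run != NULL);
    (*steps_run)++;
    return outcome;
}

/* Run three steps in order and stop at the first one that fails. */
static demo_result demo_sequence(demo_result second, int *steps_run)
{
    demo_result res;
    *steps_run = 0;
    ON_ERR_GOTO(demo_step(DEMO_OK, steps_run), out);
    ON_ERR_GOTO(demo_step(second, steps_run), out);
    ON_ERR_GOTO(demo_step(DEMO_OK, steps_run), out);
out:
    return res;
}
```

Because the sequence jumps out as soon as a step fails, no later step can consume an output parameter that the failed call never filled in. In real code, the out label is also the natural place to release resources acquired by earlier steps.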
This section illustrates a few typical API usage models with code snippets.
Query and Search for a Resource
The application first populates an
fpga_properties object with the desired properties. Afterwards,
it calls fpgaEnumerate()
to search for matching resources.
Note: fpgaEnumerate() may return more than one matching
resource.
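/* Declarations assumed by this snippet. MAX_NUM_TOKENS is a placeholder,
   and guid is assumed to be filled in elsewhere */
fpga_properties filter = NULL;
fpga_token tokens[MAX_NUM_TOKENS];
fpga_guid guid;
fpga_result res;
uint32_t num_matches = 0;
/* Start with an empty properties object */
res = fpgaGetProperties(NULL, &filter);
/* Populate the properties object with desired values.
   In this case, we want to search for accelerators that match a
   particular GUID */
res = fpgaPropertiesSetObjectType(filter, FPGA_ACCELERATOR);
res = fpgaPropertiesSetGuid(filter, guid);
/* Query the number of matched resources (no token array is passed) */
res = fpgaEnumerate(&filter, 1, NULL, 0, &num_matches);
/* Return all matched resources in tokens */
res = fpgaEnumerate(&filter, 1, tokens, num_matches, &num_matches);
/* Destroy the properties object */
res = fpgaDestroyProperties(&filter);
/* More code */
/* Destroy tokens; fpgaDestroyToken() takes a pointer to the token */
for (uint32_t i = 0; i < num_matches; i++) {
    res = fpgaDestroyToken(&tokens[i]);
}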
Note: The fpgaEnumerate() function can take multiple fpga_properties
objects (in an array). In this situation, the function returns resources that match
any of the properties objects. In other words, the
multiple properties objects are logically OR'ed in the query operation.
fpga_token objects returned by fpgaEnumerate() do not signify
ownership. To acquire ownership of a resource represented by a token, pass the token
to fpgaOpen().
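
The OR semantics of multiple properties objects can be modeled in a few lines of plain C. The sketch below is only a model of the matching rule, not the library's implementation: the types model_resource and model_filter and both functions are hypothetical, and it assumes that within a single properties object every property that has been set must match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a resource and a filter; these are not OPAE types. */
typedef struct {
    int     type;  /* for example, 0 = device, 1 = accelerator */
    uint8_t bus;
} model_resource;

typedef struct {
    bool    has_type;
    int     type;
    bool    has_bus;
    uint8_t bus;
} model_filter;

/* One filter matches when every field it sets equals the resource's
   field (assumed AND within a single filter). */
static bool filter_matches(const model_filter *f, const model_resource *r)
{
    if (f->has_type && f->type != r->type)
        return false;
    if (f->has_bus && f->bus != r->bus)
        return false;
    return true;
}

/* A resource is selected when ANY filter in the array matches (OR).
   With zero filters this model selects nothing; the library's behavior
   for an empty filter list is not modeled here. */
static bool any_filter_matches(const model_filter *filters, size_t n,
                               const model_resource *r)
{
    assert(filters != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (filter_matches(&filters[i], r))
            return true;
    }
    return false;
}
```

For instance, a two-element array holding "accelerator on bus 0x5e" and "any device" would select every device plus the accelerators on that one bus, but not accelerators on other buses.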
Acquire and Release a Resource
Acquiring and releasing ownership of a resource is done using fpgaOpen() and
fpgaClose(). The calling process must own the
resource before it can do MMIO, share memory buffers, and use functions offered by
the resource.
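/* token is assumed to come from an earlier fpgaEnumerate() call */
fpga_handle handle;
fpga_result res;
/* Acquire ownership of a resource that was previously returned by
   `fpgaEnumerate()` as a token */
res = fpgaOpen(token, &handle);
/* More code */
/* Release the ownership */
res = fpgaClose(handle);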
Shared Memory Buffer
This code snippet shows how to prepare a memory buffer for sharing between the
calling process and an accelerator.
/* Hint for the virtual address of the buffer */
volatile uint64_t *addr_hint;
/* An ID we can use to reference the buffer later */
uint64_t bufid;
/* Flag to indicate if the buffer is preallocated or not */
int flag = 0;
/* Allocate (if necessary), pin, and map a buffer to be accessible
   by an accelerator. BUF_SIZE is assumed to be defined elsewhere */
res = fpgaPrepareBuffer(handle, BUF_SIZE, (void **)&addr_hint,
                        &bufid, flag);
/* The actual address mapped to the buffer */
uint64_t iova;
/* Get the IO virtual address for the buffer */
res = fpgaGetIOVA(handle, bufid, &iova);
/* Inform the accelerator about the virtual address by writing to its
   mapped register file */
/* More code */
/* Release the shared buffer */
res = fpgaReleaseBuffer(handle, bufid);
Note: The flag variable can take the constant
FPGA_BUF_PREALLOCATED, which indicates that the address space
pointed to by addr_hint is already allocated by the calling
process.
This code snippet shows how to map/unmap the register file of an accelerator into the
virtual memory space of the calling process.
/* Index of the MMIO space. There might be multiple spaces on an accelerator */
uint32_t mmio_num = 0;
/* Mapped address */
uint64_t *mmio_addr = NULL;
/* Map MMIO */
res = fpgaMapMMIO(handle, mmio_num, &mmio_addr);
/* Write a 32-bit value to the mapped register file at a certain byte
   offset. CSR_CTL is the offset in the mapped space to where the value
   will be written. CSR_CTL and value are defined elsewhere */
res = fpgaWriteMMIO32(handle, mmio_num, CSR_CTL, value);
/* More code */
/* Unmap MMIO */
res = fpgaUnmapMMIO(handle, mmio_num);
Note: Every AFU has its own layout of register spaces and its own protocol for
controlling its behavior through the registers. These are defined in the Accelerator Function (AF)
used to implement the AFU.
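
To make the idea of writing a 32-bit value at a byte offset concrete, the following sketch models a register space as a plain array and performs the offset arithmetic by hand. It is an illustration only, not how the library implements its MMIO functions: demo_write32, demo_read32, and the offset DEMO_CSR_CTL are hypothetical, and a real AFU defines its own register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offset in bytes; a real AFU defines its own. */
#define DEMO_CSR_CTL 0x18u

/* Check that a 32-bit access at a byte offset is aligned and in range. */
static int demo_offset_ok(size_t space_bytes, size_t offset)
{
    return offset % sizeof(uint32_t) == 0 &&
           space_bytes >= sizeof(uint32_t) &&
           offset <= space_bytes - sizeof(uint32_t);
}

/* Write a 32-bit value at a byte offset inside a register space.
   Returns 0 on success, -1 if the offset is misaligned or out of range. */
static int demo_write32(volatile uint32_t *space, size_t space_bytes,
                        size_t offset, uint32_t value)
{
    assert(space != NULL);
    if (!demo_offset_ok(space_bytes, offset))
        return -1;
    space[offset / sizeof(uint32_t)] = value;
    return 0;
}

/* Read a 32-bit value from a byte offset; same return convention. */
static int demo_read32(volatile uint32_t *space, size_t space_bytes,
                       size_t offset, uint32_t *value)
{
    assert(space != NULL && value != NULL);
    if (!demo_offset_ok(space_bytes, offset))
        return -1;
    *value = space[offset / sizeof(uint32_t)];
    return 0;
}
```

The volatile qualifier matters for real register files because each access must reach the device instead of being cached or reordered away by the compiler. In this model, the array merely stands in for a mapped space.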