6.3.1. OpenVINO™ FPGA Runtime Plugin
The OpenVINO™ Plugin architecture is described in the OpenVINO™ Developer Guide for Inference Engine Plugin Library.
The source files are located under runtime/plugin. The three main components of the runtime plugin are the Plugin class, the Executable Network class, and the Infer Request class. The primary responsibilities of each class are as follows:
Plugin Class
- Initializes the runtime plugin with an Intel® FPGA AI Suite Architecture File, which you set as an OpenVINO™ configuration key (refer to Running the Ported OpenVINO Demonstration Applications).
- Contains the QueryNetwork function, which analyzes the network layers and returns a list of the layers that the specified architecture supports. This function allows network execution to be distributed between the FPGA and other devices, and it is enabled with HETERO mode.
- Creates an executable network instance in one of the following ways:
  - Just-in-time (JIT) flow: Compiles the network so that it is compatible with the hardware described by the Intel FPGA AI Suite Architecture File, and then loads the compiled network onto the FPGA device.
  - Ahead-of-time (AOT) flow: Imports a precompiled network (exported by the Intel FPGA AI Suite compiler) and loads it onto the FPGA device.
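The idea behind QueryNetwork, and the HETERO split it makes possible, can be pictured with the following C++ sketch. It is not the plugin's implementation: the Layer struct, the queryNetwork and assignDevices functions, and the modeling of an Architecture File as a plain set of supported operation types are all hypothetical simplifications, and the CPU is assumed to be the fallback device for unsupported layers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one network layer: a unique name plus its operation type.
struct Layer {
    std::string name;
    std::string type;
};

// QueryNetwork-style analysis (simplified): return the names of the layers whose
// operation type the architecture supports. The architecture is modeled here as a
// set of supported operation types, which is an assumption of this sketch.
std::vector<std::string> queryNetwork(const std::vector<Layer>& layers,
                                      const std::set<std::string>& supportedTypes) {
    std::vector<std::string> supported;
    for (const auto& layer : layers) {
        if (supportedTypes.count(layer.type) != 0) {
            supported.push_back(layer.name);
        }
    }
    return supported;
}

// HETERO-style assignment (simplified): layers reported by queryNetwork run on the
// FPGA; every other layer falls back to the CPU.
std::map<std::string, std::string> assignDevices(const std::vector<Layer>& layers,
                                                 const std::set<std::string>& supportedTypes) {
    std::map<std::string, std::string> affinity;
    for (const auto& layer : layers) {
        affinity[layer.name] = "CPU";
    }
    for (const auto& name : queryNetwork(layers, supportedTypes)) {
        affinity[name] = "FPGA";
    }
    return affinity;
}
```

With a network of a convolution, a ReLU, and a non-maximum-suppression layer, and an architecture that supports only the first two types, the first two layers would be assigned to the FPGA and the last one to the CPU.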
Executable Network Class
- Represents an Intel FPGA AI Suite compiled network.
- Loads the compiled model and configuration data for the network onto an FPGA device that has already been programmed with an Intel FPGA AI Suite AFU/AF bitstream. When there are two instances of the Intel FPGA AI Suite IP, the Executable Network class loads the network onto both instances so that they can perform batch inference in parallel.
- Stores input and output processing information.
- Creates infer request instances so that the execution of multiple batches can be pipelined.
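The following C++ sketch illustrates how one loaded network can serve a batch from two IP instances in parallel. The InstanceFn type, the runBatch function, and the round-robin split are hypothetical; the way the plugin actually divides a batch between instances may differ, and a real instance is driven through DDR transfers rather than a function call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one IP instance that already holds the loaded network:
// it maps one input (a single float here, for brevity) to one result.
using InstanceFn = std::function<float(float)>;

// Splits a batch across the available instances round-robin by index and runs each
// instance's share on its own thread, so all instances work at the same time.
std::vector<float> runBatch(const std::vector<float>& batch,
                            const std::vector<InstanceFn>& instances) {
    assert(!instances.empty());
    std::vector<float> results(batch.size());
    std::vector<std::thread> workers;
    for (std::size_t inst = 0; inst < instances.size(); ++inst) {
        workers.emplace_back([&, inst] {
            // Each worker writes only to its own result indices, so no lock is needed.
            for (std::size_t i = inst; i < batch.size(); i += instances.size()) {
                results[i] = instances[inst](batch[i]);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return results;
}
```

With two instances, items 0, 2, 4, and so on go to the first instance and items 1, 3, 5, and so on go to the second; the results are still returned in the original batch order.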
Infer Request Class
- Runs inference on a single batch serially.
- Executes five stages in one inference job: input layout transformation on the CPU, input transfer to DDR, Intel FPGA AI Suite execution on the FPGA, output transfer from DDR, and output layout transformation on the CPU.
- In asynchronous mode, executes the stages on multiple threads that are shared across all infer request instances, so that multiple batch jobs are pipelined and the FPGA is kept continuously busy.
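The asynchronous behavior can be illustrated with a C++ sketch of a five-stage pipeline in which one worker thread per stage is shared by all jobs, so different jobs occupy different stages at the same time. This is not the plugin's code: the BlockingQueue, Job, and runPipeline names, the one-thread-per-stage layout, and the use of an empty std::optional as a shutdown signal are assumptions, and each stage only records its name where the real plugin would transform layouts, move data over DDR, or run the FPGA.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal thread-safe FIFO used to hand jobs from one stage to the next.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

// One inference job; `trace` records the stages it has passed through, in order.
struct Job {
    int id;
    std::vector<std::string> trace;
};

// The five stages of one inference job, in execution order.
const std::vector<std::string> kStages = {
    "input layout transform (CPU)", "input transfer to DDR", "FPGA execution",
    "output transfer from DDR", "output layout transform (CPU)"};

// Pushes `numJobs` jobs through the pipeline. Stage s reads from queues[s] and writes
// to queues[s + 1]; an empty optional tells each stage to forward it and exit.
std::vector<Job> runPipeline(int numJobs) {
    const std::size_t n = kStages.size();
    std::vector<BlockingQueue<std::optional<Job>>> queues(n + 1);
    std::vector<std::thread> workers;
    for (std::size_t s = 0; s < n; ++s) {
        workers.emplace_back([&queues, s] {
            while (true) {
                std::optional<Job> job = queues[s].pop();
                const bool stop = !job.has_value();
                if (!stop) job->trace.push_back(kStages[s]);  // stand-in for the stage's real work
                queues[s + 1].push(std::move(job));           // forward the job or the stop signal
                if (stop) break;
            }
        });
    }
    for (int i = 0; i < numJobs; ++i) queues[0].push(Job{i, {}});
    queues[0].push(std::nullopt);  // shutdown signal follows the last job through every stage
    std::vector<Job> done;
    while (true) {
        std::optional<Job> job = queues[n].pop();
        if (!job) break;
        done.push_back(std::move(*job));
    }
    for (auto& worker : workers) worker.join();
    return done;
}
```

Because each stage has a single thread and a FIFO queue, jobs leave the pipeline in the order they entered, while the stage that stands in for FPGA execution can work on one job as the CPU layout stage already prepares the next.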