3.3. Estimating the Performance of a Partition of a Graph
When the Intel® FPGA AI Suite performs inference for a given machine learning graph by assigning some layers to the FPGA device and some layers to the CPU, the graph is divided into subgraphs. Each subgraph represents a portion of the original graph that executes wholly on the CPU or executes wholly on the FPGA. It is possible to estimate the performance of an individual FPGA subgraph.
To estimate the performance of a subgraph, run the dla_compiler command with the --est-fps-single-subgraph option.
The dla_compiler command compiles the graph for the specified architecture to estimate subgraph performance.
The command accepts all the required and optional arguments described in Compiling a Graph. In addition, the following options are specific to estimating subgraph performance:
|--est-fps-single-subgraph||[Required] Enables the subgraph performance estimator.|
|--est-fps-subgraph-index=<subgraph index>||[Required] Specifies the subgraph index.|
|--est-fps-network-index=<network index>||[Optional] Specifies the network index (in case multiple network files were passed to the dla_compiler command).|
|--fassumed-fmax-core=<assumed fMAX>||[Optional] Specifies the assumed fMAX of the compiled IP. The performance estimator does not have the ability to estimate fMAX of a given IP parameterization, nor does it know which speed grade the IP targets. Typically, the IP achieves 300 MHz or higher on a C2 Intel® Arria® 10 device. The default fMAX is chosen based on the family parameter value in the architecture description file (.arch).|
The estimated frame rate (fps) of the subgraph is displayed in the terminal.
The simplest command format for estimating the performance of a subgraph is as follows:
dla_compiler \
    --est-fps-single-subgraph \
    --est-fps-subgraph-index <index> \
    --network-file <path to graph.xml> \
    --march <path to .arch file>
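The following example passes two network files, selects network index 0 and subgraph index 0, and sets the assumed fMAX to 300:
dla_compiler \
    --est-fps-single-subgraph \
    --est-fps-subgraph-index=0 \
    --est-fps-network-index=0 \
    --network-file ResNet50.xml MobileNet_v2.xml \
    --march $COREDLA_ROOT/example_architectures/A10_Generic.arch \
    --fassumed-fmax-core=300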
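Because the estimator handles one subgraph per invocation, estimating every FPGA subgraph of a network means repeating the command with a different index. The following is a minimal sketch of such a sweep, not part of the FPGA AI Suite. The helper name estimate_subgraphs and the DRY_RUN switch are hypothetical, and the sketch assumes that you already know how many FPGA subgraphs the network has. It uses only the options listed in this section.

```shell
#!/bin/sh
# Sketch only: sweep the FPGA subgraphs of one network, one dla_compiler call each.
# Assumptions (not from the documentation): the helper name estimate_subgraphs,
# the DRY_RUN switch, and that the number of FPGA subgraphs is already known.

estimate_subgraphs() {
    network=$1    # path to the graph .xml file
    arch=$2       # path to the .arch file
    count=$3      # number of FPGA subgraphs to estimate (supplied by you)
    fmax=${4:-}   # optional assumed fMAX; empty keeps the .arch-based default

    i=0
    while [ "$i" -lt "$count" ]; do
        # Rebuild the argument list for this subgraph index.
        set -- dla_compiler \
            --est-fps-single-subgraph \
            "--est-fps-subgraph-index=$i" \
            --network-file "$network" \
            --march "$arch"
        if [ -n "$fmax" ]; then
            set -- "$@" "--fassumed-fmax-core=$fmax"
        fi

        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf '%s\n' "$*"    # dry run (default): print the command only
        else
            "$@"                  # real run: requires dla_compiler on PATH
        fi
        i=$((i + 1))
    done
}
```

By default the helper only prints the commands it would run, which makes it easy to check them first. For example, estimate_subgraphs ResNet50.xml $COREDLA_ROOT/example_architectures/A10_Generic.arch 2 300 prints two dla_compiler commands, for subgraph indexes 0 and 1. Setting DRY_RUN=0 runs them, and each invocation then reports its own estimated frame rate in the terminal.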