3.2. Estimating the Performance of a Graph
To estimate the performance of a graph on an architecture, run the dla_compiler command with the --fanalyze-performance option.
The dla_compiler command compiles the graph for the specified architecture to estimate its performance. The performance estimator assumes that any portions of the graph that are assigned to the CPU run inference with zero latency. That is, the performance estimate accounts only for the FPGA portions of the graph.
The performance estimator also estimates the average memory bandwidth and the memory requirements of the graph. The estimated memory requirement is typically an underestimate because the estimator assumes one input buffer and one output buffer, whereas the FPGA AI Suite runtime uses a default of five of each.
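As a rough illustration of the buffer-count gap described above, the following sketch scales a memory estimate from one input/output buffer pair to several. The function name adjust_memory_estimate and all numbers are hypothetical. It assumes that the reported estimate already includes exactly one input buffer and one output buffer, that every additional buffer has the same size as the first, and that you know those buffer sizes in bytes. None of this is part of dla_compiler.

```shell
# Hypothetical helper, not part of the FPGA AI Suite.
# Assumption: the estimate includes exactly one input and one output buffer,
# and each extra runtime buffer is the same size as the first one.
adjust_memory_estimate() {
    estimated_bytes=$1    # total memory reported by the performance estimator
    input_bytes=$2        # size of one input buffer
    output_bytes=$3       # size of one output buffer
    buffer_count=${4:-5}  # buffers of each kind; five is the runtime default

    # Add the buffers that the estimate did not account for.
    extra=$(( (buffer_count - 1) * (input_bytes + output_bytes) ))
    echo $(( estimated_bytes + extra ))
}

# Illustrative numbers only: a 100 MiB estimate, a 602112-byte input
# buffer, and a 4000-byte output buffer.
adjust_memory_estimate 104857600 602112 4000   # prints 107282048
```

Treat the result as a rough bound for planning purposes. The real runtime footprint can differ if the buffers are laid out or sized differently from what this sketch assumes.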
All required and optional arguments described in Compiling a Graph also apply. In addition, the following options are specific to estimating graph performance:
Option | Description |
---|---|
--fanalyze-performance | [Required] Enables the performance estimator. |
--fassumed-fmax-core=<assumed fMAX> | [Optional] Specifies the assumed fMAX of the compiled FPGA AI Suite IP. The performance estimator cannot estimate the fMAX of a given IP parameterization, nor does it know which speed grade the IP targets. Typically, the IP achieves 300 MHz or higher on a C2 Arria® 10 device. The default fMAX depends on the device family. |
--fdump-performance-report | [Optional] Specifies an output file for the performance estimate. If this option is omitted, the performance summary is displayed on the terminal. |
The simplest dla_compiler command format for estimating the performance of a graph is as follows:
    dla_compiler \
      --network-file <path to graph.xml> \
      --march <path to .arch file> \
      --fanalyze-performance
Example Command
    dla_compiler \
      --network-file ResNet50.xml ResNet101.xml \
      --march $COREDLA_ARCH/example_architectures/A10_Generic.arch \
      --fanalyze-performance \
      --fassumed-fmax-core=300
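Because the estimator relies on an assumed fMAX rather than a measured one, it can be useful to repeat the estimate for several assumed values. The sketch below is one possible way to do that, using only the options shown in this section. The function name estimate_at_fmax, the DRY_RUN variable, and the fMAX values 250, 300, and 350 are assumptions made for illustration. By default the script only prints the commands; set DRY_RUN=0 to run them on a system where dla_compiler is installed.

```shell
# Placeholders: point NETWORK and ARCH at your own graph and architecture file.
NETWORK=${NETWORK:-ResNet50.xml}
ARCH=${ARCH:-$COREDLA_ARCH/example_architectures/A10_Generic.arch}
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 prints commands, 0 runs them

# Build one performance-estimate command for the given assumed fMAX.
estimate_at_fmax() {
    fmax=$1
    set -- dla_compiler \
        --network-file "$NETWORK" \
        --march "$ARCH" \
        --fanalyze-performance \
        "--fassumed-fmax-core=$fmax"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"   # show the command without running it
    else
        "$@"        # run dla_compiler with the assembled arguments
    fi
}

# Illustrative fMAX values only; choose values that are realistic for your device.
for fmax in 250 300 350; do
    estimate_at_fmax "$fmax"
done
```

Comparing the resulting summaries shows how sensitive the estimate is to the fMAX assumption for a given graph and architecture.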