5.5.4. The dla_benchmark Performance Metrics
The -save_run_summary option makes the dla_benchmark demonstration application collect performance metrics during inference. These metrics can help you determine how efficient an architecture is at executing a model.
Note: The dla_benchmark application provides throughput in "frames per second". The time per frame (latency) is 1/throughput.
| Statistic | Description | 
|---|---|
| Count | The number of times inference was performed. This is set by the -niter option. | 
| System duration | The total time from when the first inference request was made to when the last request finished, as measured by the host program. | 
| IP duration | The total time spent on inference. This is reported by the IP on the FPGA. | 
| Latency | The median time of all inference requests made by the host. This includes any overhead from OpenVINO™ or the FPGA AI Suite runtime. | 
| System throughput | The total throughput of the system, including any OpenVINO™ or FPGA AI Suite runtime overhead. | 
| Number of hardware instances | The number of IP instances on the FPGA. | 
| Number of network instances | The number of graphs that the IP processes in parallel. | 
| IP throughput per instance | The throughput of a single IP instance. This is reported by the IP on the FPGA. | 
| IP throughput per fMAX per instance | The IP throughput per instance value scaled by the IP clock frequency value. | 
| IP clock frequency | The clock frequency, as reported by the IP running on the FPGA device. The dla_benchmark application treats this value as the IP core fMAX value. | 
| Estimated IP throughput per instance | The per-IP throughput estimated by the dla_compiler command with the --fanalyze-performance option. | 
| Estimated IP throughput per fMAX per instance | The Estimated IP throughput per instance value scaled by the compiler fMAX estimate. |
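
The relationships among these statistics can be sketched in a few lines of Python. This is a minimal illustration, not part of dla_benchmark. The function names and all numeric values are hypothetical, and the sketch assumes that "scaled by" means dividing the throughput by a clock frequency expressed in MHz.

```python
def time_per_frame_ms(throughput_fps: float) -> float:
    """Return the time per frame in milliseconds, computed as 1/throughput."""
    if throughput_fps <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / throughput_fps


def throughput_per_fmax(throughput_fps: float, clock_mhz: float) -> float:
    """Return throughput normalized by clock frequency (frames/s per MHz).

    Assumption: "scaled by" in the table means division by the frequency.
    """
    if clock_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    return throughput_fps / clock_mhz


# Hypothetical values, for illustration only (not real measurements).
measured_fps = 96.0          # IP throughput per instance
measured_clock_mhz = 200.0   # IP clock frequency
estimated_fps = 125.0        # Estimated IP throughput per instance
estimated_fmax_mhz = 250.0   # compiler fMAX estimate

frame_ms = time_per_frame_ms(measured_fps)
measured_norm = throughput_per_fmax(measured_fps, measured_clock_mhz)
estimated_norm = throughput_per_fmax(estimated_fps, estimated_fmax_mhz)

# Ratio of measured to estimated frequency-normalized throughput.
ratio = measured_norm / estimated_norm

print(f"time per frame: {frame_ms:.2f} ms")
print(f"measured/estimated per-fMAX ratio: {ratio:.2f}")
```

Normalizing by clock frequency lets you compare the measured and estimated values even when the IP runs at a clock frequency different from the compiler fMAX estimate.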