FPGA AI Suite: Design Examples User Guide

ID 848957
Date 4/30/2025
Public

16.7. [HL-JTAG] Inference Performance Measurement

The dla_benchmark application reports inference duration and throughput for the entire design example as well as for the FPGA AI Suite IP.

To perform one inference iteration, the host performs the following steps:
  1. Write input data via JTAG to the DDR memory on the FPGA development board.
  2. Program CSRs on the FPGA AI Suite IP to start inference.
  3. Poll the CSRs until the FPGA AI Suite IP completes the inference.
  4. Read the output from the DDR memory to the host via JTAG.

The system duration covers all four steps.

In contrast, the IP duration excludes the time spent on input and output data transfer (steps 1 and 4).
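
The relationship between the two durations can be illustrated with a small timing sketch. This is not how dla_benchmark is implemented; the function `time_one_inference` and the four callables are hypothetical stand-ins for the JTAG transfers and CSR accesses, and the sketch assumes the IP duration spans steps 2 and 3 only. The stubs below simply sleep to imitate slow transfers.

```python
import time

def time_one_inference(write_input, start_ip, ip_done, read_output,
                       poll_interval_s=0.001):
    """Time one inference iteration and return (system_s, ip_s).

    All four callables are hypothetical placeholders, not FPGA AI Suite APIs.
    """
    t_sys_start = time.perf_counter()
    write_input()                      # step 1: input data to DDR via JTAG
    t_ip_start = time.perf_counter()
    start_ip()                         # step 2: program CSRs to start inference
    while not ip_done():               # step 3: poll CSRs until completion
        time.sleep(poll_interval_s)
    t_ip_end = time.perf_counter()
    read_output()                      # step 4: output from DDR via JTAG
    t_sys_end = time.perf_counter()
    return t_sys_end - t_sys_start, t_ip_end - t_ip_start

# Stub "hardware": transfers sleep for 20 ms, inference finishes after 3 polls.
polls = {"n": 0}

def fake_ip_done():
    polls["n"] += 1
    return polls["n"] >= 3

system_s, ip_s = time_one_inference(
    write_input=lambda: time.sleep(0.02),
    start_ip=lambda: None,
    ip_done=fake_ip_done,
    read_output=lambda: time.sleep(0.02),
)
print(f"system duration: {system_s * 1000:.1f} ms")
print(f"IP duration:     {ip_s * 1000:.1f} ms")
```

Because the system measurement brackets the IP measurement, the system duration in this sketch is always at least as long as the IP duration, and the gap grows with the transfer time.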

For this design example, system duration is usually much larger than the IP duration because data transfer over JTAG is relatively slow. Thus, the IP duration and throughput better reflect the performance of the FPGA AI Suite IP.

The following output is an example throughput report generated by the dla_benchmark application after performing 3925 inferences on a quantized ResNet-18 model:
[Step 11/12] Dumping statistics report
count:              3925 iterations
system duration:   464549.5363 ms
IP duration:       17945.7971 ms
latency:            118.2524 ms
system throughput: 8.4490 FPS
number of hardware instances: 1
number of network instances: 1
IP throughput per instance: 218.7142 FPS
IP throughput per fmax per instance: 1.0936 FPS/MHz
IP clock frequency: 200.0000 MHz
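
The throughput figures in this report are consistent with dividing the iteration count by the corresponding duration. The following sketch recomputes them from the reported values; the variable names are illustrative, and the formulas are inferred from the numbers above rather than taken from the dla_benchmark source.

```python
# Values copied from the example report above.
count = 3925                      # iterations
system_duration_ms = 464549.5363
ip_duration_ms = 17945.7971
ip_clock_mhz = 200.0
hardware_instances = 1

# Throughput = inferences / elapsed seconds.
system_fps = count / (system_duration_ms / 1000.0)
ip_fps_per_instance = count / (ip_duration_ms / 1000.0) / hardware_instances

# Normalizing by clock frequency gives FPS per MHz.
ip_fps_per_mhz = ip_fps_per_instance / ip_clock_mhz

# Fraction of the system duration spent outside the IP duration.
non_ip_share = 1.0 - ip_duration_ms / system_duration_ms

print(f"system throughput: {system_fps:.4f} FPS")
print(f"IP throughput per instance: {ip_fps_per_instance:.4f} FPS")
print(f"IP throughput per fmax per instance: {ip_fps_per_mhz:.4f} FPS/MHz")
print(f"share of system duration outside the IP: {non_ip_share:.1%}")
```

In this example, about 96 percent of the system duration falls outside the IP duration, which is why the IP figures are the better indicator of FPGA AI Suite IP performance.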