FPGA AI Suite Handbook

ID 863373
Date 11/21/2025
Public

2.4.1.2.1. Comparing ML Architectures using the Compiler Report

The degree of parallelism in the processing elements (PEs) affects both throughput and resource usage. The following table reports PE throughput, which measures PE performance with DDR transfer overhead discounted.

PE throughput and resource usage both increase as the level of parallelism increases.

The values in this table are based on the AGX7_Performance architecture targeting a ResNet50 graph. The values were obtained from the FPGA AI Suite compiler report.
 

| Parameter settings | c_vector=4, k_vector=4 | c_vector=16, k_vector=16 | c_vector=16, k_vector=32 | num_lanes=4 |
| --- | --- | --- | --- | --- |
| Architecture | Minimal parallelism | Unrolled (16x) dot-product | Unrolled (2x) PEs | Unrolled (4x) PE Arrays |
| PE Throughput | 2 fps | 28 fps | 54 fps | 178 fps |
| Area: ALMs | 29475 | 32310 | 37436 | 122944 |
| Area: DSPs | 26 | 122 | 202 | 768 |
| Area: M20Ks | 296 | 685 | 786 | 900 |
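
To make the trade-off in the table easier to compare, the following minimal Python sketch computes, for each architecture, the throughput gain and resource growth relative to the minimal-parallelism configuration, plus throughput per 100 DSPs. The numbers are copied from the table above. The `configs` list, the `scaling_report` function, and the derived metrics are illustrative names chosen here and are not part of the FPGA AI Suite compiler report. Because the reported throughput values are rounded (the baseline is only 2 fps), treat the ratios as approximate.

```python
# Values copied from the table above (AGX7_Performance architecture, ResNet50 graph).
# The derived metrics below are illustrative; the compiler report does not emit them.
configs = [
    {"name": "Minimal parallelism",        "fps": 2,   "alms": 29475,  "dsps": 26,  "m20ks": 296},
    {"name": "Unrolled (16x) dot-product", "fps": 28,  "alms": 32310,  "dsps": 122, "m20ks": 685},
    {"name": "Unrolled (2x) PEs",          "fps": 54,  "alms": 37436,  "dsps": 202, "m20ks": 786},
    {"name": "Unrolled (4x) PE Arrays",    "fps": 178, "alms": 122944, "dsps": 768, "m20ks": 900},
]


def scaling_report(rows):
    """Return throughput and resource ratios relative to the first row."""
    base = rows[0]
    report = []
    for row in rows:
        report.append({
            "name": row["name"],
            "speedup": row["fps"] / base["fps"],          # throughput gain vs. baseline
            "alm_growth": row["alms"] / base["alms"],     # ALM usage vs. baseline
            "dsp_growth": row["dsps"] / base["dsps"],     # DSP usage vs. baseline
            "m20k_growth": row["m20ks"] / base["m20ks"],  # M20K usage vs. baseline
            "fps_per_100_dsps": 100 * row["fps"] / row["dsps"],
        })
    return report


report = scaling_report(configs)
for entry in report:
    print(
        f'{entry["name"]:<28} '
        f'speedup={entry["speedup"]:6.1f}x  '
        f'ALMs={entry["alm_growth"]:5.2f}x  '
        f'DSPs={entry["dsp_growth"]:6.2f}x  '
        f'M20Ks={entry["m20k_growth"]:5.2f}x  '
        f'fps/100 DSPs={entry["fps_per_100_dsps"]:6.2f}'
    )
```

Normalizing to the baseline row shows how throughput grows relative to each resource type, which can help when choosing among architectures under a fixed ALM, DSP, or M20K budget.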