FPGA AI Suite: IP Reference Manual

ID 768974
Date 4/21/2025
Public

2.2.2. DDR-Free Streaming Performance

DDR-free mode relies solely on on-chip memory for data storage. This mode is particularly beneficial for applications where minimizing latency and maximizing throughput are critical, as it eliminates the need for data transfer between on-chip and off-chip memory. However, DDR-free mode imposes certain constraints:
  • Memory Constraints

    The architecture file must specify enough on-chip memory to hold all graph parameters in the filter scratchpad and all intermediate surfaces in the stream buffer.

  • High Coupling to Graph

    Due to the memory constraints, architecture files in DDR-free mode are coupled to specific graphs. FPGA AI Suite does not support loading a new model into an instance of the DDR-free IP to replace the original target model.

For more information about DDR-free operation, refer to DDR-Free Operation.
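The memory constraints above amount to two independent capacity checks. The following is a minimal sketch of that reasoning, not an FPGA AI Suite API: the class names, field names, and byte counts are hypothetical, and the sketch assumes you already know how many bytes your graph needs for parameters and for intermediate surfaces.

```python
from dataclasses import dataclass


@dataclass
class OnChipBudget:
    """Hypothetical on-chip capacities described by an architecture file."""
    filter_scratchpad_bytes: int
    stream_buffer_bytes: int


@dataclass
class GraphFootprint:
    """Hypothetical storage needs of one compiled graph."""
    parameter_bytes: int  # all graph parameters (filter scratchpad)
    surface_bytes: int    # intermediate surfaces (stream buffer)


def ddr_free_problems(budget: OnChipBudget, graph: GraphFootprint) -> list[str]:
    """Return the constraints the graph violates; an empty list means it fits."""
    problems = []
    if graph.parameter_bytes > budget.filter_scratchpad_bytes:
        problems.append("filter scratchpad too small for graph parameters")
    if graph.surface_bytes > budget.stream_buffer_bytes:
        problems.append("stream buffer too small for intermediate surfaces")
    return problems


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measured values.
    budget = OnChipBudget(filter_scratchpad_bytes=16_000_000,
                          stream_buffer_bytes=4_000_000)
    graph = GraphFootprint(parameter_bytes=12_000_000,
                           surface_bytes=5_000_000)
    for problem in ddr_free_problems(budget, graph):
        print(problem)
```

Because both checks depend on the footprint of one particular graph, a budget sized for that graph offers no guarantee for any other, which is the coupling described above.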

The table that follows presents performance measurements for a DDR-free streaming architecture built for the ResNet-18 PyTorch model. The analysis focuses on two metrics:
  • Core IP throughput

    This metric isolates the performance of the core FPGA AI Suite IP, excluding the input and output streamers. It highlights the computational efficiency of the processing elements within the architecture.

  • IP throughput

    This metric captures the overall inference performance of the overlay IP, including the input and output streaming components.

public/resnet-18-pytorch

Architecture                     ALMs   DSPs  Core IP Throughput [fps]  IP Throughput [fps]
AGX7_Streaming_Ddrfree_Resnet18  77.7k  296   171                       168
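
The gap between the two throughput figures can be read as the cost of the input and output streamers. The short calculation below is an illustrative sketch using only the two values from the table. The function names are hypothetical, and the per-frame interval is the average time between completed frames, which is not necessarily the end-to-end latency of a single inference.

```python
def streamer_overhead(core_fps: float, ip_fps: float) -> float:
    """Fraction of core IP throughput lost once the streamers are included."""
    return (core_fps - ip_fps) / core_fps


def frame_interval_ms(fps: float) -> float:
    """Average time between completed frames, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    core_fps, ip_fps = 171.0, 168.0  # values from the table above
    print(f"streamer overhead: {streamer_overhead(core_fps, ip_fps):.2%}")
    print(f"core IP frame interval: {frame_interval_ms(core_fps):.2f} ms")
    print(f"IP frame interval: {frame_interval_ms(ip_fps):.2f} ms")
```

For this architecture the streamers cost under 2% of core IP throughput (3 fps out of 171).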