FPGA AI Suite Handbook

ID 863373
Date 11/21/2025
Public

6.2. FPGA AI Suite Architecture File Breakdown

This chapter explores customization parameters within an architecture (.arch) file, demonstrates where and how the optimization happens, and hints at the more advanced optimization techniques that are introduced in Optimizing Your FPGA AI Suite IP. A predefined architecture file (AGX7_Generic.arch) is used here as an example.

An architecture file starts by defining the target family and the PE parallelism and precision parameters:
family : 'AGX7'
k_vector : 32
c_vector : 16
arch_precision : FP13AGX
stream_buffer_depth : 63488
output_channels_max : 16384
Enable element-wise multiplication and set the constraints on filter size.
enable_eltwise_mult : true
filter_size_width_max : 28
filter_size_height_max : 28
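As a rough illustration of how these limits can be read, the following Python sketch compares one convolution layer against the values in the listing above. It assumes that the `*_max` parameters are upper bounds on a layer's filter dimensions and output channel count. The `ArchLimits` class and the `limit_violations` function are hypothetical names introduced for this example and are not part of the FPGA AI Suite tools. How the compiler treats a layer that exceeds a limit is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchLimits:
    """Limits copied from the AGX7_Generic.arch listing (hypothetical container)."""
    filter_size_width_max: int = 28
    filter_size_height_max: int = 28
    output_channels_max: int = 16384


def limit_violations(limits, filter_width, filter_height, output_channels):
    """Return a list of violated limits; an empty list means the layer fits."""
    problems = []
    if filter_width > limits.filter_size_width_max:
        problems.append(
            f"filter width {filter_width} > {limits.filter_size_width_max}")
    if filter_height > limits.filter_size_height_max:
        problems.append(
            f"filter height {filter_height} > {limits.filter_size_height_max}")
    if output_channels > limits.output_channels_max:
        problems.append(
            f"output channels {output_channels} > {limits.output_channels_max}")
    return problems


limits = ArchLimits()
print(limit_violations(limits, 3, 3, 64))    # a 3x3 filter with 64 output channels
print(limit_violations(limits, 31, 3, 64))   # a filter wider than the limit
```

A check of this kind is only a numeric comparison. It does not model area, throughput, or any other constraint of the IP.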
Set the PE array interleave parameters, described below. The choice of exit_fifo_depth is a system design consideration: a deeper FIFO costs more area.
pe_array {
  num_interleaved_features : 5
  num_interleaved_filters : 1
  exit_fifo_depth : 1024
}
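The listings in this chapter use two kinds of lines: `key : value` entries and `name { ... }` blocks. The following Python sketch reads text in that layout into nested dictionaries, which can be convenient for inspecting or comparing architecture files in scripts. It assumes only the syntax visible in these listings and is not the parser used by the FPGA AI Suite tools. `parse_arch` and its helper functions are hypothetical names.

```python
import re

# One identifier followed by either ": value" or an opening brace.
_LINE = re.compile(r"^(\w+)\s*(?::\s*(.+?)|(\{))$")


def _convert(text):
    """Turn a raw value into bool, int, or an unquoted string."""
    if text in ("true", "false"):
        return text == "true"
    if re.fullmatch(r"-?\d+", text):
        return int(text)
    return text.strip("'\"")


def _store(node, key, value):
    """Store a value; a repeated key is collected into a list."""
    if key not in node:
        node[key] = value
    elif isinstance(node[key], list):
        node[key].append(value)
    else:
        node[key] = [node[key], value]


def parse_arch(text):
    """Parse 'key : value' lines and 'name { ... }' blocks into nested dicts."""
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "}":
            if len(stack) == 1:
                raise ValueError("unmatched '}'")
            stack.pop()
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        key, value, brace = match.groups()
        if brace:
            child = {}
            _store(stack[-1], key, child)
            stack.append(child)
        else:
            _store(stack[-1], key, _convert(value))
    if len(stack) != 1:
        raise ValueError("missing '}'")
    return root


SAMPLE = """
k_vector : 32
arch_precision : FP13AGX
enable_eltwise_mult : true
pe_array {
  num_interleaved_features : 5
  num_interleaved_filters : 1
  exit_fifo_depth : 1024
}
"""

arch = parse_arch(SAMPLE)
print(arch["pe_array"]["exit_fifo_depth"])
```

Repeated keys, such as the multiple `input_connection` entries in the crossbar section, are collected into a list so that no entry is silently overwritten.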
The filter scratchpad depth controls how much filter data is buffered.
filter_scratchpad {
  filter_depth : 512
  bias_scale_depth : 512
}
Work with your FPGA team to set the DMA parameters.
dma {
  csr_addr_width : 11
  csr_data_bytes : 4
  ddr_addr_width : 32
  ddr_burst_width : 4
  ddr_data_bytes : 64
  ddr_read_id_width : 2
}
Enable activation functions.
activation {
  generic_aux_parameters {
    k_vector : 16
  }
  enable_clamp : true
  enable_leaky_relu : false
  enable_sigmoid : true
  enable_prelu : true
}

Enable the hardened pooling module. If it is not included, pooling is executed on the host.

pool {
  generic_aux_parameters {
    k_vector : 4
  }
  max_window_height : 13
  max_window_width : 13
  max_stride_vertical : 4
  max_stride_horizontal : 4
}
The crossbar acts as a central hub that gathers output from the PE, performs auxiliary operations, and writes back the results for the next convolution. The crossbar has the following main parameters:
xbar_in_ports
Defines a connection that receives the output feature from the PE array.
xbar_ports
Defines the connection of several auxiliary modules: activation, hardened pooling, and hardened softmax.

The output feature from the PE array is sent, if needed, to these auxiliary modules for further processing. The activation and softmax modules connect to the xbar_in_port input connection, but the pool module connects to the activation connection. These connections mean that the output feature can go to activation or softmax directly, but it must go through activation for pooling.

xbar_out_ports
Allows the output feature to be sent to input_feeder and then to the PE for the next convolution, or to output_writer for writing out the result. Because the input_connection entries of xbar_out_port include xbar_in_port, the output feature can bypass all modules connected to the crossbar and go out directly. Similarly, the results from activation, pool, and softmax can be sent out.
xbar {
  xbar_k_vector : 16
  max_input_interfaces : 5
  max_output_interfaces : 5
  xbar_ports {
    xbar_aux_port {
      name : 'activation'
      input_connection : 'xbar_in_port'
    }
    xbar_aux_port {
      name : 'pool'
      input_connection : 'xbar_in_port'
      input_connection : 'activation'
    }
    xbar_aux_port {
      name : 'softmax'
      input_connection : 'xbar_in_port'
    }
  }
  xbar_in_port {
    external_connection : 'pe_array'
  }
  xbar_out_port {
    external_connection : 'input_feeder'
    external_connection : 'output_writer'
    input_connection : 'xbar_in_port'
    input_connection : 'pool'
    input_connection : 'activation'
    input_connection : 'softmax'
  }
}
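The routing that the crossbar section describes can be explored with a short Python sketch that enumerates every path from xbar_in_port to xbar_out_port. The connection table is transcribed from the listing above, under the assumption that each `input_connection` entry names a source that the port can read from. The `routes` function is a hypothetical helper, not part of the FPGA AI Suite tools. In the listing as printed, pool has two `input_connection` entries (xbar_in_port and activation), so the sketch reports a direct route into pool in addition to the route through activation.

```python
# Each destination maps to the sources named in its input_connection entries,
# transcribed from the xbar listing above.
INPUTS = {
    "activation": ["xbar_in_port"],
    "pool": ["xbar_in_port", "activation"],
    "softmax": ["xbar_in_port"],
    "xbar_out_port": ["xbar_in_port", "pool", "activation", "softmax"],
}


def routes(inputs, source="xbar_in_port", sink="xbar_out_port"):
    """Enumerate every path from source to sink through the connection table."""
    # Invert the table so that each node lists the nodes it can feed.
    feeds = {}
    for dest, sources in inputs.items():
        for src in sources:
            feeds.setdefault(src, []).append(dest)

    found = []

    def walk(node, path):
        if node == sink:
            found.append(path)
            return
        for nxt in feeds.get(node, []):
            if nxt not in path:  # guard against cycles in a malformed table
                walk(nxt, path + [nxt])

    walk(source, [source])
    return found


for path in routes(INPUTS):
    print(" -> ".join(path))
```

Removing or adding an entry in `INPUTS` and rerunning the sketch shows how a change to an `input_connection` line alters the set of available routes.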

The configuration network is connected to all other modules because it decodes the instructions from the compiled model on the FPGA device and orchestrates inference by controlling those modules. As mentioned in FPGA AI Suite IP Datapath Component Organization, the configuration network provides little configurability in the architecture file.