Intel® FPGA AI Suite: IP Reference Manual

ID 768974
Date 12/01/2023

2.4.2.3. Parameter Group: pe_array

This parameter group configures the PE Array, which calculates dot products.

Parameter: pe_array/dsp_limit

Use this parameter to cap the number of DSPs that the PE Array uses. Multipliers that do not fit within the cap are implemented in ALM logic on the FPGA.

The number of multipliers that the PE Array requires is determined by the k_vector and c_vector global parameters. Given the value of the arch_precision global parameter and the target architecture (for example, Intel® Arria® 10 or Intel Agilex® 7), the number of multipliers determines the number of DSPs that the PE Array tries to use. If this number exceeds the value of the dsp_limit parameter, then some multipliers are implemented in ALM logic so that the PE Array DSP usage stays within the limit.

If this parameter is omitted, then all multipliers in the Intel® FPGA AI Suite IP are implemented as DSPs.

Typically, this parameter is set by the architecture optimizer.
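
The effect of dsp_limit can be illustrated with a short Python sketch. It is not the IP generator's actual algorithm. The function name and the multipliers_per_dsp argument are hypothetical: the manual does not state how many multipliers fit in one DSP for a given arch_precision and device, so that packing factor is taken as an input, as is the total multiplier count.

```python
def split_multipliers(num_multipliers, multipliers_per_dsp, dsp_limit=None):
    """Return (dsps_used, multipliers_in_dsp, multipliers_in_alm).

    multipliers_per_dsp is an assumed packing factor; the real value
    depends on arch_precision and the target device family.
    """
    if num_multipliers < 0 or multipliers_per_dsp < 1:
        raise ValueError("invalid multiplier count or packing factor")
    if dsp_limit is not None and dsp_limit < 0:
        raise ValueError("dsp_limit must be non-negative")

    # DSPs the PE Array would try to use with no limit (ceiling division).
    dsps_wanted = -(-num_multipliers // multipliers_per_dsp)

    # No limit given, or the request already fits: everything stays in DSPs.
    if dsp_limit is None or dsps_wanted <= dsp_limit:
        return dsps_wanted, num_multipliers, 0

    # Otherwise fill the allowed DSPs and spill the rest into ALM logic.
    in_dsp = dsp_limit * multipliers_per_dsp
    return dsp_limit, in_dsp, num_multipliers - in_dsp


if __name__ == "__main__":
    print(split_multipliers(512, 2))        # dsp_limit omitted
    print(split_multipliers(512, 2, 200))   # dsp_limit below the request
```

In the second call, the 256 DSPs requested exceed the limit of 200, so the sketch reports the multipliers that would have to move to ALM logic under the assumed packing factor.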

Parameters: pe_array/num_interleaved_features, pe_array/num_interleaved_filters

The array uses a threaded accumulator that is time-multiplexed to handle multiple accumulations. Each accumulation corresponds to an output filter and feature.

Common values:
Intel® Arria® 10 devices: 4x1, 2x2
Intel Agilex® 5 devices: 12x1
Intel Agilex® 7 devices: 5x1, 3x2
Intel® Stratix® 10 devices: 5x1, 3x2

If you use a 1x1 interleave, then bias is not supported. Because most deep learning graphs use bias, the 1x1 interleave is typically not used.

The architecture optimizer does not modify the num_interleaved_features and num_interleaved_filters values. You must set them manually.

The filter interleave multiplies the effective KVEC, which means that graphs with a depthwise convolution (such as certain versions of MobileNet) might perform best when using num_interleaved_filters=1. Multilayer perceptron graphs might perform best when using num_interleaved_features=1.
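
As a small illustration of the statement that the filter interleave multiplies the effective KVEC, the following Python sketch treats the effective value as the product of k_vector and num_interleaved_filters. That product is a reading of the sentence above, not a formula given in this manual, and the function name is hypothetical.

```python
def effective_kvec(k_vector, num_interleaved_filters):
    """Assumed relation: effective KVEC = k_vector * num_interleaved_filters."""
    if k_vector < 1 or num_interleaved_filters < 1:
        raise ValueError("both values must be positive integers")
    return k_vector * num_interleaved_filters


if __name__ == "__main__":
    # Compare filter interleave factors for the same k_vector.
    for filters in (1, 2):
        print(filters, effective_kvec(16, filters))
```

Under this reading, setting num_interleaved_filters=1 leaves the effective KVEC equal to k_vector.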

Except in the 1x1 case, the interleave values must meet the following minimum for the target device family:
Intel® Arria® 10 devices: The value of num_interleaved_features multiplied by num_interleaved_filters must be greater than or equal to 4.
Intel Agilex® 5 devices: The value of num_interleaved_features must be greater than or equal to 12.
Intel Agilex® 7 devices: The value of num_interleaved_features multiplied by num_interleaved_filters must be greater than or equal to 5.
Intel® Stratix® 10 devices: The value of num_interleaved_features multiplied by num_interleaved_filters must be greater than or equal to 5.

There is no advantage in choosing interleave factors larger than the minimum required.
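
Because the architecture optimizer does not set these values, it can help to check them before building. The following Python sketch encodes the per-family minimums listed above. The function name, the family keys, and the return format are hypothetical and are not part of the Intel® FPGA AI Suite tools.

```python
# Minimums as stated in this section. "product" means
# num_interleaved_features * num_interleaved_filters;
# "features" means num_interleaved_features alone.
MIN_INTERLEAVE = {
    "Arria 10": ("product", 4),
    "Agilex 5": ("features", 12),
    "Agilex 7": ("product", 5),
    "Stratix 10": ("product", 5),
}


def check_interleave(family, features, filters):
    """Return (ok, message) for the given interleave settings."""
    if family not in MIN_INTERLEAVE:
        raise KeyError(f"unknown device family: {family}")
    if features < 1 or filters < 1:
        raise ValueError("interleave values must be positive integers")

    # The 1x1 case is exempt from the minimum but cannot be used with bias.
    if features == 1 and filters == 1:
        return True, "1x1 is allowed, but bias is not supported"

    kind, minimum = MIN_INTERLEAVE[family]
    value = features * filters if kind == "product" else features
    if value >= minimum:
        return True, f"{kind} {value} meets the minimum of {minimum}"
    return False, f"{kind} {value} is below the minimum of {minimum}"


if __name__ == "__main__":
    print(check_interleave("Arria 10", 2, 2))
    print(check_interleave("Agilex 7", 2, 2))
    print(check_interleave("Agilex 5", 12, 1))
```

The sketch only checks the lower bound. It does not flag values above the minimum, even though larger interleave factors bring no advantage.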

Parameter: pe_array/exit_fifo_depth

This parameter controls the depth of the PE Array exit FIFO. Larger values might reduce the incidence of stalling, but at the cost of area.

Typically, this parameter is not modified.

Parameter: pe_array/enable_scale

This parameter controls whether the IP supports scaling feature values by a per-channel weight. Per-channel scaling is used to support batch normalization.

In most graphs, the graph compiler (dla_compiler command) adjusts the convolution weights to account for scale, so this option is usually not required. (Similarly, if a shift is required, then the convolution bias values are adjusted.)

Legal values: true, false
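
The reason enable_scale is usually unnecessary can be seen from the algebra of folding a per-channel scale and shift into a convolution: scale * (w·x + b) + shift equals (scale * w)·x + (scale * b + shift). The Python sketch below demonstrates that identity on plain lists. It illustrates the general technique only and is not the dla_compiler implementation; all function names are hypothetical.

```python
def dot(row, x):
    """Dot product of one output channel's weights with the input vector."""
    return sum(w * v for w, v in zip(row, x))


def fold_scale_shift(weights, bias, scale, shift):
    """Fold per-channel scale and shift into weights and bias.

    weights[k] holds the weights for output channel k; bias, scale,
    and shift each hold one value per output channel.
    """
    folded_w = [[s * w for w in row] for row, s in zip(weights, scale)]
    folded_b = [s * b + t for b, s, t in zip(bias, scale, shift)]
    return folded_w, folded_b


def conv_then_scale(weights, bias, scale, shift, x):
    """Reference path: convolution output followed by scale and shift."""
    return [s * (dot(row, x) + b) + t
            for row, b, s, t in zip(weights, bias, scale, shift)]


def conv_folded(weights, bias, x):
    """Convolution only, using weights and bias that are already folded."""
    return [dot(row, x) + b for row, b in zip(weights, bias)]


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0]]
    b = [0.5, -1.0]
    scale = [2.0, 0.5]
    shift = [1.0, 3.0]
    x = [1.0, -2.0]
    fw, fb = fold_scale_shift(w, b, scale, shift)
    print(conv_then_scale(w, b, scale, shift, x))
    print(conv_folded(fw, fb, x))
```

Both paths produce the same per-channel outputs, so a separate hardware scaling stage is not needed when the weights and bias can be adjusted ahead of time.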