9.1. Folding Input
Many graphs, particularly those that process image data, have very shallow input channels. In IP instantiations with a high CVEC value, shallow input channels can lead to low computational efficiency because the input occupies only a fraction of the channel vector (for example, a three-channel image fills only 3 of the 16 channel positions when CVEC is 16).
To improve the computational efficiency for these graphs, you can transform these shallow input channels into a deeper channel vector through a process called input folding.
Folding the input can be performed by the host runtime or by a soft IP block that you write, working in conjunction with the FPGA AI Suite compiler. Alternatively, it can be performed in hardware using the hardware layout transforms described in Transforming the Layout of Input Data.
Input folding is typically most beneficial to the first layer of a graph.
The following figure illustrates the folding transform performed by the FPGA AI Suite compiler.
In this transformation, the input depth, height, and width are folded into the channel dimension by a factor corresponding to the stride of the first convolution of the network. In the figure, where the stride is 2, this corresponds to transforming the input channels from 1 to 4 (1 × 2 × 2), the input height from 5 to 3 (⌈5/2⌉), and the input width from 5 to 3 (⌈5/2⌉). Each color corresponds to the new filter window, which in this case would be 4×1×2×2, with the gray boxes corresponding to zero padding for the filters. Folding is done in a similar way for inputs with depths greater than one, but the illustration omits that case for simplicity.
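The following sketch shows the arithmetic of the fold. It is a minimal illustration, not the FPGA AI Suite implementation: it assumes NumPy, a CHW input layout, and a hypothetical fold_input helper, and it reproduces the 1×5×5 to 4×3×3 example above for a stride of 2.

    import numpy as np

    def fold_input(x, stride):
        # Fold (C, H, W) into (C*stride*stride, ceil(H/stride), ceil(W/stride))
        # by zero-padding H and W up to multiples of the stride, then moving
        # each stride-by-stride spatial block into the channel dimension.
        c, h, w = x.shape
        hp = -(-h // stride) * stride   # H rounded up to a multiple of the stride
        wp = -(-w // stride) * stride   # W rounded up to a multiple of the stride
        padded = np.zeros((c, hp, wp), dtype=x.dtype)
        padded[:, :h, :w] = x           # the added zeros are the gray boxes in the figure
        return (padded.reshape(c, hp // stride, stride, wp // stride, stride)
                      .transpose(0, 2, 4, 1, 3)
                      .reshape(c * stride * stride, hp // stride, wp // stride))

    x = np.arange(25, dtype=np.float32).reshape(1, 5, 5)
    print(fold_input(x, stride=2).shape)   # prints (4, 3, 3)

Because the stride is absorbed into the channel dimension, the first convolution then operates on the folded tensor with a correspondingly rearranged filter window (4×1×2×2 in the example above).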
The FPGA AI Suite IP has various enhancements that reduce, but do not eliminate, the efficiency impact of shallow first layers. In many cases, you can disable first-layer folding in the compiler and pass shallow-channel tensors directly to the IP hardware.
To enable input folding and control how it is applied during inference, set the --ffolding-option compiler option. For more information, refer to --ffolding-option Option in Compilation Options (dla_compiler Command Options).
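For example, a compilation that controls folding might be invoked as follows. This invocation is illustrative only: the file name is a placeholder, other required options are omitted, and the values accepted by --ffolding-option are documented in the option reference cited above.

    dla_compiler --network-file my_graph.xml --ffolding-option <value> ...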