2.6.4.5. Input Layout Transform Hardware

FPGA AI Suite: IP Reference Manual


ID 768974

Date 11/21/2025

Version

Public

2.6.4.5. Input Layout Transform Hardware

The input tensor layout transform and folding operations described in this section can be performed in hardware by the FPGA AI Suite IP when the layout transform is enabled in the IP architecture file.

The hardware implementation assumes that the input tensors are in HWC format and that the data elements are in either FP16 or U8 format. It supports input folding for any feature, stride, and padding values.

When active, the layout transform hardware folds the input tensor and converts it to the CHWCvec format as described in Input Feature Tensor In-Memory Format. If configured for U8 inputs, the data elements are also converted to FP16 format before tensors are sent downstream for inference. Bias and scale values are applied to the input within the layout transform hardware module if required by the graph.
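The layout conversion described above can be sketched in software as follows. This is a minimal illustration, not the hardware algorithm: it omits input folding, and it assumes that channels are zero-padded up to a multiple of the channel vector width and that scale and bias are applied as `x * scale + bias`. The names `hwc_to_chwcvec`, `to_fp16`, `c_vec`, `scale`, and `bias` are hypothetical and do not come from this manual.

```python
import struct


def to_fp16(value):
    """Round a number to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", float(value)))[0]


def hwc_to_chwcvec(hwc, c_vec, scale=1.0, bias=0.0):
    """Reorder an H x W x C nested list into [ceil(C/c_vec)][H][W][c_vec].

    Assumptions (not taken from the manual):
      * channels are zero-padded up to a multiple of c_vec
      * each element is mapped as x * scale + bias, then rounded to FP16
    """
    height = len(hwc)
    width = len(hwc[0])
    channels = len(hwc[0][0])
    groups = (channels + c_vec - 1) // c_vec  # ceil(C / c_vec)

    out = []
    for g in range(groups):
        plane = []
        for h in range(height):
            row = []
            for w in range(width):
                vec = []
                for k in range(c_vec):
                    c = g * c_vec + k
                    if c < channels:
                        vec.append(to_fp16(hwc[h][w][c] * scale + bias))
                    else:
                        vec.append(0.0)  # padding lane past the last channel
                row.append(vec)
            plane.append(row)
        out.append(plane)
    return out


# A 1 x 2 x 3 U8 image (H=1, W=2, C=3) regrouped with c_vec = 2.
image = [[[0, 128, 255], [10, 20, 30]]]
result = hwc_to_chwcvec(image, c_vec=2)
# result[0] holds channels 0-1 for every pixel; result[1] holds channel 2
# plus one zero padding lane.
```

In this sketch the three channels of each pixel are split into two channel groups, so the second group carries one real channel and one padding lane per pixel.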

To avoid input slicing when the hardware layout transform is enabled, size the stream buffer to accommodate the entire input feature.
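A rough way to reason about this sizing is to count how many channel-vector-wide words the transformed input occupies and compare that count with the stream buffer depth. The sketch below assumes that the stream buffer depth is measured in such words and that the dimensions passed in are those of the tensor after folding. Both are assumptions for illustration, and the function names are hypothetical.

```python
import math


def stream_buffer_words(height, width, channels, c_vec):
    """Words needed to hold an H x W x C feature, one word per c_vec channels.

    Assumes the dimensions are those of the tensor after any folding.
    """
    return math.ceil(channels / c_vec) * height * width


def fits_without_slicing(height, width, channels, c_vec, buffer_depth_words):
    """True if the whole input feature fits in the (assumed) buffer depth."""
    return stream_buffer_words(height, width, channels, c_vec) <= buffer_depth_words


# Example: a 224 x 224 x 3 input with c_vec = 4 needs 224 * 224 = 50176 words.
words = stream_buffer_words(224, 224, 3, 4)
```

Under these assumptions, a buffer depth below the computed word count means that the input cannot be held in one piece.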

Use the hardware layout transform with the --ffolding_option 1 compiler option, which is described in "Compilation Options (dla_compiler Command Options)" in the FPGA AI Suite Compiler Reference Manual. The layout transform hardware does not currently support 5-dimensional input tensors.
