2.6.4. Input Feature Tensor In-Memory Format

FPGA AI Suite: IP Reference Manual


ID 768974

Date 11/21/2025

Public


Input features are stored in FP16 format, which has 1 sign bit, 5 exponent bits, and 10 mantissa bits. The IP hardware converts the input features to its native format using the round to nearest, ties to even (RNE) rounding rule.
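The Python sketch below illustrates the two points in the preceding paragraph. `fp16_fields` splits a value into its binary16 sign, exponent, and mantissa fields using the standard-library `struct` module's `'e'` (half-precision) format. `round_rne` is a generic round to nearest, ties to even helper that drops low-order bits from a non-negative integer significand. This section does not specify the width of the IP's native format, so `drop_bits` is a hypothetical parameter and `round_rne` is an illustration of the rounding rule, not the IP's actual converter.

```python
import struct
from typing import Tuple


def fp16_fields(x: float) -> Tuple[int, int, int]:
    """Return the (sign, exponent, mantissa) bit fields of x encoded as IEEE 754 binary16."""
    # '<e' packs a Python float as little-endian half precision (FP16);
    # '<H' reinterprets those same 2 bytes as an unsigned 16-bit integer.
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> 15               # 1 sign bit
    exponent = (bits >> 10) & 0x1F  # 5 exponent bits (bias 15)
    mantissa = bits & 0x3FF         # 10 mantissa bits (implicit leading 1 for normal values)
    return sign, exponent, mantissa


def round_rne(value: int, drop_bits: int) -> int:
    """Drop the low `drop_bits` bits of a non-negative integer, rounding to nearest, ties to even.

    Hypothetical helper: `drop_bits` stands in for an unspecified native-format width.
    """
    if drop_bits <= 0:
        return value
    kept = value >> drop_bits
    remainder = value & ((1 << drop_bits) - 1)
    half = 1 << (drop_bits - 1)
    # Round up if strictly above the halfway point, or exactly halfway and `kept` is odd.
    if remainder > half or (remainder == half and (kept & 1)):
        kept += 1
    return kept


print(fp16_fields(1.0))          # (0, 15, 0)
print(fp16_fields(-2.5))         # (1, 16, 256)
print(round_rne(0b10101000, 4))  # tie, 0b1010 is even: stays 10
print(round_rne(0b10111000, 4))  # tie, 0b1011 is odd: rounds up to 12
```

On an exact tie, RNE picks the result whose least-significant kept bit is 0, which avoids the systematic upward bias of round-half-up.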

Feature elements are packed into CVEC-sized chunks along the channel dimension, from the lowest channel index to the highest. At each (d, h, w) position, the final CVEC chunk is zero-padded when the number of channels is not a multiple of CVEC. The CVEC chunks are stored in NCDHW order, with CVEC appended as the innermost dimension: batch, channel chunk, depth, height, width, and then CVEC. CVEC is the fastest-changing index and batch is the slowest.
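Read literally, the ordering above assigns each FP16 element a linear position in memory. Writing $C$, $D$, $H$, and $W$ for the tensor's channel, depth, height, and width, and $\lceil C/\mathrm{CVEC}\rceil$ for the number of channel chunks, the element at $(n, c, d, h, w)$ would sit at the index below, counted in FP16 elements from the start of the tensor (multiply by 2 for a byte offset). This is a derivation from the stated order, assuming no padding other than the zero-filled final chunk; the tensor then occupies $N\,\lceil C/\mathrm{CVEC}\rceil\,D\,H\,W\,\mathrm{CVEC}$ elements.

```latex
\mathrm{idx}(n,c,d,h,w) =
\Bigl(\bigl(\bigl((n\,\lceil C/\mathrm{CVEC}\rceil + \lfloor c/\mathrm{CVEC}\rfloor)\,D + d\bigr)\,H + h\bigr)\,W + w\Bigr)\,\mathrm{CVEC}
+ (c \bmod \mathrm{CVEC})
```

For example, with a 1×3×1×2×2 tensor and CVEC=2, the element at $(n,c,d,h,w) = (0,2,0,1,1)$ falls in chunk 1, lane 0, giving $\bigl(((1 \cdot 1 + 0) \cdot 2 + 1) \cdot 2 + 1\bigr) \cdot 2 + 0 = 14$.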

The following figure shows a sample memory layout for a 1×3×1×2×2 (N×C×D×H×W) input tensor on an architecture with CVEC=2:

Figure 3. Input Tensor In-Memory Layout
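To make the layout concrete, the Python sketch below packs a tensor given in plain NCDHW order (width fastest, no CVEC chunking) into the chunked, zero-padded layout described above, and reproduces the 1×3×1×2×2, CVEC=2 example filled with the values 0 to 11. The function name `pack_input`, the flat-list input convention, and the little-endian byte order of the final FP16 buffer are assumptions for illustration only; the figure remains the reference for the actual layout.

```python
import struct
from itertools import product
from typing import List, Sequence, Tuple


def pack_input(flat: Sequence[float], shape: Tuple[int, int, int, int, int], cvec: int) -> List[float]:
    """Hypothetical helper: pack a plain NCDHW tensor (w fastest) into CVEC-chunked layout."""
    n_dim, c_dim, d_dim, h_dim, w_dim = shape
    chunks = -(-c_dim // cvec)  # ceil(C / CVEC): number of channel chunks
    out = [0.0] * (n_dim * chunks * d_dim * h_dim * w_dim * cvec)  # padding lanes stay 0.0
    # product() yields indices with w changing fastest, matching the flat NCDHW input order.
    indices = product(range(n_dim), range(c_dim), range(d_dim), range(h_dim), range(w_dim))
    for i, (n, c, d, h, w) in enumerate(indices):
        chunk, lane = divmod(c, cvec)  # which CVEC chunk, and the position within it
        idx = ((((n * chunks + chunk) * d_dim + d) * h_dim + h) * w_dim + w) * cvec + lane
        out[idx] = flat[i]
    return out


# The 1x3x1x2x2 tensor with CVEC=2 from the figure, filled with 0..11 in NCDHW order.
packed = pack_input([float(v) for v in range(12)], (1, 3, 1, 2, 2), cvec=2)
print(packed)
# [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 8.0, 0.0, 9.0, 0.0, 10.0, 0.0, 11.0, 0.0]

# Serialize as FP16 (assumed little-endian): 16 elements x 2 bytes each.
raw = struct.pack(f"<{len(packed)}e", *packed)
print(len(raw))  # 32
```

Channels 0 and 1 share the first chunk, so their values interleave (0, 4, 1, 5, ...). Channel 2 occupies the first lane of the second chunk, and the second lane of that chunk is zero padding.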

Section Content

2.6.4.1. Multiple Input Graphs
2.6.4.2. Input Folding
2.6.4.3. Input Scale and Shift
2.6.4.4. Input Transform Mapping
2.6.4.5. Input Layout Transform Hardware
