FPGA AI Suite Handbook

9.4.1. Block Floating Point (BFP)

The following diagram illustrates the conversion from floating point (fp16) to block floating point (in this example, INT9-BFP). The mantissas of the inputs are aligned to the largest exponent value in the group, which becomes the shared exponent.

Figure 24. Conversion to block floating point
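To make the conversion concrete, the following Python sketch quantizes one block of values to a shared-exponent format with a two's complement integer mantissa, as in INT9-BFP. The function names, the rounding mode, and the saturation to the integer range are assumptions of this sketch, not the exact conversion performed by the IP.

```python
import math

def float_to_bfp(block, mantissa_bits=9):
    """Quantize a block of floats to a shared-exponent (BFP) representation.

    Each value becomes a signed integer mantissa scaled by
    2**(shared_exp - (mantissa_bits - 1)), where shared_exp is the exponent
    of the largest-magnitude value in the block.
    """
    nonzero = [v for v in block if v != 0.0]
    if not nonzero:
        return [0] * len(block), 0
    # Shared exponent: taken from the largest-magnitude value in the block.
    shared_exp = max(math.frexp(v)[1] for v in nonzero)
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    lo, hi = -(1 << (mantissa_bits - 1)), (1 << (mantissa_bits - 1)) - 1
    # Shift-align and round: values with smaller exponents lose their
    # lowermost bits at this step.
    mantissas = [max(lo, min(hi, round(v / scale))) for v in block]
    return mantissas, shared_exp

def bfp_to_float(mantissas, shared_exp, mantissa_bits=9):
    """Reconstruct approximate floating point values from a BFP block."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]
```

For example, float_to_bfp([3.0, 0.001]) returns ([192, 0], 2): 3.0 sets the shared exponent, and 0.001 rounds to zero at the resulting scale.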

The Architecture Description File defines the block size parameter (c_vector) and block floating point precision parameter (arch_precision).

The following table summarizes the block floating point notation used by the FPGA AI Suite. The mantissa widths given here include the implicit leading 1.

Table 27. Block Floating Point Notation Convention

arch_precision   Block floating point   Meaning
FP11             INT7-BFP               1s.6m.5e (unsigned integer mantissa)
FP12AGX          INT8-BFP               8m.5e (two's complement mantissa)
FP13AGX          INT9-BFP               9m.5e (two's complement mantissa)
FP16             INT12-BFP              1s.11m.5e (unsigned integer mantissa)
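In the Architecture Description File, c_vector sets how many values share a single exponent and arch_precision selects a row of Table 27. The following Python sketch shows how the two parameters fit together; the dictionary, the function, and the example value of c_vector are assumptions for illustration only, not part of the FPGA AI Suite tooling.

```python
# Total mantissa bits per value implied by arch_precision (including the sign
# bit for the sign-magnitude formats), taken from Table 27. The name
# MANTISSA_BITS is illustrative only.
MANTISSA_BITS = {"FP11": 7, "FP12AGX": 8, "FP13AGX": 9, "FP16": 12}

def split_into_blocks(values, c_vector):
    """Partition a flat vector into groups of c_vector values. Each group is
    quantized independently with its own shared exponent (sketch only)."""
    return [values[i:i + c_vector] for i in range(0, len(values), c_vector)]
```

For example, with arch_precision FP13AGX and a hypothetical c_vector of 16, each group of 16 values would be stored as sixteen 9-bit mantissas plus one shared 5-bit exponent.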

Due to the architecture of its DSP blocks, the Agilex™ 7 FPGA fabric is optimally configured at INT9-BFP. The Agilex™ 5 FPGA fabric is optimally configured in DSP tensor mode at INT8-BFP.

When converting from floating point to block floating point, numerical precision is lost in the following ways:

  • If input values in the block have different exponents, those values with smaller exponents lose lowermost precision bits through the shift-align-round operation.
  • If the BFP mantissa format has fewer bits of precision than the input floating point format, lowermost bits of precision are lost regardless of the exponent value.
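A small worked example (values chosen by hand purely for illustration) shows both effects for a 9-bit mantissa:

```python
import math

# A block whose members have widely different exponents.
block = [96.3125, 1.5, 0.0078125]

# The shared exponent comes from the largest-magnitude value, 96.3125.
shared_exp = max(math.frexp(v)[1] for v in block)   # 7

# With a 9-bit mantissa (as in INT9-BFP), the quantization step is:
step = 2.0 ** (shared_exp - 8)                      # 0.5

quantized = [round(v / step) * step for v in block]
print(quantized)                                    # [96.5, 1.5, 0.0]
```

Here 96.3125 needs 11 significand bits, so its lowest bits are rounded away even though it defines the shared exponent, while 0.0078125 lies far below the 0.5 step and is rounded to zero by the shift-align-round operation.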

Numerical experiments have demonstrated that quantization to lower-precision block floating point formats results in relatively small accuracy loss for many popular networks. For details, refer to the Intel white paper "Low-Precision Networks for Efficient Inference on FPGAs".