3.1. Consider an FPGA AI Suite Design Example as Starting Point
The FPGA AI Suite provides several design examples that demonstrate how to integrate the FPGA AI Suite IP with real hardware platforms. These examples serve as a foundation to evaluate features, prototype workflows, and understand runtime interactions.
To choose a design example as your starting point, consider the following questions:
- Do you want to offload inference to the FPGA from a host CPU (look-aside)?
- Do you require standalone (hostless) operation without DDR memory?
- Is your system embedded, with an onboard ARM core (SoC)?
- Are you targeting PCIe*-attached cards, or SoCs with integrated HPS?
- Do you want to use prebuilt bitstreams or build custom ones with Quartus?
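The questions above map roughly onto the design examples described in the rest of this section. The following Python sketch shows one way to encode that mapping. The `PlatformNeeds` type, its field names, and the priority order of the checks are illustrative assumptions, not part of the FPGA AI Suite; only the returned design example names come from this section.

```python
from dataclasses import dataclass


@dataclass
class PlatformNeeds:
    """Hypothetical answers to the questions above (not an FPGA AI Suite type)."""
    pcie_attached_host: bool = False  # x86 host offloads inference over PCIe (look-aside)
    onboard_arm_hps: bool = False     # SoC device with an integrated HPS
    external_ddr: bool = True         # False: weights and features must stay on-chip
    video_input: bool = False         # inference runs on video inputs
    custom_board: bool = False        # board not covered by a default design example


def suggest_design_example(needs: PlatformNeeds) -> str:
    """Return the name of the design example to start from.

    The order of the checks is an assumption made for this sketch.
    """
    if needs.custom_board:
        return "Custom Platform"
    if needs.pcie_attached_host:
        return "PCIe Host"
    if needs.onboard_arm_hps:
        return "AI Video (SoC Host)" if needs.video_input else "SoC Host"
    # Remaining cases are hostless: choose by whether external DDR is available.
    return "Hostless JTAG" if needs.external_ddr else "Hostless DDR-Free"


if __name__ == "__main__":
    print(suggest_design_example(PlatformNeeds(onboard_arm_hps=True)))
```

Treat the result as a starting suggestion only; the recommended boards and key characteristics listed for each design example remain the deciding factors.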
PCIe* Host
Use Case: Server or workstation offloading inference to a PCIe*-attached FPGA.
Recommended boards:
- Terasic* DE10-Agilex Development Board (DE10-Agilex-B2E2)
- Open FPGA Stack (OFS)-based boards:
  - Agilex™ 5 FPGA E-Series 065B Modular Development Kit (MK-A5E065BB32AES1)
  - Agilex™ 7 FPGA I-Series Development Kit ES2 (DK-DEV-AGI027RBES)
  - Intel® FPGA SmartNIC N6001-PL Platform (without Ethernet controller)
Key Characteristics:
- Supports a look-aside model with host-driven control.
- Integrates with the Intel® Distribution of OpenVINO™ toolkit (x86 host).
- Build scripts support architecture selection and optional bitstream regeneration.
- Designed for performance benchmarking and throughput-optimized inference.
Hostless DDR-Free
Use Case: Fully autonomous AI inference on FPGA, with no host processor and no external DDR memory.
Recommended boards:
- Agilex™ 7 FPGA I-Series Development Kit ES2 (DK-DEV-AGI027RBES)
Key Characteristics:
- Inputs, weights, and configurations are stored in on-chip RAM or in memory initialization files (MIF).
- No external DDR or runtime required.
- Data streaming and results are handled via direct hardware interfaces.
- Suitable for ultra-low-latency and minimal-footprint deployments.
DDR-Free Scenarios
The DDR-Free architecture trades on-chip memory blocks for more efficient filter data access. It is beneficial in the following scenarios:
- When DDR bandwidth bottlenecks system performance but on-chip memory is sufficient, storing filter and configuration data on-chip reduces the need to transfer data between on-chip and external memory.
- When the graph is reasonably sized, its weights and biases fit in on-chip memory, and low latency or high throughput is critical to your application. In this case, storing the weights and biases in on-chip memory shortens access time and reduces per-layer latency.
- When the multilane feature is enabled, PE arrays consume data more frequently and in parallel. Storing filter data on-chip and sharing it across lanes adds less resource overhead than fetching it from external memory.
DDR-Free Constraints
DDR-Free mode imposes certain constraints:
- Memory Constraints: The architecture file must provide sufficient on-chip memory to hold all graph parameters in the filter scratchpad. It must also provide sufficient on-chip memory to store all intermediate surfaces in the stream buffer.
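The memory constraints above amount to two capacity checks: one for the filter scratchpad and one for the stream buffer. The following Python sketch illustrates the arithmetic. The `OnChipBudget` type, its fields, and the byte counts you pass in are hypothetical and do not correspond to real architecture file parameters. As a simplifying assumption, the sketch estimates the stream buffer requirement as the largest combined size of two consecutive surfaces (a layer's input and output), which ignores graphs with skip connections where more surfaces stay live at once.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class OnChipBudget:
    """Hypothetical capacities in bytes; not fields of a real architecture file."""
    filter_scratchpad_bytes: int
    stream_buffer_bytes: int


def peak_surface_bytes(surface_bytes: Sequence[int]) -> int:
    """Largest combined size of two consecutive surfaces (assumed live together)."""
    if len(surface_bytes) < 2:
        return sum(surface_bytes)
    return max(a + b for a, b in zip(surface_bytes, surface_bytes[1:]))


def check_ddr_free_fit(
    param_bytes: Sequence[int],
    surface_bytes: Sequence[int],
    budget: OnChipBudget,
) -> List[str]:
    """Return the violated constraints; an empty list means the graph fits."""
    problems = []
    total_params = sum(param_bytes)  # all graph parameters must fit at once
    if total_params > budget.filter_scratchpad_bytes:
        problems.append(
            f"parameters need {total_params} B, "
            f"filter scratchpad holds {budget.filter_scratchpad_bytes} B"
        )
    peak = peak_surface_bytes(surface_bytes)
    if peak > budget.stream_buffer_bytes:
        problems.append(
            f"surfaces need {peak} B, "
            f"stream buffer holds {budget.stream_buffer_bytes} B"
        )
    return problems
```

A check like this is only a first-pass estimate; the FPGA AI Suite tools remain the authority on whether a given graph fits a given architecture.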
Hostless JTAG
Use Case: Direct control of FPGA-based inference through a JTAG interface, typically in lab environments or tightly controlled edge systems.
Recommended boards:
- Agilex™ 3 FPGA C-Series Development Kit (DK-A3Y135BM16AEA)
- Agilex™ 5 FPGA E-Series 065B Modular Development Kit (MK-A5E065BB32AES1)
Key Characteristics:
- External DDR stores weights and features.
- The host communicates with the FPGA over JTAG.
- Good for development, bring-up, or research workflows.
SoC Host
Use Case: Embedded AI inference using the FPGA hard processor system (HPS).
Recommended boards:
- Agilex™ 5 FPGA E-Series 065B Modular Development Kit (MK-A5E065BB32AES1)
- Agilex™ 7 FPGA I-Series Transceiver-SoC Development Kit (DK-SI-AGI027FC)
- Arria® 10 SX SoC FPGA Development Kit (DK-SOC-10AS066S)
Key Characteristics:
- Supports a CPU-offload model using the OpenVINO™ ARM plugin on Linux.
- Uses Yocto-based builds to generate bootable SD card images.
- Offers two execution modes:
  - M2M (Memory-to-Memory): Benchmark-style execution.
  - S2M (Streaming-to-Memory): Demonstrates live streaming from CPU to FPGA.
AI Video (SoC Host)
Use Case: Embedded AI inference on video inputs using the FPGA hard processor system (HPS).
Recommended boards:
- Agilex™ 5 FPGA E-Series 065B Modular Development Kit (MK-A5E065BB32AES1)
Key Characteristics:
Custom Platform
Use Case: Support for non-standard or production-specific platforms that are not directly covered by default design examples.
Examples:
- Carrier boards with custom pin maps or I/O.
- PCIe add-in cards with proprietary form factors.
- Edge or embedded systems with bespoke power or memory configurations.
Key Considerations:
- Platform Definition: Use a custom Open FPGA Stack (OFS) Platform Interface Manager (PIM) or modify an existing BSP.
- Interface Integration: Ensure compatibility with chosen I/O (PCIe, JTAG, streaming interfaces).
- Bitstream Generation: You must have a valid FPGA AI Suite license that allows custom compilation.
- Toolchain Compatibility: Align Quartus, BSP, and runtime versions with FPGA AI Suite requirements.
- Software Stack: Modify or extend the OpenVINO™ integration layer for host-based platforms, or use lightweight inference APIs for hostless platforms.
This structured platform breakdown helps ensure you're targeting the appropriate flow for your system constraints. Once your platform is selected, you can follow the design example’s instructions to build, modify, or integrate the FPGA AI Suite into your deployment pipeline.