L-tile and H-tile Avalon® Memory-mapped Intel® FPGA IP for PCI Express* User Guide

ID 683667
Date 9/26/2022
Public

10.2. Endpoint Design Example

This design example comprises a native Endpoint, a DMA application and a Root Port BFM. The write DMA module implements write operations from the Endpoint memory to the Root Complex (RC) memory. The read DMA implements read operations from the RC memory to the Endpoint memory.

When operating on a hardware platform, a software application running on the Root Complex processor typically controls the DMA. In simulation, the generated testbench, along with this design example, provides a BFM driver module in Verilog HDL that controls the DMA operations. Because the example relies on no hardware interface other than the PCI Express link, you can use the design example for the initial hardware validation of your system.

System generation creates the Endpoint variant in Verilog HDL. The testbench files are only available in Verilog HDL in the current release.

Note: To run the DMA tests using MSI, you must set the Number of MSI messages requested parameter under the PCI Express/PCI Capabilities page to at least 2.

The DMA design example uses an architecture capable of transferring a large amount of fragmented memory without accessing the DMA registers for every memory block. For each memory block, the DMA design example uses a descriptor table containing the following information:

  • Size of the transfer
  • Address of the source
  • Address of the destination
  • Control bits to set the handshaking behavior between the software application or BFM driver and the DMA module

Note: The DMA design example only supports DWORD-aligned accesses. The DMA design example does not support ECRC forwarding.
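
The four descriptor fields listed above can be modeled in a few lines of Python. This is a minimal sketch for illustration only: the field order, the field widths, the `Descriptor` class, and the `DESC_FORMAT` constant are all assumptions and do not reflect the actual descriptor layout used by the IP core. The sketch also enforces the DWORD-alignment restriction stated in the note.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout (an assumption, not the IP core's real format):
# control (32 bits), size in bytes (32 bits), source (64 bits), destination (64 bits).
DESC_FORMAT = "<IIQQ"
DESC_BYTES = struct.calcsize(DESC_FORMAT)  # 24 bytes for this assumed layout


@dataclass(frozen=True)
class Descriptor:
    size: int         # size of the transfer, in bytes
    src: int          # address of the source
    dst: int          # address of the destination
    control: int = 0  # handshaking control bits (meaning not modeled here)

    def __post_init__(self):
        # The design example only supports DWORD-aligned accesses.
        for name in ("size", "src", "dst"):
            if getattr(self, name) % 4 != 0:
                raise ValueError(f"{name} must be DWORD-aligned (multiple of 4)")

    def pack(self) -> bytes:
        """Serialize the descriptor into the assumed little-endian layout."""
        return struct.pack(DESC_FORMAT, self.control, self.size, self.src, self.dst)

    @classmethod
    def unpack(cls, raw: bytes) -> "Descriptor":
        """Rebuild a descriptor from bytes produced by pack()."""
        control, size, src, dst = struct.unpack(DESC_FORMAT, raw)
        return cls(size=size, src=src, dst=dst, control=control)
```

Keeping one fixed-size record per memory block is what lets the engine walk a whole table without a register access for each block.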

The BFM driver writes the descriptor tables into BFM shared memory, from which the DMA engine continuously collects the descriptor tables for DMA read, DMA write, or both. At the beginning of the transfer, the BFM programs the Endpoint DMA control register. The DMA control register indicates the total number of descriptor tables and the BFM shared memory address of the first descriptor table. After the BFM programs the DMA control register, the DMA engine continuously fetches descriptors from the BFM shared memory for both DMA reads and DMA writes, and then performs the data transfer for each descriptor.
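
The sequence described above can be sketched as a small software model of a write DMA (Endpoint memory to RC memory). Everything here is an assumption made for illustration: the memories are plain `bytearray` objects, the descriptor layout in `DESC_FORMAT` is hypothetical, and the function names `write_descriptors` and `run_write_dma` do not exist in the generated testbench. The `count` and `table_addr` arguments stand in for the two values the DMA control register carries.

```python
import struct

# Hypothetical descriptor layout (an assumption, not the IP core's real format):
# control (32 bits), size in bytes (32 bits), source (64 bits), destination (64 bits).
DESC_FORMAT = "<IIQQ"
DESC_BYTES = struct.calcsize(DESC_FORMAT)


def write_descriptors(shared_mem, table_addr, descriptors):
    """BFM driver role: store (size, src, dst, control) tuples back to back
    in shared memory, starting at table_addr."""
    for i, (size, src, dst, control) in enumerate(descriptors):
        offset = table_addr + i * DESC_BYTES
        struct.pack_into(DESC_FORMAT, shared_mem, offset, control, size, src, dst)


def run_write_dma(shared_mem, endpoint_mem, count, table_addr):
    """DMA engine role: given the descriptor count and the address of the
    first descriptor, fetch each descriptor from shared memory and copy the
    described block from Endpoint memory to RC (shared) memory.
    Returns the total number of bytes moved."""
    moved = 0
    for i in range(count):
        offset = table_addr + i * DESC_BYTES
        control, size, src, dst = struct.unpack_from(DESC_FORMAT, shared_mem, offset)
        shared_mem[dst:dst + size] = endpoint_mem[src:src + size]
        moved += size
    return moved
```

In this model the driver touches the "control register" once per transfer, no matter how many descriptors the table holds, which mirrors the reason the design example uses descriptor tables for fragmented memory.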

The following figure shows a block diagram of the design example connected to an external RC CPU.

Figure 69. Top-Level DMA Example for Simulation

The block diagram contains the following elements:

  • The DMA application connects to the Avalon®-MM interface of the Intel L-/H-Tile Avalon-MM for PCI Express IP core. The connections consist of the following interfaces:
    • The Avalon®-MM RX master receives TLP header and data information from the Hard IP block.
    • The Avalon®-MM TX slave transmits TLP header and data information to the Hard IP block.
    • The Avalon®-MM control register access (CRA) IRQ port requests MSI interrupts from the Hard IP block.
    • The sideband signal bus carries static information such as configuration information.
  • The BFM shared memory stores the descriptor tables for the DMA read and the DMA write operations.
  • A Root Complex CPU and associated PCI Express* PHY connect to the Endpoint design example through a Root Port.

The example Endpoint design and application accomplish the following objectives:

  • Show you how to interface to the Intel L-/H-Tile Avalon-MM for PCI Express using the Avalon-MM protocol.
  • Provide a DMA channel that initiates memory read and write transactions on the PCI Express* link.

The DMA design example hierarchy consists of these components:

  • A DMA read and a DMA write module
  • An on-chip Endpoint memory (Avalon-MM slave) which uses two Avalon-MM interfaces for each engine

The RC slave module typically drives downstream transactions which target the Endpoint on‑chip buffer memory. These target memory transactions bypass the DMA engines. In addition, the RC slave module monitors performance and acknowledges incoming message TLPs.