AN 456: PCI Express High Performance Reference Design

ID 683541
Date 12/12/2018

1.3. Reference Design Functional Description

The reference design consists of the following components:
  • An application layer that consists of the chaining DMA example generated by the IP core
  • The IP core variation
  • A software application and Windows driver configured specifically for this reference design
    Figure 5. Reference Design Components

The chaining DMA example consists of two DMA modules in the application logic and an internal Endpoint memory. The design supports simultaneous DMA read and DMA write transactions. The DMA write module transfers data from Endpoint memory to the Root Complex system memory across the PCIe link. The DMA read module implements read transfers data from the Root Complex system memory across the PCIe link to Endpoint memory.

The reference design is included the FPGA and relies on no other hardware interface except the PCIe link. A chaining DMA provides higher performance than a simple DMA for non-contiguous memory transfers between the system and Endpoint memory. For a simple DMA, the software application programs the DMA registers for every transfer. The chaining DMA uses descriptor tables for each memory page. These descriptor tables contain the following information:

  • Transfer length
  • Source and destination addresses for the transfer
  • Control information that sets the handshaking behavior between the software application and the DMA module

Each descriptor consists of four dwords. The descriptors are stored in a contiguous memory page.

Based on the attributes set in the Parameter Editor, the software application creates the necessary descriptor tables in the system memory. The software application also creates a descriptor header table. This table specifies the total number of descriptors and the address of the first descriptor table. At the beginning of the transfer, the software application programs the DMA registers with the descriptor header table. The DMA module continuously collects these descriptor tables for each DMA read and write and performs the transfers specified.

The DMA module also includes a performance counter. The counter starts when the software writes a descriptor header table to the DMA registers. It continues counting until the last data has been transferred by the DMA module. After the transfer is complete, the software application uses the counter value to compute the throughput for the transfer and reports it. The counter value includes latency for the initial descriptor read. Consequently, the throughput reported by the software application is less than the actual throughput.