Intel® Arria® 10 and Intel® Cyclone® 10 GX Avalon® Memory-Mapped (Avalon-MM) Interface for PCI Express* User Guide

ID 683724
Date 11/29/2023
Public

11. Throughput Optimization

The PCI Express Base Specification defines a flow control mechanism to ensure efficient transfer of TLPs.

Each transmitter, the write requester in this case, maintains a credit limit register and a credits consumed register. The credit limit register is the sum of all credits issued by the receiver, the write completer in this case. It is initialized during the flow control initialization phase of link initialization and then updated during operation by Flow Control (FC) Update DLLPs. The credits consumed register is the sum of all credits consumed by packets transmitted. Separate credit limit and credits consumed registers exist for each of the six types of Flow Control:

  • Posted Headers
  • Posted Data
  • Non-Posted Headers
  • Non-Posted Data
  • Completion Headers
  • Completion Data

Each receiver also maintains a credit allocated counter, which is initialized to the total available space in the RX buffer (for the specific Flow Control class) and then incremented as the Application Layer reads packets out of the RX buffer. The value of this counter is sent as the FC Update DLLP value.
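The receiver-side counter described above can be sketched as follows. This is a minimal illustrative model, not the Hard IP implementation: the class name `CreditAllocated`, its methods, and the example buffer sizes are hypothetical, and the counter widths (8 bits for header credits, 12 bits for data credits) are assumed from the FC DLLP field sizes in the PCI Express Base Specification rather than stated in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class CreditAllocated:
    """Receiver-side credit allocated counter for one Flow Control class.

    Hypothetical model: names and widths are assumptions for illustration.
    """
    rx_buffer_credits: int      # total RX buffer space for this class, in credits
    field_bits: int = 12        # assumed width: 12 bits for data, 8 bits for headers
    allocated: int = field(init=False, default=0)

    def __post_init__(self):
        # Initialized to the total space available in the RX buffer.
        self.allocated = self.rx_buffer_credits % (1 << self.field_bits)

    def drain(self, credits: int) -> None:
        """The Application Layer has read a packet using `credits` out of the RX buffer."""
        self.allocated = (self.allocated + credits) % (1 << self.field_bits)

    def fc_update_value(self) -> int:
        """Value that would be carried in the next FC Update DLLP."""
        return self.allocated


posted_data = CreditAllocated(rx_buffer_credits=64)
posted_data.drain(8)                      # one packet with 8 data credits is read out
print(posted_data.fc_update_value())      # 72
```

Because the counter is a running total rather than a count of free space, it only ever increases (modulo its width); the transmitter derives the available credits by comparing it with its own credits consumed register.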

Figure 54. Flow Control Update Loop

The PCIe Hard IP maintains its own flow control logic, including a credits consumed register, and ensures that no TLP is sent that would use more credits than are available for that type of TLP. If you want optimum performance and granularity, you can maintain your own credits consumed register and flow control gating logic for each credit category (Header/Data, Posted/Non-posted/Completion). This allows you to halt the transmission of TLPs for a category that is out of credits, while still allowing TLP transmission for categories that have sufficient credits.
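As a rough illustration of per-category gating in the Application Layer, the sketch below keeps a credit limit and a credits consumed value for each category and blocks only the category that has run out. All names (`TxCredits`, `TxGate`, `credits_required`, the category keys `"P"`, `"NP"`, `"CPL"`) are hypothetical. The sketch assumes one header credit per TLP and one data credit per 16 bytes of payload, as defined by the PCI Express Base Specification, and it uses unbounded integers, so it ignores counter wraparound and infinite credit advertisements.

```python
from dataclasses import dataclass

DATA_CREDIT_BYTES = 16  # assumed: one data credit covers 4 DWORDs (16 bytes)


@dataclass
class TxCredits:
    limit: int          # credit limit, set at FC initialization and by FC Update DLLPs
    consumed: int = 0   # credits consumed by packets transmitted so far

    def available(self) -> int:
        return self.limit - self.consumed


def credits_required(payload_bytes: int):
    """Return (header_credits, data_credits) needed by one TLP."""
    data = -(-payload_bytes // DATA_CREDIT_BYTES)  # ceiling division
    return 1, data


class TxGate:
    """Independent header/data gating for each credit category."""

    def __init__(self, limits):
        # limits maps a category name to (header_limit, data_limit).
        self.hdr = {k: TxCredits(h) for k, (h, _) in limits.items()}
        self.data = {k: TxCredits(d) for k, (_, d) in limits.items()}

    def try_send(self, category: str, payload_bytes: int) -> bool:
        h, d = credits_required(payload_bytes)
        if self.hdr[category].available() < h or self.data[category].available() < d:
            return False  # hold this TLP; other categories are unaffected
        self.hdr[category].consumed += h
        self.data[category].consumed += d
        return True

    def fc_update(self, category: str, hdr_limit: int, data_limit: int) -> None:
        """Apply the credit limits carried by a received FC Update DLLP."""
        self.hdr[category].limit = hdr_limit
        self.data[category].limit = data_limit


gate = TxGate({"P": (2, 16), "NP": (1, 0), "CPL": (4, 32)})
print(gate.try_send("P", 256))   # True: uses all 16 posted data credits
print(gate.try_send("P", 4))     # False: posted data credits exhausted
print(gate.try_send("CPL", 64))  # True: completions still have credits
```

Keeping the categories separate is what lets completions continue in the example even though posted writes are stalled.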

The following steps describe the Flow Control Update loop. The corresponding numbers in the figure show the general area to which they correspond.

  1. When the Application Layer has a packet to transmit, the number of credits required is calculated. If the required credits are less than or equal to the current value of available credits (credit limit - credits consumed so far), then the packet can be transmitted immediately. However, if the credit limit minus credits consumed is less than the required credits, then the packet must be held until the credit limit is increased to a sufficient value by an FC Update DLLP. This check is performed separately for the header and data credits; a single packet consumes only a single header credit.
  2. After the packet is selected for transmission, the credits consumed register is incremented by the number of credits consumed by this packet. This increment happens for both the header and data credits consumed registers.
  3. The packet is received at the other end of the link and placed in the RX buffer.
  4. At some point the packet is read out of the RX buffer by the Application Layer. After the entire packet is read out of the RX buffer, the credit allocated register can be incremented by the number of credits the packet has used. There are separate credit allocated registers for the header and data credits.
  5. The value in the credit allocated register is used to create an FC Update DLLP.
  6. After an FC Update DLLP is created, it arbitrates for access to the PCI Express link. The FC Update DLLPs are typically scheduled with a low priority; consequently, a continuous stream of Application Layer TLPs or other DLLPs (such as ACKs) can delay the FC Update DLLP for a long time. To prevent starving the attached transmitter, FC Update DLLPs are raised to a high priority under the following three circumstances:
    1. When the last sent credit allocated counter minus the amount of received data is less than MAX_PAYLOAD and the current credit allocated counter is greater than the last sent credit allocated counter. Essentially, this means the data sink knows the data source has less than a full MAX_PAYLOAD worth of credits, and therefore is starving.
    2. When an internal timer, measured from the time the last FC Update DLLP was sent, expires. The timer is configured to 30 µs to meet the PCI Express Base Specification requirement for resending FC Update DLLPs.
    3. When the credit allocated counter minus the last sent credit allocated counter is greater than or equal to 25% of the total credits available in the RX buffer.

     After arbitration, the FC Update DLLP is transmitted as soon as it wins the arbitration to be the next item sent. In the worst case, the FC Update DLLP may need to wait for a maximum-sized TLP that is currently being transmitted to complete before it can be sent.

  7. The original write requester receives the FC Update DLLP. The credit limit value is updated. If packets are stalled waiting for credits, they can now be transmitted.
Note: You must keep track of the credits consumed by the Application Layer.
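
The three conditions in step 6 that raise an FC Update DLLP to high priority can be written as a single predicate. This is a sketch under stated assumptions, not the Hard IP logic: the function name and parameters are hypothetical, all quantities are expressed in data credits for one Flow Control class (including MAX_PAYLOAD, converted from bytes), and counter wraparound is ignored.

```python
FC_UPDATE_TIMER_US = 30  # timer value given in the text


def fc_update_high_priority(last_sent: int,
                            allocated: int,
                            received: int,
                            max_payload_credits: int,
                            rx_total_credits: int,
                            us_since_last_update: float) -> bool:
    """Return True if the pending FC Update DLLP should be raised to high priority.

    last_sent            -- credit allocated value carried by the last FC Update DLLP
    allocated            -- current credit allocated counter
    received             -- credits' worth of data received so far
    max_payload_credits  -- MAX_PAYLOAD expressed in credits (assumption)
    rx_total_credits     -- total credits available in the RX buffer
    us_since_last_update -- microseconds since the last FC Update DLLP was sent
    """
    # 1. The transmitter has less than MAX_PAYLOAD of credits left,
    #    and there are newly freed credits to advertise.
    starving = (last_sent - received) < max_payload_credits and allocated > last_sent
    # 2. The resend timer has expired.
    timer_expired = us_since_last_update >= FC_UPDATE_TIMER_US
    # 3. At least 25% of the RX buffer has been freed since the last update.
    large_release = 4 * (allocated - last_sent) >= rx_total_credits
    return starving or timer_expired or large_release


# Transmitter has 5 credits left, MAX_PAYLOAD needs 16, and 4 credits were freed.
print(fc_update_high_priority(100, 104, 95, 16, 128, 1.0))  # True
# Plenty of credits left, small release, timer not expired.
print(fc_update_high_priority(100, 104, 40, 16, 128, 1.0))  # False
```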