P-Tile Avalon® Streaming Intel® FPGA IP for PCI Express* User Guide

ID 683059
Date 6/26/2023
Public

7.1.2. Debugging Data Transfer and Performance Issues

A PCIe link can stop transmitting data for many reasons. The PCI Express Base Specification defines three types of errors, outlined in the table below:

Table 115. Error Types Defined by the PCI Express Base Specification

| Type | Responsible Agent | Description |
|---|---|---|
| Correctable | Hardware | While correctable errors may affect system performance, data integrity is maintained. |
| Uncorrectable, non-fatal | Device software | Uncorrectable, non-fatal errors are defined as errors in which data is lost, but system integrity is maintained. For example, the fabric may lose a particular TLP, but it still works without problems. |
| Uncorrectable, fatal | System software | Errors generated by a loss of data and system failure are considered uncorrectable and fatal. Software must determine how to handle such errors: whether to reset the link or implement other means to minimize the problem. |

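
The fatal versus non-fatal distinction is made per error. As a minimal sketch, assuming the AER Uncorrectable Error Severity register semantics from the PCI Express Base Specification (a 1 in a bit position marks that error as fatal, a 0 as non-fatal), the following Python classifies one uncorrectable error and looks up the responsible agent from the table above. The names `ErrorType`, `RESPONSIBLE_AGENT`, and `classify_uncorrectable` are hypothetical and are not part of the IP or of any Intel tool; how the register value is read from the device is outside the scope of the sketch.

```python
from enum import Enum


class ErrorType(Enum):
    CORRECTABLE = "Correctable"
    NONFATAL = "Uncorrectable, non-fatal"
    FATAL = "Uncorrectable, fatal"


# Responsible agent per error type, copied from the error-type table.
RESPONSIBLE_AGENT = {
    ErrorType.CORRECTABLE: "Hardware",
    ErrorType.NONFATAL: "Device software",
    ErrorType.FATAL: "System software",
}


def classify_uncorrectable(status_bit: int, severity_reg: int) -> ErrorType:
    """Classify one uncorrectable error as fatal or non-fatal.

    status_bit   -- bit position of the error in the Uncorrectable Error
                    Status register (0..31).
    severity_reg -- value of the Uncorrectable Error Severity register
                    (assumed already read by the caller).
    """
    if not 0 <= status_bit < 32:
        raise ValueError("status_bit must be in the range 0..31")
    is_fatal = (severity_reg >> status_bit) & 1
    return ErrorType.FATAL if is_fatal else ErrorType.NONFATAL
```

For example, with only bit 18 set in the severity value, `classify_uncorrectable(18, 1 << 18)` returns `ErrorType.FATAL`, and `RESPONSIBLE_AGENT` maps that result to system software.
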

Table 116. Correctable Error Status Register (AER)

| Observation | Issue | Resolution |
|---|---|---|
| Receiver error bit set | Physical layer error which may be due to a PCS error when a lane is in L0, a control symbol being received in the wrong lane, or signal integrity issues where the link may transition from L0 to the Recovery state. | Use the configuration output interface, or the Hard IP reconfiguration interface and the flow chart in Link Training Debugging Flow, to obtain more information about the error. |
| Bad DLLP bit set | Data link layer error which may occur when a CRC verification fails. | Use the configuration output interface or the Hard IP reconfiguration interface to obtain more information about the error. |
| Bad TLP bit set | Data link layer error which may occur when an LCRC verification fails or when a sequence number error occurs. | Use the configuration output interface or the Hard IP reconfiguration interface to obtain more information about the error. |
| Replay_num_rollover bit set | Data link layer error which may be due to TLPs sent without success (no ACK) four times in a row. | Use the configuration output interface or the Hard IP reconfiguration interface to obtain more information about the error. |
| Replay timer timeout status bit set | Data link layer error which may occur when no ACK or NAK was received within the timeout period for the TLPs transmitted. | Use the configuration output interface or the Hard IP reconfiguration interface to obtain more information about the error. |
| Advisory non-fatal | Transaction layer error which may be due to a higher priority uncorrectable error being detected. | |
| Corrected internal error bit set | Transaction layer error which may be due to an ECC error in the internal Hard IP RAM. | Use the error interface, configuration output interface, or the Hard IP reconfiguration interface and DBI registers to obtain more information about the error. |

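
The observations in the correctable error table can be turned into a quick decoder. The sketch below assumes the bit positions of the AER Correctable Error Status register as defined in the PCI Express Base Specification (this guide does not list them in this section) and assumes the 32-bit register value has already been read, for example through the Hard IP reconfiguration interface. The names `CORRECTABLE_STATUS_BITS` and `decode_correctable_status` are hypothetical.

```python
from typing import List

# Bit positions per the PCI Express Base Specification (AER capability).
# Names follow the Observation column of the correctable error table.
CORRECTABLE_STATUS_BITS = {
    0: "Receiver error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "Replay_num rollover",
    12: "Replay timer timeout",
    13: "Advisory non-fatal",
    14: "Corrected internal error",
}


def decode_correctable_status(value: int) -> List[str]:
    """Return the names of all correctable errors flagged in `value`,
    ordered from the lowest bit position to the highest."""
    return [
        name
        for bit, name in sorted(CORRECTABLE_STATUS_BITS.items())
        if value & (1 << bit)
    ]
```

For example, `decode_correctable_status(0x1041)` returns `["Receiver error", "Bad TLP", "Replay timer timeout"]`, since bits 0, 6, and 12 are set. A combination such as this one points toward the physical layer first, because a marginal link can cause the data link layer errors that follow.
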

Table 117. Uncorrectable Error Status Register (AER)

| Observation | Issue | Resolution |
|---|---|---|
| Data link protocol error | Data link layer error which may be due to the transmitter receiving an ACK/NAK whose Seq ID does not correspond to an unacknowledged TLP or ACK sequence number. | Use the configuration output interface or the Hard IP reconfiguration interface to obtain more information about the error. |
| Surprise down error | Data link layer error which may be due to link_up_o getting deasserted during L0, indicating the physical layer link is going down unexpectedly. | Use the error interface, configuration output interface, or the Hard IP reconfiguration interface and DBI registers to obtain more information about the error. |
| Flow control protocol error | Transaction layer error which can be due to the receiver reporting more than the allowed credit limit. This error occurs when a component does not receive updated flow control credits within the 200 μs limit. | Use the TX/RX flow control interface, configuration output interface, or Hard IP reconfiguration interface to obtain more information about the error. |
| Poisoned TLP received | Transaction layer error which can be due to a received TLP with the EP bit set. | Use the error interface, configuration output interface, configuration intercept interface, or Hard IP reconfiguration interface to obtain more information on the error and determine the appropriate action. |
| Completion timeout | Transaction layer error which can be due to a completion not received within the required amount of time after a non-posted request was sent. | Use the error interface, completion timeout interface, configuration output interface, or Hard IP reconfiguration interface to obtain more information on the error. |
| Completer abort | Transaction layer error which can be due to a completer being unable to fulfill a request due to a problem with the requester or a failure of the completer. | Use the configuration output interface, error interface, or Hard IP reconfiguration interface to obtain more information on the error. |
| Unexpected completion | Transaction layer error which can be due to a requester receiving a completion that doesn't match any request awaiting a completion. The TLP is deleted by the Hard IP and not presented to the Application Layer. | Use the configuration output interface, error interface, or Hard IP reconfiguration interface to obtain more information on the error. |
| Receiver overflow | Transaction layer error which can be due to a receiver receiving more TLPs than the available receive buffer space. The TLP is deleted by the Hard IP and not presented to the Application Layer. | Use the TX/RX flow control interface, error interface, configuration output interface, or Hard IP reconfiguration interface to obtain more information on the error. |
| Malformed TLP | Transaction layer error which can be due to errors in the received TLP header. The TLP is deleted by the Hard IP and not presented to the Application Layer. | Use the error interface, configuration output interface, or Hard IP reconfiguration interface to obtain more information on the error. |
| ECRC error | Transaction layer error which can be due to an ECRC check failure at the receiver even though the TLP is not malformed and the LCRC check is valid. The Hard IP block handles this TLP automatically. If the TLP is a non-posted request, the Hard IP block generates a completion with a completer abort status. The TLP is deleted by the Hard IP and not presented to the Application Layer. | Use the configuration output interface or the Hard IP reconfiguration interface to obtain more information on the error. |
| Unsupported request | Transaction layer error which can be due to the completer being unable to fulfill the request. The TLP is deleted in the Hard IP block and not presented to the Application Layer. If the TLP is a non-posted request, the Hard IP block generates a completion with Unsupported Request status. | Use the configuration output interface, error interface, or Hard IP reconfiguration interface to obtain more information on the error. |
| ACS violation | Transaction layer error which can be due to an access control error in the received posted or non-posted request. | Use the configuration output interface, error interface, or Hard IP reconfiguration interface to obtain more information on the error. |
| Uncorrectable internal error | Transaction layer error which can be due to an internal error that cannot be corrected by the hardware. | Use the error interface, configuration output interface, or the Hard IP reconfiguration interface and DBI registers to obtain more information on the error. |
| Atomic egress blocked | | Use the error interface, configuration output interface, or Hard IP reconfiguration interface to obtain more information on the error. |
| TLP prefix blocked | EP or RP only | Use the error interface, configuration output interface, or Hard IP reconfiguration interface to obtain more information on the error. |
| Poisoned TLP egress blocked | EP or RP only | Use the error interface, configuration output interface, configuration intercept interface, or Hard IP reconfiguration interface to obtain more information on the error. |
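
The uncorrectable error table can be decoded in the same way. This sketch again assumes the bit positions of the AER Uncorrectable Error Status register from the PCI Express Base Specification, and assumes the register value has already been read. It also flags the errors for which, according to the table above, the Hard IP deletes the TLP before it reaches the Application Layer. All names (`UNCORRECTABLE_STATUS_BITS`, `TLP_DROPPED_BY_HARD_IP`, `decode_uncorrectable_status`, `dropped_tlp_errors`) are hypothetical.

```python
from typing import List

# Bit positions per the PCI Express Base Specification (AER capability).
# Names follow the Observation column of the uncorrectable error table.
UNCORRECTABLE_STATUS_BITS = {
    4: "Data link protocol error",
    5: "Surprise down error",
    12: "Poisoned TLP received",
    13: "Flow control protocol error",
    14: "Completion timeout",
    15: "Completer abort",
    16: "Unexpected completion",
    17: "Receiver overflow",
    18: "Malformed TLP",
    19: "ECRC error",
    20: "Unsupported request",
    21: "ACS violation",
    22: "Uncorrectable internal error",
    24: "Atomic egress blocked",
    25: "TLP prefix blocked",
    26: "Poisoned TLP egress blocked",
}

# Errors for which the table states that the Hard IP deletes the TLP
# and does not present it to the Application Layer.
TLP_DROPPED_BY_HARD_IP = {
    "Unexpected completion",
    "Receiver overflow",
    "Malformed TLP",
    "ECRC error",
    "Unsupported request",
}


def decode_uncorrectable_status(value: int) -> List[str]:
    """Return the names of all uncorrectable errors flagged in `value`,
    ordered from the lowest bit position to the highest."""
    return [
        name
        for bit, name in sorted(UNCORRECTABLE_STATUS_BITS.items())
        if value & (1 << bit)
    ]


def dropped_tlp_errors(value: int) -> List[str]:
    """Subset of flagged errors where the application never saw the TLP."""
    return [
        name
        for name in decode_uncorrectable_status(value)
        if name in TLP_DROPPED_BY_HARD_IP
    ]
```

For example, `decode_uncorrectable_status(0x00044000)` returns `["Completion timeout", "Malformed TLP"]`, and `dropped_tlp_errors(0x00044000)` returns `["Malformed TLP"]`. Knowing which errors involve a dropped TLP helps explain why application logic may stall without ever observing the offending packet.
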

Use the debug tools described in the next two sections to debug link training issues observed on the PCI Express link when using the P-Tile Avalon®-ST IP for PCI Express.