R-Tile Avalon® Streaming Intel® FPGA IP for PCI Express* User Guide

ID 683501
Date 10/07/2022
Public


4.3.1.4. Avalon® Streaming TX Interface

The Application Layer transfers data to the Transaction Layer of the R-tile PCI Express IP core over the Avalon®-ST TX interface. The R-tile PCI Express IP core must assert pX_tx_st_ready_o before transmission begins.

If the R-tile PCI Express IP core is configured in Configuration Mode 0 (1x16) with a double-width configuration, there are four segments with a 256-bit data width that allows multiple TLPs per cycle. This means there are four pX_tx_stN_sop_i signals and four pX_tx_stN_eop_i signals for Configuration Mode 0 (1x16).

Note that this interface does not follow the fixed latency between the pX_tx_st_ready_o and pX_tx_stN_dvalid_i signals that the Avalon Interface Specifications define.

When in Configuration Mode 0 (1x16) with a double-width configuration, the R-tile PCI Express core provides four segments, each with 256 bits of data (pX_tx_stN_data_i[255:0]), 128 bits of header (pX_tx_stN_hdr_i[127:0]), and 32 bits of TLP prefix (pX_tx_stN_prefix_i[31:0]). All four segments are used in this configuration, so the data bus is effectively 1024 bits wide, consisting of pX_tx_st0_data_i[255:0], pX_tx_st1_data_i[255:0], pX_tx_st2_data_i[255:0], and pX_tx_st3_data_i[255:0].

Parity generation is done via a 32:1 XOR (i.e. there is one parity bit for every 32 data, header or prefix bits).
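The 32:1 XOR scheme can be illustrated with a short software model. This is a minimal sketch, not the Hard IP implementation: the function name parity_bits is hypothetical, and the polarity (parity bit equals the plain XOR of its 32 bits) is an assumption that this section does not confirm. The bit ordering follows the pX_tx_stN_data_par_i description, where parity bit [0] covers bits [31:0].

```python
def parity_bits(value: int, width: int) -> int:
    """Return one XOR parity bit per 32-bit group of a `width`-bit value.

    Bit i of the result covers value[32*i + 31 : 32*i].
    Polarity is an assumption: the bit is 1 when the group holds an odd
    number of ones.
    """
    if width % 32 != 0:
        raise ValueError("width must be a multiple of 32")
    result = 0
    for i in range(width // 32):
        group = (value >> (32 * i)) & 0xFFFF_FFFF
        # XOR-reduce the 32 bits of this group.
        bit = bin(group).count("1") & 1
        result |= bit << i
    return result


data_par = parity_bits(0x1, 256)         # 8 parity bits for a 256-bit data segment
hdr_par = parity_bits(0x3 << 32, 128)    # 4 parity bits for a 128-bit header
prefix_par = parity_bits(0x7, 32)        # 1 parity bit for a 32-bit prefix
```

With these widths, a data segment yields 8 parity bits, a header yields 4 (matching pX_tx_stN_hdr_par_i[3:0]), and a prefix yields 1.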

The following guidelines must be considered by the Application logic:
  • Transmission of a TLP must be uninterrupted when pX_tx_st_ready_o is asserted. The application must not deassert pX_tx_stN_valid_i between pX_tx_stN_sop_i and pX_tx_stN_eop_i on a ready cycle unless there is backpressure from the R-tile PCIe IP core indicated by the deassertion of pX_tx_st_ready_o.
    Note: Failing to meet this guideline may cause the transmission of a TLP with an invalid LCRC.
  • For the Configuration Mode 0 (1x16) in double-width mode, the start of a TLP (pX_tx_stN_sop_i) can only happen in segment 0 (st0) or segment 2 (st2) (i.e. a given TLP cannot start on segment 1 or segment 3).
  • For the Configuration Mode 0 (1x16) in double-width mode, the header segment 2 (st2_hdr) is allowed only if segment 0 and segment 1 are also used (i.e. st0_hdr, st1_hdr and st0_data, st1_data are also used).
  • For a single TLP spanning across multiple segments, the application logic needs to send the TLP in the order of the segment index (segment st0 → st1 → st2 → st3 → st0).
  • If the TLP length of the TLP being transmitted is greater than the segment size, the segment used to assert the pX_tx_stN_eop_i signal is dictated by the TLP length.
  • If the TLP length being transmitted is less than the segment size (256 bits), the corresponding pX_tx_stN_eop_i signal must be asserted in the same segment where pX_tx_stN_sop_i is asserted.
  • The maximum latency between the deassertion of pX_tx_st_ready_o and the deassertion of pX_tx_stN_valid_i is 16 coreclkout_hip cycles.
  • For Configuration Mode 0 (1x16) in single-width mode, only one segment can be used per clock cycle (i.e. st0_hdr/st0_data or st1_hdr/st1_data). In addition, if segment 1 is used, st0_data must be used by the previous TLP.
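The segment rules for Configuration Mode 0 (1x16) in double-width mode can be sketched as a small software model. The function names eop_position and check_cycle are hypothetical, and the model rests on two assumptions: only the payload occupies data segments (the header travels on the separate pX_tx_stN_hdr_i bus), and a TLP without payload still occupies the segment that carries pX_tx_stN_sop_i.

```python
SEGMENT_BYTES = 32   # one 256-bit data segment
NUM_SEGMENTS = 4     # st0..st3 in Configuration Mode 0 (1x16), double-width


def eop_position(start_segment: int, payload_bytes: int) -> tuple[int, int]:
    """Return (cycle_offset, segment) where eop is asserted.

    cycle_offset counts cycles after the sop cycle. Segments are consumed
    in index order st0 -> st1 -> st2 -> st3 -> st0.
    """
    if start_segment not in (0, 2):
        raise ValueError("a TLP may start only on segment 0 or 2")
    # Ceiling division; a TLP without payload still uses the sop segment.
    segments_used = max(1, -(-payload_bytes // SEGMENT_BYTES))
    last = start_segment + segments_used - 1
    return last // NUM_SEGMENTS, last % NUM_SEGMENTS


def check_cycle(sop: list[bool], used: list[bool]) -> list[str]:
    """Check one clock cycle against the start-of-TLP rules."""
    errors = []
    for seg in (1, 3):
        if sop[seg]:
            errors.append(f"sop on segment {seg} is not allowed")
    if sop[2] and not (used[0] and used[1]):
        errors.append("sop on segment 2 requires segments 0 and 1 to be used")
    return errors
```

For example, eop_position(0, 128) returns (0, 3): a 128-byte payload starting on segment 0 ends on segment 3 of the same cycle. Starting the same payload on segment 2 gives (1, 1), so the TLP wraps to segment 1 of the next cycle.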
Table 56.  Avalon Streaming TX Interface Signals
Signal Name Direction Description EP/RP/BP Clock Domain
pX_tx_stN_data_i[255:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Application Layer data for transmission. The data bus is organized in multiple 256-bit segments. In x16 mode, all four segments are used to effectively form a 1024-bit data bus. In x8 mode, two segments are used to form a 512-bit data bus. In x4 mode, each 256-bit segment is an independent data bus.

The Application Layer must provide a properly formatted TLP on the TX interface. The data is valid when the corresponding tx_stN_valid_i signal is asserted.

The mapping of message TLPs is the same as the mapping of Transaction Layer TLPs with 4-dword headers. The number of data cycles must be correct for the length and address fields in the header. Issuing a packet with an incorrect number of data cycles results in the TX interface hanging and becoming unable to accept further requests.

Note: There must be no Idle cycle between the tx_stN_sop_i and tx_stN_eop_i cycles unless there is backpressure with the deassertion of tx_st_ready_o.
EP/RP/BP coreclkout_hip
pX_tx_stN_hdr_i[127:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input This is the header to be transmitted, which follows the TLP header format of the PCIe specifications. Consider the following guidelines:
  • When the R-tile Avalon® Streaming IP for PCIe is configured in EP or RP mode, it automatically calculates the Completer/Requester ID and the Application logic does not need to provide this information as part of the TLP header being transmitted. Note that this guideline does not apply when the IP is configured in TLP Bypass mode.
  • When the R-tile Avalon® Streaming IP for PCIe is configured in EP mode and SR-IOV is enabled, follow the guidelines stated in the BDF Assignments section of SR-IOV Support Implementation.

These signals are valid when the corresponding tx_stN_sop_i signal is asserted.

EP/RP/BP coreclkout_hip
pX_tx_stN_prefix_i[31:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

This is the TLP prefix to be transmitted, which follows the TLP prefix format of the PCIe specifications. PASID is supported.

These signals are valid when the corresponding tx_stN_sop_i signal is asserted.

The TLP prefix uses a Big Endian implementation (i.e. the Fmt field is in bits [31:29] and the Type field is in bits [28:24]).

If no prefix is present for a given TLP, that dword, including the Fmt field, is all zeros.

EP/RP/BP coreclkout_hip
pX_tx_stN_sop_i where

X = 0,1,2,3 (IP core number)

N = 0,2 (segment number)

Input Indicates the first cycle of a TLP when asserted in conjunction with the corresponding bit of tx_stN_valid_i. For the x16 configuration:
  • tx_st3_sop_i: When asserted, indicates the start of a TLP in tx_st3_data_i[255:0].
  • tx_st2_sop_i: When asserted, indicates the start of a TLP in tx_st2_data_i[255:0].
  • tx_st1_sop_i: When asserted, indicates the start of a TLP in tx_st1_data_i[255:0].
  • tx_st0_sop_i: When asserted, indicates the start of a TLP in tx_st0_data_i[255:0].

These signals are asserted for one clock cycle per TLP. They also qualify the corresponding tx_stN_hdr_i and tx_stN_prefix_i signals.

Note: pX_tx_stN_sop_i pulses can only be sent on segments 0 or 2 (st0 or st2).
EP/RP/BP coreclkout_hip
pX_tx_stN_eop_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input Indicates the last cycle of a TLP when asserted in conjunction with the corresponding bit of tx_stN_valid_i. For the x16 configuration:
  • tx_st3_eop_i: When asserted, indicates the end of a TLP in tx_st3_data_i[255:0].
  • tx_st2_eop_i: When asserted, indicates the end of a TLP in tx_st2_data_i[255:0].
  • tx_st1_eop_i: When asserted, indicates the end of a TLP in tx_st1_data_i[255:0].
  • tx_st0_eop_i: When asserted, indicates the end of a TLP in tx_st0_data_i[255:0].

These signals are asserted for one clock cycle per TLP.

EP/RP/BP coreclkout_hip
pX_tx_stN_dvalid_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Qualifies the data of the corresponding segment of tx_stN_data_i into the IP core on ready cycles.

To facilitate timing closure, Intel recommends that you register both the tx_st_ready_o and tx_stN_dvalid_i signals.

EP/RP/BP coreclkout_hip
pX_tx_stN_hvalid_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Qualifies the header of the corresponding segment (tx_stN_hdr_i) into the IP core on ready cycles.

To facilitate timing closure, Intel recommends that you register both the tx_st_ready_o and tx_stN_hvalid_i signals.

EP/RP/BP coreclkout_hip
pX_tx_stN_pvalid_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Qualifies the prefix of the corresponding segment (tx_stN_prefix_i) into the IP core on ready cycles.

To facilitate timing closure, Intel recommends that you register both the tx_st_ready_o and tx_stN_pvalid_i signals.

EP/RP/BP coreclkout_hip
pX_tx_stN_data_par_i[Z:0] where

X = 0,1,2,3 (IP core number) and Z varies based on the core.

N = 0,1,2,3 (segment number)

Input

Parity for tx_stN_data_i. Bit [0] corresponds to tx_stN_data_i[31:0], bit [1] corresponds to tx_stN_data_i[63:32], and so on.

By default, the PCIe Hard IP generates the parity for the TX data.

EP/RP/BP coreclkout_hip
pX_tx_stN_hdr_par_i[3:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Parity for tx_stN_hdr_i.

By default, the PCIe Hard IP generates the parity for the TX header.

EP/RP/BP coreclkout_hip
pX_tx_stN_prefix_par_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Parity for tx_stN_prefix_i.

By default, the PCIe Hard IP generates the parity for the TX TLP prefix.

EP/RP/BP coreclkout_hip
pX_tx_st_ready_o where

X = 0,1,2,3 (IP core number)

Output

Indicates that the PCIe Hard IP is ready to accept data. The readyLatency maximum is 16 cycles.

If tx_st_ready_o is asserted by the Transaction Layer in the PCIe Hard IP on cycle <n>, then <n> + readyLatency is a ready cycle, during which the Application may assert tx_stN_valid_i and transfer data.

If tx_st_ready_o is deasserted by the Transaction Layer on cycle <n>, then the Application must deassert tx_stN_valid_i within the readyLatency number of cycles after cycle <n>.

tx_st_ready_o can be deasserted in the following conditions:
  • The LTSSM is not ready.
  • A Retry is in progress.
  • The R-tile Avalon-ST IP is busy sending internally generated TLPs.
  • The internal R-tile TX FIFO is full.
EP/RP/BP coreclkout_hip
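The readyLatency rule described for pX_tx_st_ready_o can be checked against a recorded signal trace with a small model. This is a sketch under stated assumptions: the function name valid_violations is hypothetical, each list holds one sample per coreclkout_hip cycle, the trace starts with the interface idle, and "within the readyLatency number of cycles" is read as "valid must be low once ready has been low for readyLatency + 1 consecutive cycles".

```python
def valid_violations(ready: list[int], valid: list[int],
                     ready_latency: int = 16) -> list[int]:
    """Return the cycles where valid is asserted too long after ready fell.

    A cycle is flagged when valid is high although ready has been low on
    that cycle and on each of the preceding `ready_latency` cycles.
    """
    if len(ready) != len(valid):
        raise ValueError("ready and valid traces must have the same length")
    violations = []
    for cycle, is_valid in enumerate(valid):
        if not is_valid:
            continue
        # Samples of ready from (cycle - ready_latency) up to this cycle.
        window = ready[max(0, cycle - ready_latency): cycle + 1]
        if not any(window):
            violations.append(cycle)
    return violations


# ready falls on cycle 2; with a latency of 2, valid must be low by cycle 4.
ok = valid_violations([1, 1, 0, 0, 0, 0, 1], [0, 1, 1, 1, 0, 0, 0], ready_latency=2)
late = valid_violations([1, 1, 0, 0, 0, 0, 1], [0, 1, 1, 1, 1, 0, 0], ready_latency=2)
```

Here ok is empty, while late flags cycle 4. A checker of this kind may be useful in simulation testbenches, where the same condition would normally be expressed as an assertion on the actual signals.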

As an example, the figure Avalon® Streaming TX Interface Timings below shows the behavior of the Avalon Streaming TX interface when back-to-back TLPs are transmitted with data spanning multiple segments. The following text describes the waveforms per clock cycle:

  1. Clock cycle 1: The R-tile Intel FPGA IP for PCI Express asserts p0_tx_st_ready_o signal, indicating the Hard IP is ready to accept TLPs from the Application logic.
  2. Clock cycle 2:
    1. The start of the first TLP (T0) is in segment 0, indicated by the assertion of p0_tx_st0_sop_i.
    2. The signal p0_tx_st0_hvalid_i is asserted to validate the header of this first TLP (T0H0) in the p0_tx_st0_hdr_i bus.
    3. The signal p0_tx_st0_dvalid_i is asserted to validate the data of this first TLP (T0D0) in the p0_tx_st0_data_i bus.
    4. The signal p0_tx_st1_dvalid_i is asserted to validate the next portion of the data of this first TLP (T0D1) in the p0_tx_st1_data_i bus.
    5. The signal p0_tx_st2_dvalid_i is asserted to validate the next portion of the data of this first TLP (T0D2) in the p0_tx_st2_data_i bus.
    6. The signal p0_tx_st3_dvalid_i is asserted to validate the final portion of the data of this first TLP (T0D3) in the p0_tx_st3_data_i bus.
    7. The end of this first TLP (T0) is in segment 3, denoted by the assertion of p0_tx_st3_eop_i.
  3. Clock cycle 3:
    1. The next TLP (T1) arrives in segment 0, as denoted by p0_tx_st0_sop_i staying high.
    2. The signal p0_tx_st0_hvalid_i is asserted to validate the header of this TLP (T1H0) in the p0_tx_st0_hdr_i bus.
    3. The signal p0_tx_st0_dvalid_i is asserted to validate the data of this TLP (T1D0) in the p0_tx_st0_data_i bus.
    4. The signal p0_tx_st1_dvalid_i is asserted to validate the next portion of the data of this TLP (T1D1) in the p0_tx_st1_data_i bus.
    5. The signal p0_tx_st2_dvalid_i is asserted to validate the next portion of the data of this TLP (T1D2) in the p0_tx_st2_data_i bus.
    6. The signal p0_tx_st3_dvalid_i is asserted to validate the final portion of the data of this TLP (T1D3) in the p0_tx_st3_data_i bus.
    7. The end of this TLP (T1) is in segment 3, denoted by p0_tx_st3_eop_i staying high.
Figure 30.  Avalon® Streaming TX Interface Timings
Note: For Configuration Mode 0 (1x16), the start of a TLP (pX_tx_stN_sop_i) can only happen on segment 0 (st0) or segment 2 (st2).
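The back-to-back scenario in the walkthrough above can be reproduced with a short model that lists which signals the Application logic drives in each cycle. This is a simplified sketch, not Intel reference code: the function name schedule and the dictionary layout are hypothetical, every TLP is assumed to start on segment 0 of a fresh cycle (a conservative packing that never starts on segment 2), and only the payload is assumed to occupy data segments.

```python
SEGMENT_BYTES = 32   # one 256-bit data segment
NUM_SEGMENTS = 4     # st0..st3


def schedule(tlp_payload_bytes: list[int]) -> list[list[dict]]:
    """Return, per clock cycle, the signal values for segments st0..st3."""
    cycles = []
    for tlp, nbytes in enumerate(tlp_payload_bytes):
        segments_used = max(1, -(-nbytes // SEGMENT_BYTES))  # ceiling division
        for index in range(segments_used):
            seg = index % NUM_SEGMENTS
            if seg == 0:
                # Open a new cycle with all segments idle.
                cycles.append([dict(sop=0, eop=0, hvalid=0, dvalid=0, label="")
                               for _ in range(NUM_SEGMENTS)])
            slot = cycles[-1][seg]
            slot["dvalid"] = int(nbytes > 0)
            slot["label"] = f"T{tlp}D{index}" if nbytes else f"T{tlp}"
            if index == 0:
                slot["sop"] = 1      # start of TLP, header is valid here
                slot["hvalid"] = 1
            if index == segments_used - 1:
                slot["eop"] = 1      # end of TLP
    return cycles


# Two back-to-back TLPs with 128 bytes of payload each, as in the walkthrough.
for n, cycle in enumerate(schedule([128, 128])):
    for seg, slot in enumerate(cycle):
        print(f"cycle {n} st{seg}: {slot}")
```

For two 128-byte payloads, the model produces two cycles: in each one, st0 carries sop, hvalid, and dvalid, st1 and st2 carry dvalid only, and st3 carries dvalid and eop, which matches clock cycles 2 and 3 of the walkthrough.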