R-tile Avalon® Streaming Intel® FPGA IP for PCI Express* User Guide

ID 683501
Date 12/13/2021
Public

4.4.1.4. Avalon® Streaming TX Interface

The Application Layer transfers data to the Transaction Layer of the PCI Express IP core over the Avalon®-ST TX interface. The Transaction Layer must assert tx_st_ready_o before transmission begins. While tx_st_ready_o is asserted, the transmission of a packet must be uninterrupted.

The interface has four segments, each with a 256-bit data width. The interface supports multiple TLPs per cycle.

This interface supports one tx_st_sop_i signal and one tx_st_eop_i signal per cycle for each segment when the R-tile IP is operating in a x16 configuration, so the x16 IP core has four tx_st_sop_i signals and four tx_st_eop_i signals. This interface also does not follow the fixed latency between the tx_st_ready_o and tx_st_valid_i signals that the Avalon Interface Specifications define. After tx_st_ready_o is deasserted, the IP core can still receive data at any time until tx_st_valid_i is deasserted, which must occur within the maximum latency of 16 coreclkout_hip cycles.
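The 16-cycle bound can be checked against captured simulation samples with a short script. The Python sketch below is one interpretation of the rule and is not part of the IP deliverables. The function name `find_late_valid_cycles` and the list-of-samples representation are assumptions made for illustration. It flags a cycle when valid is still high although tx_st_ready_o was deasserted `max_latency` or more cycles earlier and has stayed low since.

```python
def find_late_valid_cycles(ready, valid, max_latency=16):
    """Return the cycle indices where valid is asserted too long after
    ready was deasserted.

    ready, valid: per-cycle 0/1 samples of tx_st_ready_o and a TX valid
    signal, one entry per coreclkout_hip cycle (hypothetical capture format).
    """
    violations = []
    low_run = 0  # consecutive cycles ready has been low, including this one
    for cycle, (rdy, vld) in enumerate(zip(ready, valid)):
        low_run = 0 if rdy else low_run + 1
        # If ready went low on cycle n, then low_run == cycle - n + 1.
        # Valid must be low from cycle n + max_latency onward.
        if vld and low_run > max_latency:
            violations.append(cycle)
    return violations
```

For example, with `max_latency=4`, ready deasserted on cycle 3, and valid held high through cycle 9, the function reports cycles 7, 8, and 9.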

Note: tx_stN_sop_i pulses can only be sent on segments 0 and/or 2 (st0 and/or st2).

The x16 core provides four segments, each with 256 bits of data (tx_st_data_i[255:0]), 128 bits of header (tx_st_hdr_i[127:0]), and 32 bits of TLP prefix (tx_st_tlp_prfx_i[31:0]). If this core is configured in the 1x16 mode, all four segments are used, so the data bus becomes a 1024-bit bus altogether, consisting of tx_st0_data_i[255:0], tx_st1_data_i[255:0], tx_st2_data_i[255:0], and tx_st3_data_i[255:0]. The tx_stN_sop_i signals indicate the segment in which a packet starts, subject to the restriction noted above that start-of-packet pulses can only be sent on segments 0 and 2.
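As an illustration of how a 1024-bit word relates to the four 256-bit segments, the following Python sketch splits an integer into per-segment values. It assumes that segment 0 carries the least significant 256 bits, which this section does not state explicitly, and the helper name `split_into_segments` is hypothetical.

```python
SEGMENT_WIDTH = 256  # bits per segment, per this section
NUM_SEGMENTS = 4     # segments used in 1x16 mode

def split_into_segments(bus_word):
    """Split a 1024-bit integer into four 256-bit segment values.

    Assumption: index 0 of the result (segment 0) holds the least
    significant 256 bits of bus_word.
    """
    total_width = SEGMENT_WIDTH * NUM_SEGMENTS
    if not 0 <= bus_word < (1 << total_width):
        raise ValueError("bus_word does not fit in 1024 bits")
    mask = (1 << SEGMENT_WIDTH) - 1
    return [(bus_word >> (SEGMENT_WIDTH * n)) & mask for n in range(NUM_SEGMENTS)]
```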

Parity generation is done via a 32:1 XOR (i.e. there is one parity bit for every 32 data, header or prefix bits).
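The 32:1 XOR can be modeled in software when building a testbench reference. The Python sketch below computes one parity bit per 32-bit group as the XOR of the bits in that group, with parity bit k covering bits [32k+31:32k]. The function name `xor_parity_bits` is hypothetical, and applying the same bit ordering to the header and prefix buses is an assumption.

```python
def xor_parity_bits(value, width):
    """Return the parity vector for a bus value of the given width.

    Parity bit k is the XOR of value[32*k + 31 : 32*k], so a 256-bit
    data segment yields 8 parity bits, a 128-bit header yields 4, and
    a 32-bit prefix yields 1.
    """
    if width % 32 != 0:
        raise ValueError("width must be a multiple of 32")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    parity = 0
    for k in range(width // 32):
        group = (value >> (32 * k)) & 0xFFFFFFFF
        # XOR of all bits in the group equals the popcount modulo 2.
        bit = bin(group).count("1") & 1
        parity |= bit << k
    return parity
```

For instance, `xor_parity_bits(1 << 127, 128)` sets only parity bit 3, because the single set bit lies in the group [127:96].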

Table 55.  Avalon Streaming TX Interface Signals
Signal Name | Direction | Description | EP/RP/BP | Clock Domain
pX_tx_stN_data_i[255:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Application Layer data for transmission. The data bus is organized in multiple 256-bit segments. In x16 mode, all four segments are used to effectively form a 1024-bit data bus. In x8 mode, two segments are used to form a 512-bit data bus. In x4 mode, each 256-bit segment is an independent data bus.

The Application Layer must provide a properly formatted TLP on the TX interface. The data is valid when the corresponding tx_stN_valid_i signal is asserted.

The mapping of message TLPs is the same as the mapping of Transaction Layer TLPs with 4-dword headers. The number of data cycles must be correct for the length and address fields in the header. Issuing a packet with an incorrect number of data cycles results in the TX interface hanging and becoming unable to accept further requests.

Note: There must be no idle cycle between the tx_stN_sop_i and tx_stN_eop_i cycles unless tx_st_ready_o is deasserted to apply backpressure.
EP/RP/BP coreclkout_hip
pX_tx_stN_hdr_i[127:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

This is the header to be transmitted. It follows the TLP header format of the PCIe specifications, except for the requester ID/completer ID fields (tx_stN_hdr_i[95:80]), which are mapped as follows:
  • tx_stN_hdr_i[95:84]: tx_st_vf_num[11:0]
  • tx_stN_hdr_i[83]: tx_st_vf_active
  • tx_stN_hdr_i[82:80]: tx_st_func_num[2:0]

These signals are valid when the corresponding tx_stN_sop_i signal is asserted.

EP/RP/BP coreclkout_hip
pX_tx_stN_prefix_i[31:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

This is the TLP prefix to be transmitted, which follows the TLP prefix format of the PCIe specifications. PASID is supported.

These signals are valid when the corresponding tx_stN_sop_i signal is asserted.

The TLP prefix uses a Big Endian implementation (i.e. the Fmt field is in bits [31:29] and the Type field is in bits [28:24]).

If no prefix is present for a given TLP, that dword, including the Fmt field, is all zeros.

EP/RP/BP coreclkout_hip
pX_tx_stN_sop_i where

X = 0,1,2,3 (IP core number)

N = 0,2 (segment number)

Input

Indicates the first cycle of a TLP when asserted in conjunction with the corresponding bit of tx_stN_valid_i. For the x16 configuration:
  • tx_st3_sop_i: When asserted, indicates the start of a TLP in tx_st3_data_i[255:0].
  • tx_st2_sop_i: When asserted, indicates the start of a TLP in tx_st2_data_i[255:0].
  • tx_st1_sop_i: When asserted, indicates the start of a TLP in tx_st1_data_i[255:0].
  • tx_st0_sop_i: When asserted, indicates the start of a TLP in tx_st0_data_i[255:0].

These signals are asserted for one clock cycle per TLP. They also qualify the corresponding tx_stN_hdr_i and tx_stN_tlp_prfx_i signals.

Note: pX_tx_stN_sop_i pulses can only be sent on segments 0 and/or 2 (st0 and/or st2).
EP/RP/BP coreclkout_hip
pX_tx_stN_eop_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Indicates the last cycle of a TLP when asserted in conjunction with the corresponding bit of tx_stN_valid_i. For the x16 configuration:
  • tx_st3_eop_i: When asserted, indicates the end of a TLP in tx_st3_data_i[255:0].
  • tx_st2_eop_i: When asserted, indicates the end of a TLP in tx_st2_data_i[255:0].
  • tx_st1_eop_i: When asserted, indicates the end of a TLP in tx_st1_data_i[255:0].
  • tx_st0_eop_i: When asserted, indicates the end of a TLP in tx_st0_data_i[255:0].

These signals are asserted for one clock cycle per TLP.

EP/RP/BP coreclkout_hip
pX_tx_stN_dvalid_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Qualifies the data in the corresponding segment of tx_stN_data_i for transfer into the IP core on ready cycles.

To facilitate timing closure, Intel recommends that you register both the tx_st_ready_o and tx_stN_dvalid_i signals.

EP/RP/BP coreclkout_hip
pX_tx_stN_hvalid_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Qualifies the header in the corresponding segment of tx_stN_hdr_i for transfer into the IP core on ready cycles.

To facilitate timing closure, Intel recommends that you register both the tx_st_ready_o and tx_stN_hvalid_i signals.

EP/RP/BP coreclkout_hip
pX_tx_stN_pvalid_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Qualifies the prefix in the corresponding segment of tx_stN_prefix_i for transfer into the IP core on ready cycles.

To facilitate timing closure, Intel recommends that you register both the tx_st_ready_o and tx_stN_pvalid_i signals.

EP/RP/BP coreclkout_hip
pX_tx_stN_data_par_i[Z:0] where

X = 0,1,2,3 (IP core number) and Z varies based on the core.

N = 0,1,2,3 (segment number)

Input

Parity for tx_stN_data_i. Bit [0] corresponds to tx_stN_data_i[31:0], bit [1] corresponds to tx_stN_data_i[63:32], and so on.

By default, the PCIe Hard IP generates the parity for the TX data.

EP/RP/BP coreclkout_hip
pX_tx_stN_hdr_par_i[3:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Parity for tx_stN_hdr_i.

By default, the PCIe Hard IP generates the parity for the TX header.

EP/RP/BP coreclkout_hip
pX_tx_stN_prefix_par_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Parity for tx_stN_tlp_prfx_i.

By default, the PCIe Hard IP generates the parity for the TX TLP prefix.

EP/RP/BP coreclkout_hip
pX_tx_st_ready_o where

X = 0,1,2,3 (IP core number)

Output

Indicates that the PCIe Hard IP is ready to accept data. The maximum readyLatency is 16 coreclkout_hip cycles.

If tx_st_ready_o is asserted by the Transaction Layer in the PCIe Hard IP on cycle <n>, then <n> + readyLatency is a ready cycle, during which the Application may assert tx_stN_valid_i and transfer data.

If tx_st_ready_o is deasserted by the Transaction Layer on cycle <n>, then the Application must deassert tx_stN_valid_i within the readyLatency number of cycles after cycle <n>.

tx_st_ready_o can be deasserted under the following conditions:
  • The LTSSM is not ready.
  • A Retry is in progress.
  • There are not enough credits available to send the request.
  • The R-tile Avalon-ST IP is busy sending internally generated TLPs.
  • The internal R-tile TX FIFO is full.
EP/RP/BP coreclkout_hip
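The requester ID/completer ID mapping listed for tx_stN_hdr_i in Table 55 can be illustrated with a small bit-packing helper. This Python sketch treats the 128-bit header as an integer and overwrites bits [95:80] with the three fields. The function name `pack_id_fields` is hypothetical, and the rest of the header is passed through unchanged.

```python
def pack_id_fields(hdr, vf_num, vf_active, func_num):
    """Place the ID fields into bits [95:80] of a 128-bit header value.

    Bit layout taken from Table 55:
      [95:84] tx_st_vf_num[11:0]
      [83]    tx_st_vf_active
      [82:80] tx_st_func_num[2:0]
    """
    if not 0 <= hdr < (1 << 128):
        raise ValueError("hdr must fit in 128 bits")
    if not 0 <= vf_num < (1 << 12):
        raise ValueError("vf_num must fit in 12 bits")
    if vf_active not in (0, 1):
        raise ValueError("vf_active must be 0 or 1")
    if not 0 <= func_num < (1 << 3):
        raise ValueError("func_num must fit in 3 bits")
    field = (vf_num << 4) | (vf_active << 3) | func_num  # 16-bit field
    cleared = hdr & ~(0xFFFF << 80)                      # zero bits [95:80]
    return cleared | (field << 80)
```

Calling `pack_id_fields(0, 0xABC, 1, 5)` returns `0xABCD << 80`, which makes the field boundaries easy to see in a hex dump.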

As an example, Figure 26 below shows the behavior of the Avalon Streaming TX interface in a back-to-back TLP scenario with data spanning multiple segments. The following text describes the waveforms for each clock cycle:

  1. Clock cycle 1: The R-tile Intel FPGA IP for PCI Express asserts the p0_tx_st_ready_o signal, indicating that the Hard IP is ready to accept TLPs from the Application logic.
  2. Clock cycle 2:
    1. The start of the first TLP (T0) is in segment 0, indicated by the assertion of p0_tx_st0_sop_i.
    2. The signal p0_tx_st0_hvalid_i is asserted to validate the header of this first TLP (T0H0) in the p0_tx_st0_hdr_i bus.
    3. The signal p0_tx_st0_dvalid_i is asserted to validate the data of this first TLP (T0D0) in the p0_tx_st0_data_i bus.
    4. The signal p0_tx_st1_dvalid_i is asserted to validate the next portion of the data of this first TLP (T0D1) in the p0_tx_st1_data_i bus.
    5. The signal p0_tx_st2_dvalid_i is asserted to validate the next portion of the data of this first TLP (T0D2) in the p0_tx_st2_data_i bus.
    6. The signal p0_tx_st3_dvalid_i is asserted to validate the final portion of the data of this first TLP (T0D3) in the p0_tx_st3_data_i bus.
    7. The end of this first TLP (T0) is in segment 3, denoted by the assertion of p0_tx_st3_eop_i.
  3. Clock cycle 3:
    1. The next TLP (T1) arrives in segment 0, as denoted by p0_tx_st0_sop_i staying high.
    2. The signal p0_tx_st0_hvalid_i is asserted to validate the header of this TLP (T1H0) in the p0_tx_st0_hdr_i bus.
    3. The signal p0_tx_st0_dvalid_i is asserted to validate the data of this TLP (T1D0) in the p0_tx_st0_data_i bus.
    4. The signal p0_tx_st1_dvalid_i is asserted to validate the next portion of the data of this TLP (T1D1) in the p0_tx_st1_data_i bus.
    5. The signal p0_tx_st2_dvalid_i is asserted to validate the next portion of the data of this TLP (T1D2) in the p0_tx_st2_data_i bus.
    6. The signal p0_tx_st3_dvalid_i is asserted to validate the final portion of the data of this TLP (T1D3) in the p0_tx_st3_data_i bus.
    7. The end of this TLP (T1) is in segment 3, denoted by p0_tx_st3_eop_i staying high.
Figure 26.  Avalon® Streaming TX Interface Timings
Note: tx_stN_sop_i pulses can only be sent on segments 0 and/or 2 (st0 and/or st2).
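The clock-by-clock description above can be summarized as a small model of which Application-driven signals are asserted in each cycle. The Python sketch below is illustrative only: the function names are hypothetical, it covers only TLPs that start in segment 0 and end in segment 3 as in this example, and it does not model tx_st_ready_o after cycle 1 because the text does not describe it.

```python
def full_width_tlp_signals(port=0):
    """Signals asserted by the Application for one TLP that starts in
    segment 0, fills all four data segments, and ends in segment 3."""
    prefix = f"p{port}_tx_st"
    asserted = {
        f"{prefix}0_sop_i",     # start of TLP in segment 0
        f"{prefix}0_hvalid_i",  # header valid on the segment 0 header bus
        f"{prefix}3_eop_i",     # end of TLP in segment 3
    }
    # Data valid on every segment (TnD0 through TnD3).
    asserted |= {f"{prefix}{n}_dvalid_i" for n in range(4)}
    return asserted

def back_to_back_timeline(num_tlps, port=0):
    """Map clock cycle number to the set of asserted signals.

    Cycle 1 shows only the ready assertion by the Hard IP; each later
    cycle carries one complete TLP, so sop and eop stay high.
    """
    timeline = {1: {f"p{port}_tx_st_ready_o"}}
    for i in range(num_tlps):
        timeline[2 + i] = full_width_tlp_signals(port)
    return timeline
```

With `num_tlps=2`, cycles 2 and 3 contain the same seven signals, which corresponds to p0_tx_st0_sop_i and p0_tx_st3_eop_i staying high across T0 and T1.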