Intel Arria 10 and Intel Cyclone 10 GX Avalon-ST Interface for PCI Express User Guide
Version Information

Updated for: Intel® Quartus® Prime Design Suite 18.0
1. Datasheet
1.1. Intel Arria 10 or Intel Cyclone 10 GX Avalon-ST Interface for PCIe Datasheet
Intel® Arria® 10 and Intel® Cyclone® 10 GX FPGAs include a configurable, hardened protocol stack for PCI Express® that is compliant with the PCI Express Base Specification 3.0 and the PCI Express Base Specification 2.0, respectively. The Hard IP for PCI Express using the Avalon® Streaming (Avalon-ST) interface is the most flexible variant. However, this variant requires a thorough understanding of the PCIe® protocol.
The following table shows the aggregate bandwidth, in Gbps, of a PCI Express link for Gen1, Gen2, and Gen3 for 1, 2, 4, and 8 lanes. This table provides bandwidths for a single transmit (TX) or receive (RX) channel; the numbers double for duplex operation. The protocol specifies 2.5 giga-transfers per second (GT/s) for Gen1, 5.0 GT/s for Gen2, and 8.0 GT/s for Gen3. Gen1 and Gen2 use 8B/10B encoding, which introduces a 20% overhead. In contrast, Gen3 uses 128b/130b encoding, which reduces the data throughput lost to encoding to about 1.5%.
| Link Width | ×1 | ×2 | ×4 | ×8 |
|---|---|---|---|---|
| PCI Express Gen1 (2.5 Gbps) | 2 | 4 | 8 | 16 |
| PCI Express Gen2 (5.0 Gbps) | 4 | 8 | 16 | 32 |
| PCI Express Gen3 (8.0 Gbps) | 7.87 | 15.75 | 31.51 | 63 |
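These figures follow directly from the transfer rate, the lane count, and the encoding overhead. For example, for Gen3 ×8 and Gen1 ×4:

$$
8\ \mathrm{GT/s} \times 8 \times \tfrac{128}{130} \approx 63.0\ \mathrm{Gbps},
\qquad
2.5\ \mathrm{GT/s} \times 4 \times \tfrac{8}{10} = 8\ \mathrm{Gbps}
$$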
The following table shows the aggregate bandwidth, in Gbps, of a PCI Express link for Gen1 and Gen2 for 1, 2, and 4 lanes. This table provides bandwidths for a single transmit (TX) or receive (RX) channel; the numbers double for duplex operation. The protocol specifies 2.5 GT/s for Gen1 and 5.0 GT/s for Gen2. Gen1 and Gen2 use 8B/10B encoding, which introduces a 20% overhead.
| Link Width | ×1 | ×2 | ×4 |
|---|---|---|---|
| PCI Express Gen1 (2.5 Gbps) | 2 | 4 | 8 |
| PCI Express Gen2 (5.0 Gbps) | 4 | 8 | 16 |
Refer to the AN 456: PCI Express High Performance Reference Design for more information about calculating bandwidth for the hard IP implementation of PCI Express in many Intel FPGAs, including the Intel® Arria® 10 Hard IP for PCI Express IP core.
1.1.1. Intel Arria 10 or Intel Cyclone 10 GX Features
New features in the Quartus® Prime 17.1 software release:
- Added Intel® Cyclone® 10 GX support for up to Gen2 x4 configurations.
- Added parameter to invert the RX polarity.
The Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express supports the following features:
- Complete protocol stack including the Transaction, Data Link, and Physical Layers implemented as hard IP.
- Support for ×1, ×2, ×4, and ×8 configurations with Gen1, Gen2, or Gen3 lane rates for Native Endpoints in Intel® Arria® 10 devices.
- Support for ×1, ×2, and ×4 configurations with Gen1 or Gen2 lane rates for Native Endpoints in Intel® Cyclone® 10 GX devices.
- Dedicated 16 KB receive buffer.
- Optional support for Configuration via Protocol (CvP) using the PCIe link allowing the I/O and core bitstreams to be stored separately.
- Example designs demonstrating parameterization, design modules, and connectivity.
- Extended credit allocation settings to better optimize the RX buffer space based on application type.
- Support for multiple packets per cycle with the 256‑bit Avalon‑ST interface.
- Optional end-to-end cyclic redundancy code (ECRC) generation and checking and advanced error reporting (AER) for high reliability applications.
- Easy to use:
- Flexible configuration.
- Substantial on-chip resource savings and guaranteed timing closure.
- No license requirement.
- Example designs to get started.
| Feature | Avalon-ST Interface | Avalon-MM Interface | Avalon-MM DMA |
|---|---|---|---|
| IP Core License | Free | Free | Free |
| Native Endpoint | Supported | Supported | Supported |
| Root Port | Supported | Supported | Not supported |
| Gen1 | ×1, ×2, ×4, ×8 | ×1, ×2, ×4, ×8 | Not supported |
| Gen2 | ×1, ×2, ×4, ×8 | ×1, ×2, ×4, ×8 | ×4, ×8 |
| Gen3 | ×1, ×2, ×4, ×8 | ×1, ×2, ×4 | ×2, ×4, ×8 |
| 64-bit Application Layer interface | Supported | Supported | Not supported |
| 128-bit Application Layer interface | Supported | Supported | Supported |
| 256-bit Application Layer interface | Supported | Not supported | Supported |
| Maximum payload size | 128, 256, 512, 1024, 2048 bytes | 128, 256 bytes | 128, 256 bytes |
| Number of tags supported for non-posted requests | 32, 64, 128, 256 | 8 for the 64-bit interface; 16 for the 128-bit interface | 16 or 256 |
| Automatically handle out-of-order completions (transparent to the Application Layer) | Not supported | Supported | Not supported |
| Automatically handle requests that cross a 4 KB address boundary (transparent to the Application Layer) | Not supported | Supported | Supported |
| Polarity inversion of PIPE interface signals | Supported | Supported | Supported |
| Number of MSI requests | 1, 2, 4, 8, 16, or 32 | 1, 2, 4, 8, 16, or 32 | 1, 2, 4, 8, 16, or 32 |
| MSI-X | Supported | Supported | Supported |
| Legacy interrupts | Supported | Supported | Supported |
| Expansion ROM | Supported | Not supported | Not supported |
| PCIe bifurcation | Not supported | Not supported | Not supported |
| Transaction Layer Packet (TLP) type (transmit support) | Avalon-ST Interface | Avalon-MM Interface | Avalon-MM DMA |
|---|---|---|---|
| Memory Read Request (MRd) | EP/RP | EP/RP | EP |
| Memory Read Lock Request (MRdLk) | EP/RP | EP | |
| Memory Write Request (MWr) | EP/RP | EP/RP | EP |
| I/O Read Request (IORd) | EP/RP | EP/RP | |
| I/O Write Request (IOWr) | EP/RP | EP/RP | |
| Config Type 0 Read Request (CfgRd0) | RP | RP | |
| Config Type 0 Write Request (CfgWr0) | RP | RP | |
| Config Type 1 Read Request (CfgRd1) | RP | RP | |
| Config Type 1 Write Request (CfgWr1) | RP | RP | |
| Message Request (Msg) | EP/RP | EP/RP | |
| Message Request with Data (MsgD) | EP/RP | EP/RP | |
| Completion (Cpl) | EP/RP | EP/RP | EP |
| Completion with Data (CplD) | EP/RP | EP | |
| Completion-Locked (CplLk) | EP/RP | | |
| Completion Lock with Data (CplDLk) | EP/RP | | |
| Fetch and Add AtomicOp Request (FetchAdd) | EP | | |
The Intel® Arria® 10 or Intel® Cyclone® 10 GX Avalon-ST Interface for PCIe Solutions User Guide explains how to use this IP core; it does not explain the PCI Express protocol. Although there is inevitable overlap between these two purposes, use this document only in conjunction with an understanding of the PCI Express Base Specification.
1.2. Release Information
| Item | Description |
|---|---|
| Version | 18.0 |
| Release Date | May 2018 |
| Ordering Codes | No ordering code is required. |
| Product IDs and Vendor ID | There are no encrypted files for the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express. The Product ID and Vendor ID are not required because this IP core does not require a license. |

Intel verifies that the current version of the Intel® Quartus® Prime software compiles the previous version of each IP core, if this IP core was included in the previous release. Intel reports any exceptions to this verification in the Intel IP Release Notes or clarifies them in the Intel® Quartus® Prime IP Update tool. Intel does not verify compilation with IP core versions older than the previous release.
1.3. Device Family Support
The following terms define device support levels for Intel® FPGA IP cores:
- Advance support—the IP core is available for simulation and compilation for this device family. Timing models include initial engineering estimates of delays based on early post-layout information. The timing models are subject to change as silicon testing improves the correlation between the actual silicon and the timing models. You can use this IP core for system architecture and resource utilization studies, simulation, pinout, system latency assessments, basic timing assessments (pipeline budgeting), and I/O transfer strategy (data-path width, burst depth, I/O standards tradeoffs).
- Preliminary support—the IP core is verified with preliminary timing models for this device family. The IP core meets all functional requirements, but might still be undergoing timing analysis for the device family. It can be used in production designs with caution.
- Final support—the IP core is verified with final timing models for this device family. The IP core meets all functional and timing requirements for the device family and can be used in production designs.
| Device Family | Support Level |
|---|---|
| Intel® Arria® 10 or Intel® Cyclone® 10 GX | Final |
| Other device families | Refer to Intel's PCI Express IP Solutions web page for support information on other device families. |
1.4. Configurations
The Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express includes a full hard IP implementation of the PCI Express stack including the following layers:
- Physical (PHY), including:
- Physical Media Attachment (PMA)
- Physical Coding Sublayer (PCS)
- Media Access Control (MAC)
- Data Link Layer (DL)
- Transaction Layer (TL)
The Hard IP supports all memory, I/O, configuration, and message transactions. It is optimized for Intel devices. The Application Layer interface is also optimized to achieve maximum effective throughput. You can customize the Hard IP to meet your design requirements.
A Configuration via Protocol (CvP) system typically includes the following components:

- Two Endpoints that connect to a PCIe switch.
- A host CPU, connected through the switch, that implements CvP over the PCI Express link.
1.5. Debug Features
Debug features allow observation and control of the Hard IP for faster debugging of system-level problems.
1.6. IP Core Verification
To ensure compliance with the PCI Express specification, Intel performs extensive verification. The simulation environment uses multiple testbenches that consist of industry‑standard bus functional models (BFMs) driving the PCI Express link interface. Intel performs the following tests in the simulation environment:
- Directed and pseudorandom stimuli test the Application Layer interface, Configuration Space, and all types and sizes of TLPs
- Error injection tests inject errors in the link, TLPs, and Data Link Layer Packets (DLLPs), and check for the proper responses
- Tests that specifically verify the items in the PCI-SIG® Compliance Checklist
- Random tests that test a wide range of traffic patterns
Upon request, Intel provides example designs that you can leverage to test your PCBs and complete compliance base board (CBB) testing at PCI-SIG.
1.6.1. Compatibility Testing Environment
Intel has performed significant hardware testing to ensure a reliable solution. In addition, Intel internally tests every release with motherboards and PCI Express switches from a variety of manufacturers. All PCI-SIG compliance tests are run with each IP core release.
1.7. Resource Utilization
Because the PCIe protocol stack is implemented in hardened logic, it uses no core device resources (no ALMs and no embedded memory).
1.8. Recommended Speed Grades
The following table lists the recommended speed grades for Intel® Arria® 10 devices.

| Link Rate | Link Width | Interface Width | Application Clock Frequency (MHz) | Recommended Speed Grades |
|---|---|---|---|---|
| Gen1 | ×1 | 64 bits | 62.5, 125 | –1, –2, –3 |
| Gen1 | ×2 | 64 bits | 125 | –1, –2, –3 |
| Gen1 | ×4 | 64 bits | 125 | –1, –2, –3 |
| Gen1 | ×8 | 64 bits | 250 | –1, –2 |
| Gen1 | ×8 | 128 bits | 125 | –1, –2, –3 |
| Gen2 | ×1 | 64 bits | 125 | –1, –2, –3 |
| Gen2 | ×2 | 64 bits | 125 | –1, –2, –3 |
| Gen2 | ×4 | 64 bits | 250 | –1, –2 |
| Gen2 | ×4 | 128 bits | 125 | –1, –2, –3 |
| Gen2 | ×8 | 128 bits | 250 | –1, –2 |
| Gen2 | ×8 | 256 bits | 125 | –1, –2, –3 |
| Gen3 | ×1 | 64 bits | 125 | –1, –2, –3 |
| Gen3 | ×2 | 64 bits | 250 | –1, –2 |
| Gen3 | ×2 | 128 bits | 125 | –1, –2, –3 |
| Gen3 | ×4 | 128 bits | 250 | –1, –2 |
| Gen3 | ×4 | 256 bits | 125 | –1, –2, –3 |
| Gen3 | ×8 | 256 bits | 250 | –1, –2 |
The following table lists the recommended speed grades for Intel® Cyclone® 10 GX devices.

| Lane Rate | Link Width | Interface Width | Application Clock Frequency (MHz) | Recommended Speed Grades |
|---|---|---|---|---|
| Gen1 | ×1 | 64 bits | 62.5, 125 | –5, –6 |
| Gen1 | ×2 | 64 bits | 125 | –5, –6 |
| Gen1 | ×4 | 64 bits | 125 | –5, –6 |
| Gen2 | ×1 | 64 bits | 125 | –5, –6 |
| Gen2 | ×2 | 64 bits | 125 | –5, –6 |
| Gen2 | ×4 | 64 bits | 250 | –5 |
| Gen2 | ×4 | 128 bits | 125 | –5, –6 |
1.9. Creating a Design for PCI Express
Follow these steps to create a design for PCI Express:

1. Select the PCIe variant that best meets your design requirements:
   - Is your design an Endpoint or Root Port?
   - What Generation do you intend to implement?
   - What link width do you intend to implement?
   - What bandwidth does your application require?
   - Does your design require Configuration via Protocol (CvP)?
2. Select parameters for that variant.
3. For Intel® Arria® 10 devices, you can use the Example Design tab of the component GUI to generate a design that you specify, then simulate this example design and download it to an Intel® Arria® 10 FPGA Development Kit. Refer to the Intel® Arria® 10/Intel® Cyclone® 10 GX PCI Express* IP Core Quick Start Guide for details.
4. For all devices, you can simulate using an Intel-provided example design. All static PCI Express example designs are available under <install_dir>/ip/altera/altera_pcie/altera_pcie_<dev>_ed/example_design/<dev>. Alternatively, create a simulation model and use your own custom or third-party BFM. The Platform Designer Generate menu generates simulation models. Intel supports ModelSim* - Intel FPGA Edition for all IP. The PCIe cores also support the Aldec Riviera-PRO*, Cadence NCSim*, Mentor Graphics ModelSim*, and Synopsys VCS* and VCS-MX* simulators. The Intel testbench and Root Port or Endpoint BFM provide a simple method to do basic testing of the Application Layer logic that interfaces to the variation. However, the testbench and Root Port BFM are not intended to be a substitute for a full verification environment. To thoroughly test your application, Intel suggests that you obtain commercially available PCI Express verification IP and tools, or do your own extensive hardware testing, or both.
5. Compile your design using the Quartus® Prime software. If the versions of your design and the Quartus® Prime software you are running do not match, regenerate your PCIe design.
6. Download your design to an Intel development board or your own PCB. Click the All Development Kits link below for a list of Intel development boards.
7. Test the hardware. You can use the Intel Signal Tap Logic Analyzer or a third-party protocol analyzer to observe behavior.
8. Substitute your Application Layer logic for the Application Layer logic in Intel's testbench, then repeat Steps 3–6. In Intel's testbenches, the PCIe core is typically called the DUT (device under test); the Application Layer logic is typically called APPS.
2. Quick Start Guide
The Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express* IP core includes a programmed I/O (PIO) design example to help you understand usage. The PIO example transfers data from a host processor to a target device. It is appropriate for low-bandwidth applications. The design example includes an Avalon-ST to Avalon-MM Bridge. This component translates the TLPs received on the PCIe* link to Avalon-MM memory reads and writes to the on-chip memory.
This design example automatically creates the files necessary to simulate and compile in the Quartus® Prime software. You can download the compiled design to the Intel® Arria® 10 GX FPGA Development Kit. The design examples cover a wide range of parameters. However, the automatically generated design examples do not cover all possible parameterizations of the PCIe IP core. If you select an unsupported parameter set, generation fails and provides an error message.
In addition, many static design examples for simulation are only available in the <install_dir>/ip/altera/altera_pcie/altera_pcie_a10_ed/example_design/a10 and <install_dir>/ip/altera/altera_pcie/altera_pcie_a10_ed/example_design/c10 directories.
2.1. Directory Structure
2.2. Design Components
2.3. Generating the Design
Follow these steps to generate the design from the IP Parameter Editor:

1. In the IP Catalog (Tools > IP Catalog), locate and select the Intel® Arria® 10/Cyclone 10 Hard IP for PCI Express. Starting with the Quartus® Prime Pro 16.1 software, the New IP Variation dialog box appears.
2. Specify a top-level name and the folder for your custom IP variation, and the target device. Click OK.
3. On the IP Settings tabs, specify the parameters for your IP variation.
4. On the Example Designs tab, the PIO design is available for your IP variation.
Figure 8. Example Design Tab
5. For Example Design Files, select the Simulation and Synthesis options.
6. For Generated HDL Format, only Verilog is available.
7. For Target Development Kit, select the Intel® Arria® 10 FPGA Development Kit option. Note: Currently, you cannot select an Intel® Cyclone® 10 GX Development Kit when generating an example design.
8. Click the Generate Example Design button. The software generates all files necessary to run simulations and hardware tests on the Intel® Arria® 10 FPGA Development Kit. Click Close when generation completes.
9. Click Finish. The prompt, Recent changes have not been generated. Generate now?, allows you to create files for simulation and synthesis. Click No to continue to simulate the design example you just generated.
2.4. Simulating the Design
1. Change to the testbench simulation directory.
2. Run the simulation script for the simulator of your choice. Refer to the table below.
3. Analyze the results.
| Simulator | Working Directory | Instructions |
|---|---|---|
| ModelSim* | <example_design>/pcie_example_design_tb/pcie_example_design_tb/sim/mentor/ | |
| VCS* | <example_design>/pcie_example_design_tb/pcie_example_design_tb/sim/synopsys/vcs | |
| NCSim* | <example_design>/pcie_example_design_tb/pcie_example_design_tb/sim/cadence | |
| Xcelium* Parallel Simulator | <example_design>/pcie_example_design_tb/pcie_example_design_tb/sim/xcelium | |

2.5. Compiling and Testing the Design in Hardware

The software application to test the PCI Express Design Example on the Intel® Arria® 10 GX FPGA Development Kit is available on both 32- and 64-bit Windows platforms. This program performs the following tasks:
- Prints the Configuration Space, lane rate, and lane width.
- Writes 0x00000000 to the specified BAR at offset 0x00000000 to initialize the memory, and reads it back.
- Writes 0xABCD1234 at offset 0x00000000 of the specified BAR, reads it back, and compares.

If successful, the test program displays the message PASSED.
Follow these steps to compile the design example in the Quartus Prime software:
1. Launch the Quartus Prime software and open the pcie_example_design.qpf file for the example design created above.
2. On the Processing menu, select Start Compilation. The timing constraints for the design example and the design components are automatically loaded during compilation.
Follow these steps to test the design example in hardware:
1. In the <example_design>/software/windows/interop directory, unzip Altera_PCIe_Interop_Test.zip. Note: You can also refer to the readme_Altera_PCIe_interop_Test.txt file in this same directory for instructions on running the hardware test.
2. Install the Intel® FPGA Windows Demo Driver for PCIe on the Windows host machine, using altera_pcie_win_driver.inf. Note: If you modified the default Vendor ID (0x1172) or Device ID (0x0000) specified in the component parameter editor GUI, you must also modify them in altera_pcie_win_driver.inf.
3. In the <example_design> directory, launch the Quartus Prime software and compile the design (Processing > Start Compilation).
4. Connect the development board to the host computer.
5. Configure the FPGA on the development board using the generated .sof file (Tools > Programmer).
6. Open the Windows Device Manager and scan for hardware changes.
7. Select the Intel® FPGA listed as an unknown PCI device and point to the appropriate 32- or 64-bit driver (altera_pcie_win_driver.inf) in the Windows_driver directory.
8. After the driver loads successfully, a new device named Altera PCI API Device appears in the Windows Device Manager.
9. Determine the bus, device, and function number for the Altera PCI API Device listed in the Windows Device Manager:
   a. Expand Altera PCI API Driver under the device list.
   b. Right-click Altera PCI API Device and select Properties.
   c. Note the bus, device, and function number for the device. The following figure shows one example.
   Figure 13. Determining the Bus, Device, and Function Number for New PCIe Device
10. In the <example_design>/software/windows/interop/Altera_PCIe_Interop_Test/Interop_software directory, run Alt_Test.exe.
11. When prompted, type the bus, device, and function numbers and select the BAR number (0–5) you specified when parameterizing the IP core. Note: The bus, device, and function numbers for your hardware setup may be different.
12. The test displays the message PASSED if the test is successful.
3. Intel Arria 10 or Intel Cyclone 10 GX Parameter Settings
3.1. Parameters
| Parameter | Value | Description |
|---|---|---|
| Design Environment | Standalone; System | Identifies the environment that the IP is in. |
| Parameter | Value | Description |
|---|---|---|
| Application Interface Type | Avalon-ST; Avalon-MM; Avalon-MM with DMA; Avalon-ST with SR-IOV | Selects the interface to the Application Layer. Note: When the Design Environment parameter is set to System, all four Application Interface Types are available. When Design Environment is set to Standalone, only Avalon-ST and Avalon-ST with SR-IOV are available. |
| Hard IP mode | Gen3x8, Interface: 256-bit, 250 MHz; Gen3x4, Interface: 256-bit, 125 MHz; Gen3x4, Interface: 128-bit, 250 MHz; Gen3x2, Interface: 128-bit, 125 MHz; Gen3x2, Interface: 64-bit, 250 MHz; Gen3x1, Interface: 64-bit, 125 MHz; Gen2x8, Interface: 256-bit, 125 MHz; Gen2x8, Interface: 128-bit, 250 MHz; Gen2x4, Interface: 128-bit, 125 MHz; Gen2x4, Interface: 64-bit, 250 MHz; Gen2x2, Interface: 64-bit, 125 MHz; Gen2x1, Interface: 64-bit, 125 MHz; Gen1x8, Interface: 128-bit, 125 MHz; Gen1x8, Interface: 64-bit, 250 MHz; Gen1x4, Interface: 64-bit, 125 MHz; Gen1x2, Interface: 64-bit, 125 MHz; Gen1x1, Interface: 64-bit, 125 MHz; Gen1x1, Interface: 64-bit, 62.5 MHz | Selects the lane rate and link width, and the Application Layer interface width and clock frequency. Intel® Cyclone® 10 GX devices support up to Gen2 x4 configurations. |
| Port type | Native Endpoint; Root Port | Specifies the port type. The Endpoint stores parameters in the Type 0 Configuration Space. The Root Port stores parameters in the Type 1 Configuration Space. You can enable the Root Port in the current release; however, Root Port mode only supports the Avalon®-MM interface type, it only supports basic simulation and compilation, and it is not fully verified. |
| RX Buffer credit allocation - performance for received requests | Minimum; Low; Balanced | Determines the allocation of posted header credits, posted data credits, non-posted header credits, completion header credits, and completion data credits in the 16 KB RX buffer. The settings allow you to adjust the credit allocation to optimize your system. The credit allocation for the selected setting displays in the Message pane, which dynamically updates the number of credits for Posted, Non-Posted Headers and Data, and Completion Headers and Data as you change this selection. Refer to the Throughput Optimization chapter for more information about optimizing your design, and to the RX Buffer Allocation Selections Available by Interface Type below for the availability of these settings by interface type. Minimum configures the minimum that the PCIe specification allows for non-posted and posted request credits, leaving most of the RX buffer space for received completion headers and data; select this option for variations where the application logic generates many read requests and only infrequently receives single requests from the PCIe link. Low configures a slightly larger amount of RX buffer space for non-posted and posted request credits, but still dedicates most of the space to received completion headers and data; select this option for variations where the application logic generates many read requests and infrequently receives small bursts of requests from the PCIe link. This option is recommended for typical Endpoint applications where most of the PCIe traffic is generated by a DMA engine located in the Endpoint Application Layer logic. Balanced configures approximately half of the RX buffer space for received requests and the other half for received completions; select this option for variations where received requests and received completions are roughly equal. |
| RX Buffer completion credits | Header credits; Data credits | Displays the number of completion credits in the 16 KB RX buffer resulting from the credit allocation parameter. Each header credit is 20 bytes; each data credit is 16 bytes. |
3.2. Intel Arria 10 or Intel Cyclone 10 GX Avalon-ST Settings
| Parameter | Value | Description |
|---|---|---|
| Enable Avalon-ST reset output port | On/Off | When On, the generated reset output port has the same functionality as the reset_status port included in the Reset and Link Status interface. |
| Enable byte parity ports on Avalon-ST interface | On/Off | When On, the RX and TX datapaths are parity protected. Parity is odd. The Application Layer must provide valid byte parity in the Avalon-ST TX direction. This parameter is only available for the Avalon-ST Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express. |
| Enable multiple packets per cycle for the 256-bit interface | On/Off | When On, the 256-bit Avalon-ST interface supports the transmission of TLPs starting at any 128-bit address boundary, allowing support for multiple packets in a single cycle. To support multiple packets per cycle, the Avalon-ST interface includes two start of packet and end of packet signals for the 256-bit Avalon-ST interfaces. This mode is not supported for the Avalon-ST with SR-IOV interface. |
| Enable credit consumed selection port | On/Off | When On, the core includes the tx_cons_cred_sel port. This parameter does not apply to the Avalon-MM interface. |
| Enable Configuration bypass (CfgBP) | On/Off | When On, the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express bypasses the Transaction Layer Configuration Space registers included as part of the Hard IP, allowing you to substitute a custom Configuration Space implemented in soft logic. This parameter is not available for the Avalon-MM IP cores. |
| Enable local management interface (LMI) | On/Off | When On, your variant includes the optional LMI interface. This interface is used to log error descriptor information in the TLP header log registers. The LMI interface provides the same access to Configuration Space registers as Configuration TLP requests. |
3.3. Base Address Register (BAR) and Expansion ROM Settings
The type and size of BARs available depend on port type.
| Parameter | Value | Description |
|---|---|---|
| Type | Disabled; 64-bit prefetchable memory; 32-bit non-prefetchable memory; 32-bit prefetchable memory; I/O address space | If you select 64-bit prefetchable memory, two contiguous BARs are combined to form a 64-bit prefetchable BAR; you must set the higher-numbered BAR to Disabled. A non-prefetchable 64-bit BAR is not supported because in a typical system, the Root Port Type 1 Configuration Space sets the maximum non-prefetchable memory window to 32 bits. The BARs can also be configured as separate 32-bit memories. Defining memory as prefetchable allows contiguous data to be fetched ahead. Prefetching memory is advantageous when the requestor may require more data from the same region than was originally requested. If you specify that a memory is prefetchable, it must have the following two attributes: reads do not have side effects, and writes can be merged. |
| Size | 16 Bytes - 8 EB | Specifies the size of the BAR. Memory sizes from 16 bytes up to 8 EB are supported. |
| Expansion ROM | Disabled - 16 MB | Specifies the size of the optional ROM. The expansion ROM is only available for the Avalon-ST interface. |
3.4. Base and Limit Registers for Root Ports
| Parameter | Value | Description |
|---|---|---|
| Input/Output | Disabled; 16-bit I/O addressing; 32-bit I/O addressing | Specifies the address widths for the I/O base and I/O limit registers. |
| Prefetchable memory | Disabled; 32-bit memory addressing; 64-bit memory addressing | Specifies the address widths for the Prefetchable Memory Base register and Prefetchable Memory Limit register. |
3.5. Device Identification Registers
| Register Name | Default Value | Description |
|---|---|---|
| Vendor ID | 0x00001172 | Sets the read-only value of the Vendor ID register. This parameter cannot be set to 0xFFFF, per the PCI Express Base Specification. Address offset: 0x000. |
| Device ID | Custom value | Sets the read-only value of the Device ID register. Address offset: 0x000. |
| Revision ID | Custom value | Sets the read-only value of the Revision ID register. Address offset: 0x008. |
| Class code | Custom value | Sets the read-only value of the Class Code register. Note: The 24-bit Class Code register is further divided into three 8-bit fields: Base Class Code, Sub-Class Code, and Programming Interface. For more details on these fields, refer to the PCI Express Base Specification. Address offset: 0x008. |
| Subsystem Vendor ID | Custom value | Sets the read-only value of the Subsystem Vendor ID register in the PCI Type 0 Configuration Space. This parameter cannot be set to 0xFFFF, per the PCI Express Base Specification. This value is assigned by PCI-SIG to the device manufacturer. Address offset: 0x02C. |
| Subsystem Device ID | Custom value | Sets the read-only value of the Subsystem Device ID register in the PCI Type 0 Configuration Space. Address offset: 0x02C. |
3.6. PCI Express and PCI Capabilities Parameters
This group of parameters defines various capability properties of the IP core. Some of these parameters are stored in the PCI-compatible portion of the PCI Configuration Space. The byte offset indicates the parameter address.
3.6.1. PCI Express and PCI Capabilities
| Parameter | Possible Values | Default Value | Description |
|---|---|---|---|
| Maximum payload size | 128 bytes; 256 bytes; 512 bytes; 1024 bytes; 2048 bytes | 128 bytes | Specifies the maximum payload size supported. This parameter sets the read-only value of the Max Payload Size Supported field of the Device Capabilities register (0x084[2:0]). Address: 0x084. |
| Number of Tags supported | 32; 64 | 32 | Indicates the number of tags supported for non-posted requests transmitted by the Application Layer. This parameter sets the values in the Device Control register (0x088) of the PCI Express Capability Structure. The Transaction Layer tracks all outstanding completions for non-posted requests made by the Application Layer. This parameter configures the Transaction Layer for the maximum number of tags to track. The Application Layer must set the tag values in all non-posted PCI Express headers to be less than this value (a simulation-only monitor for this rule follows this table). Values greater than 32 also set the Extended Tag Field Supported bit in the Configuration Space Device Capabilities register. The Application Layer can only use tag numbers greater than 31 if configuration software sets the Extended Tag Field Enable bit of the Device Control register. This bit is available to the Application Layer on the tl_cfg_ctl output signal as cfg_devcsr[8]. |
| Completion timeout range | ABCD; BCD; ABC; AB; B; A; None | ABCD | Indicates device function support for the optional completion timeout programmability mechanism. This mechanism allows system software to modify the completion timeout value. This field is applicable only to Root Ports and Endpoints that issue requests on their own behalf; for all other functions this field is reserved and must be hardwired to 0b0000. Completion timeouts are specified and enabled in the Device Control 2 register (0x0A8) of the PCI Express Capability Structure. Four time value ranges are defined: Range A (50 µs to 10 ms), Range B (10 ms to 250 ms), Range C (250 ms to 4 s), and Range D (4 s to 64 s). The selected value sets the bits that show which timeout ranges are supported; all other values are reserved. The function must implement a timeout value in the range 50 µs to 50 ms. Intel recommends that the completion timeout mechanism expire in no less than 10 ms. |
| Disable completion timeout | On/Off | On | Disables the completion timeout mechanism. When On, the core supports the completion timeout disable mechanism via the PCI Express Device Control Register 2. The Application Layer logic must implement the actual completion timeout mechanism for the required ranges. |
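The tag rule above, that the Application Layer must keep non-posted request tags below the configured maximum, lends itself to a small simulation-only check. The following is a minimal sketch under stated assumptions; the module and signal names are hypothetical and are not IP core ports.

```verilog
// Simulation-only monitor for the tag rule described above: the Application
// Layer must keep non-posted request tags below the configured maximum.
// Module and signal names are illustrative, not IP core ports.
module tag_limit_monitor #(
  parameter MAX_TAGS = 32          // value of "Number of Tags supported"
) (
  input wire       clk,
  input wire       np_req_valid,   // Application Layer issues a non-posted request
  input wire [7:0] np_req_tag      // tag field of the request header
);
  always @(posedge clk) begin
    if (np_req_valid && (np_req_tag >= MAX_TAGS))
      $display("ERROR: tag %0d exceeds configured maximum of %0d",
               np_req_tag, MAX_TAGS);
  end
endmodule
```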
3.6.2. Error Reporting
| Parameter | Value | Default Value | Description |
|---|---|---|---|
| Enable Advanced Error Reporting (AER) | On/Off | Off | When On, enables the Advanced Error Reporting (AER) capability. |
| Enable ECRC checking | On/Off | Off | When On, enables ECRC checking. Sets the read-only value of the ECRC Check Capable bit in the Advanced Error Capabilities and Control register. This parameter requires you to enable the AER capability. |
| Enable ECRC generation | On/Off | Off | When On, enables ECRC generation capability. Sets the read-only value of the ECRC Generation Capable bit in the Advanced Error Capabilities and Control register. This parameter requires you to enable the AER capability. |
| Enable ECRC forwarding on the Avalon-ST interface | On/Off | Off | When On, enables ECRC forwarding to the Application Layer. On the Avalon-ST RX path, the incoming TLP contains the ECRC dword, and the TD bit is set if an ECRC exists. On the TX path, the TLP from the Application Layer must contain the ECRC dword and have the TD bit set. |
| Track RX completion buffer overflow on the Avalon-ST interface | On/Off | Off | When On, the core includes the rxfc_cplbuf_ovf output status signal to track the RX posted completion buffer overflow status. |
3.6.3. Link Capabilities
| Parameter | Value | Description |
|---|---|---|
| Link port number (Root Port only) | 0x01 | Sets the read-only value of the Port Number field in the Link Capabilities register. This parameter is for Root Ports only. It should not be changed. |
| Data link layer active reporting (Root Port only) | On/Off | Turn this parameter On for a Root Port if the attached Endpoint supports the optional capability of reporting the DL_Active state of the Data Link Control and Management State Machine. For a hot-plug capable Endpoint (as indicated by the Hot Plug Capable field of the Slot Capabilities register), this parameter must be turned On. For Root Port components that do not support this optional capability, turn this option Off. |
| Surprise down reporting (Root Port only) | On/Off | When you turn this option On, an Endpoint supports the optional capability of detecting and reporting the surprise down error condition. The error condition is read from the Root Port. |
| Slot clock configuration | On/Off | When you turn this option On, the Endpoint or Root Port uses the same physical reference clock that the system provides on the connector. When Off, the IP core uses an independent clock regardless of the presence of a reference clock on the connector. This parameter sets the Slot Clock Configuration bit (bit 12) in the PCI Express Link Status register. |
3.6.4. MSI and MSI-X Capabilities
| Parameter | Value | Description |
|---|---|---|
| MSI messages requested | 1, 2, 4, 8, 16, 32 | Specifies the number of messages the Application Layer can request. Sets the value of the Multiple Message Capable field of the Message Control register. Address: 0x050[31:16]. |
| Implement MSI-X | On/Off | When On, adds the MSI-X functionality. |

When you implement MSI-X, the following parameters define fields of the MSI-X Capability Structure:

| Parameter | Bit Range | Description |
|---|---|---|
| Table size | [10:0] | System software reads this field to determine the MSI-X table size <n>, which is encoded as <n−1>. For example, a returned value of 2047 indicates a table size of 2048. This field is read-only in the MSI-X Capability Structure. Legal range is 0–2047 (2¹¹). Address offset: 0x068[26:16]. |
| Table offset | [31:0] | Points to the base of the MSI-X table. The lower 3 bits of the table BAR indicator (BIR) are set to zero by software to form a 64-bit qword-aligned offset. This field is read-only. |
| Table BAR indicator | [2:0] | Specifies which one of a function's BARs, located beginning at 0x10 in Configuration Space, is used to map the MSI-X table into memory space. This field is read-only. Legal range is 0–5. |
| Pending bit array (PBA) offset | [31:0] | Used as an offset from the address contained in one of the function's Base Address registers to point to the base of the MSI-X PBA. The lower 3 bits of the PBA BIR are set to zero by software to form a 32-bit qword-aligned offset. This field is read-only in the MSI-X Capability Structure. |
| Pending BAR indicator | [2:0] | Specifies which one of a function's Base Address registers, located beginning at 0x10 in Configuration Space, is used to map the MSI-X PBA into memory space. This field is read-only in the MSI-X Capability Structure. Legal range is 0–5. |
3.6.5. Slot Capabilities
| Parameter | Value | Description |
|---|---|---|
| Use Slot register | On/Off | This parameter is only supported in Root Port mode. The slot capability is required for Root Ports if a slot is implemented on the port; it defines the characteristics of the slot. Slot status is recorded in the PCI Express Capabilities register. You turn on this option by selecting Enable slot capability. Refer to the figure below for bit definitions. |
| Slot power scale | 0–3 | Specifies the scale used for the Slot Power Limit. The following coefficients are defined: 0 = 1.0x, 1 = 0.1x, 2 = 0.01x, 3 = 0.001x. The default value prior to hardware and firmware initialization is b'00. Writes to this register also cause the port to send the Set_Slot_Power_Limit message. Refer to Section 6.9 of the PCI Express Base Specification for more information. |
| Slot power limit | 0–255 | In combination with the Slot power scale value, specifies the upper limit, in watts, on the power supplied by the slot. Refer to Section 7.8.9 of the PCI Express Base Specification for more information. |
| Slot number | 0–8191 | Specifies the slot number. |
3.6.6. Power Management
| Parameter | Value | Description |
|---|---|---|
| Endpoint L0s acceptable latency | Maximum of 64 ns; Maximum of 128 ns; Maximum of 256 ns; Maximum of 512 ns; Maximum of 1 µs; Maximum of 2 µs; Maximum of 4 µs; No limit | This design parameter specifies the maximum acceptable latency that the device can tolerate to exit the L0s state for any links between the device and the root complex. It sets the read-only value of the Endpoint L0s Acceptable Latency field of the Device Capabilities register (0x084). This Endpoint does not support the L0s or L1 states. However, in a switched system there may be links connected to switches that have L0s and L1 enabled. This parameter is set to allow system configuration software to read the acceptable latencies for all devices in the system and the exit latencies for each link to determine which links can enable Active State Power Management (ASPM). This setting is disabled for Root Ports. The default value of this parameter is 64 ns. This is a safe setting for most designs. |
| Endpoint L1 acceptable latency | Maximum of 1 µs; Maximum of 2 µs; Maximum of 4 µs; Maximum of 8 µs; Maximum of 16 µs; Maximum of 32 µs; Maximum of 64 µs; No limit | This value indicates the acceptable latency that an Endpoint can withstand in the transition from the L1 to L0 state. It is an indirect measure of the Endpoint's internal buffering. It sets the read-only value of the Endpoint L1 Acceptable Latency field of the Device Capabilities register. This Endpoint does not support the L0s or L1 states. However, a switched system may include links connected to switches that have L0s and L1 enabled. This parameter is set to allow system configuration software to read the acceptable latencies for all devices in the system and the exit latencies for each link to determine which links can enable Active State Power Management (ASPM). This setting is disabled for Root Ports. The default value of this parameter is 1 µs. This is a safe setting for most designs. |
These IP cores also do not support the in-band beacon or sideband WAKE# signal, which are mechanisms to signal a wake-up event to the upstream device.
3.7. Vendor Specific Extended Capability (VSEC)
| Parameter | Value | Description |
|---|---|---|
| Vendor Specific Extended Capability (VSEC) ID | 0x00001172 | Sets the read-only value of the 16-bit User ID register from the Vendor Specific Extended Capability. |
| Vendor Specific Extended Capability (VSEC) Revision | 0x00000000 | Sets the read-only value of the 4-bit VSEC Revision register from the Vendor Specific Extended Capability. |
| User Device or Board Type ID register from the Vendor Specific Extended Capability | 0x00000000 | Sets the read-only value of the 16-bit Device or Board Type ID register from the Vendor Specific Extended Capability. |
3.8. Configuration, Debug, and Extension Options
| Parameter | Value | Description |
|---|---|---|
| Enable configuration via Protocol (CvP) | On/Off | When On, the Quartus® Prime software places the Endpoint in the location required for configuration via protocol (CvP). For more information about CvP, click the Configuration via Protocol (CvP) link below. CvP is supported for Intel® Cyclone® 10 GX devices from the Intel® Quartus® Prime release 17.1.1 onwards. |
| Enable dynamic reconfiguration of PCIe read-only registers | On/Off | When On, you can use the Hard IP reconfiguration bus to dynamically reconfigure Hard IP read-only registers. For more information, refer to Hard IP Reconfiguration Interface. |
| Enable transceiver dynamic reconfiguration | On/Off | When On, creates an Avalon-MM slave interface that software can drive to update transceiver registers. |
| Enable Altera Debug Master Endpoint (ADME) | On/Off | When On, an embedded Altera Debug Master Endpoint connects internally to the Avalon-MM slave interface for dynamic reconfiguration. The ADME can access the reconfiguration space of the transceiver. It uses JTAG via the System Console to run tests and debug functions. |
| Enable Intel® Arria® 10 FPGA Development Kit connection | On/Off | When On, adds a control and status conduit interface to the top-level variant so that it can be connected to a PCIe Development Kit component. |
3.9. PHY Characteristics
| Parameter | Value | Description |
|---|---|---|
| Gen2 TX de-emphasis | 3.5dB; 6dB | Specifies the transmit de-emphasis for Gen2. |
| Requested equalization far-end TX preset | Preset0–Preset9 | Specifies the requested TX preset for the Phase 2 and 3 far-end transmitter. The default value Preset8 provides the best signal quality for most designs. |
| Enable soft DFE controller IP | On/Off | When On, the PCIe Hard IP core includes a decision feedback equalization (DFE) soft controller in the FPGA fabric to improve the bit error rate (BER) margin. The default for this option is Off because the DFE controller is typically not required. However, short reflective links may benefit from this soft DFE controller IP. This parameter is available only for Gen3 mode. It is not supported when CvP or autonomous modes are enabled. |
| Enable RX-polarity inversion in soft logic | On/Off | This parameter mitigates the following RX-polarity inversion problem: when the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP core receives TS2 training sequences during the Polling.Config state and you have not enabled this parameter, automatic lane polarity inversion is not guaranteed. The link may train to a smaller than expected link width or may not train successfully. This problem can affect configurations with any PCIe* speed and width. When you enable this parameter, polarity inversion is available for all configurations except Gen1 x1. This fix does not support CvP or autonomous mode. |
3.10. Example Designs
| Parameter | Value | Description |
|---|---|---|
| Available Example Designs | PIO | When you select the PIO option, the generated design includes a target application that handles only downstream transactions. The PIO design example is the only option for the Avalon®-ST interface. |
| Simulation | On/Off | When On, the generated output includes a simulation model. |
| Synthesis | On/Off | When On, the generated output includes a synthesis model. |
| Generated HDL format | Verilog/VHDL | Verilog HDL and VHDL are supported. |
| Select Board | Intel® Arria® 10 FPGA GX Development Kit; Intel® Arria® 10 FPGA GX Development Kit ES2; None | Specifies the Intel® Arria® 10 development kit. Select None to download to a custom board. Note: Currently, you cannot target an Intel® Cyclone® 10 GX Development Kit when generating an example design. |
4. Physical Layout
4.1. Hard IP Block Placement In Intel Cyclone 10 GX Devices
Refer to the Intel® Cyclone® 10 GX Device Transceiver Layout in the Intel® Cyclone® 10 GX Transceiver PHY User Guide for comprehensive figures for Intel® Cyclone® 10 GX devices.
4.2. Hard IP Block Placement In Intel Arria 10 Devices
Refer to the Intel® Arria® 10 Transceiver Layout in the Intel® Arria® 10 Transceiver PHY User Guide for comprehensive figures for Intel® Arria® 10 GT, GX, and SX devices.
4.3. Channel and Pin Placement for the Gen1, Gen2, and Gen3 Data Rates
In these figures, channels that are not used for the PCI Express protocol are available for other protocols. Unused channels are shown in gray.
For the possible values of <txvr_block_N> and <txvr_block_N+1>, refer to the figures that show the physical location of the Hard IP PCIe blocks in the different types of Intel® Arria® 10 or Intel® Cyclone® 10 GX devices, at the start of this chapter. For each hard IP block, the transceiver block that is adjacent and extends below the hard IP block, is <txvr_block_N>, and the transceiver block that is directly above is <txvr_block_N+1> . For example, in an Intel® Arria® 10 device with 96 transceiver channels and four PCIe hard IP blocks, if your design uses the hard IP block that supports CvP, <txvr_block_N> is GXB1C and <txvr_block_N+1> is GXB1D.
4.4. Channel Placement and fPLL and ATX PLL Usage for the Gen3 Data Rate
Gen3 variants must initially train at the Gen1 data rate. Consequently, Gen3 variants require an fPLL to generate the 2.5 and 5.0 Gbps clocks, and an ATX PLL to generate the 8.0 Gbps clock. In these figures, channels that are not used for the PCI Express protocol are available for other protocols. Unused channels are shown in gray.
4.5. PCI Express Gen3 Bank Usage Restrictions
Any transceiver channels that share a bank with active PCI Express interfaces that are Gen3 capable are subject to the following restriction. This applies to both Hard IP and Soft IP implementations:
- When VCCR_GXB and VCCT_GXB are set to 1.03 V or 1.12 V, the maximum data rate supported for the non-PCIe channels in those banks is 12.5 Gbps for chip-to-chip applications. These channels cannot be used to drive backplanes or for GT rates.
PCI Express interfaces that are only Gen1 or Gen2 capable are not affected.
Status
Affects all Intel® Arria® 10 ES and production devices. No fix is planned.
5. Interfaces and Signal Descriptions
5.1. Avalon‑ST RX Interface
The following table describes the signals that comprise the Avalon-ST RX Datapath. The RX data signal can be 64, 128, or 256 bits.
| Signal | Direction | Description |
|---|---|---|
| rx_st_data[<n>-1:0] | Output | Receive data bus. Refer to the figures following this table for the mapping of the Transaction Layer's TLP information to rx_st_data and for examples of the timing of this interface. Note that the position of the first payload dword depends on whether the TLP address is qword aligned. The mapping of message TLPs is the same as the mapping of TLPs with 4-dword headers. When using a 64-bit Avalon-ST bus, the width of rx_st_data is 64 bits; when using a 128-bit bus, 128 bits; when using a 256-bit bus, 256 bits. |
| rx_st_sop[1:0] | Output | Indicates the first cycle of a TLP when rx_st_valid is asserted. When you turn on Enable multiple packets per cycle for a 256-bit Avalon-ST bus, bit 0 indicates that a TLP begins in rx_st_data[127:0], and bit 1 indicates that a TLP begins in rx_st_data[255:128]. In single packet per cycle mode, this signal is a single bit that indicates that a TLP begins in this cycle. |
| rx_st_eop[1:0] | Output | Indicates the last cycle of a TLP when rx_st_valid is asserted. When you turn on Enable multiple packets per cycle for a 256-bit Avalon-ST bus, bit 0 indicates that a TLP ends in rx_st_data[127:0], and bit 1 indicates that a TLP ends in rx_st_data[255:128]. In single packet per cycle mode, this signal is a single bit that indicates that a TLP ends in this cycle. |
| rx_st_empty[1:0] | Output | Indicates the number of empty qwords in rx_st_data. Not used when rx_st_data is 64 bits. Valid only when rx_st_eop is asserted in 128-bit and 256-bit modes. For 128-bit data, only bit 0 applies; this bit indicates whether the upper qword contains data. For 256-bit data in single packet per cycle mode, both bits encode the number of empty upper qwords (0–3). |
| rx_st_ready | Input | Indicates that the Application Layer is ready to accept data. The Application Layer deasserts this signal to throttle the data stream. If rx_st_ready is asserted by the Application Layer on cycle <n>, then <n + readyLatency> is a ready cycle, during which the Transaction Layer may assert valid and transfer data. The RX interface supports a readyLatency of 3 cycles. (A sketch of application-side throttling logic follows this table.) |
| rx_st_valid | Output | Clocks rx_st_data into the Application Layer. Deasserts within 2 clocks of rx_st_ready deassertion and reasserts within 2 clocks of rx_st_ready assertion if more data is available to send. For 256-bit data, when you turn on Enable multiple packets per cycle, bit 0 applies to the entire bus rx_st_data[255:0]; bit 1 is not used. |
| rx_st_err[<n>-1:0] | Output | Indicates that there is an uncorrectable error correction coding (ECC) error in the internal RX buffer. Active when ECC is enabled. ECC is automatically enabled by the Quartus® Prime Assembler. ECC corrects single-bit errors and detects double-bit errors on a per-byte basis. When an uncorrectable ECC error is detected, rx_st_err is asserted for at least 1 cycle while rx_st_valid is asserted. For 256-bit data, when you turn on Enable multiple packets per cycle, bit 0 applies to the entire bus rx_st_data[255:0]; bit 1 is not used. Intel recommends resetting the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express when an uncorrectable double-bit ECC error is detected. |
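Because the RX interface has a readyLatency of 3 cycles, the core can legally present several more valid beats after the Application Layer deasserts rx_st_ready. A minimal sketch of application-side throttling logic follows; the FIFO status input and all module and signal names are hypothetical, not ports of the IP core.

```verilog
// Application-side RX throttling sketch (hypothetical names). Assumes a
// 256-bit datapath and the readyLatency of 3 cycles described above.
module rx_throttle_sketch (
  input  wire         clk,
  input  wire         rst_n,
  // Avalon-ST RX signals from the Hard IP
  input  wire [255:0] rx_st_data,
  input  wire         rx_st_valid,
  output wire         rx_st_ready,
  // Status from a hypothetical receive FIFO in the Application Layer
  input  wire         fifo_almost_full,
  // Captured beat handed to downstream logic
  output reg  [255:0] beat_data,
  output reg          beat_valid
);
  // Deassert ready early: after ready drops, the core may drive more
  // valid beats for up to the readyLatency window, so the almost-full
  // threshold of the FIFO must leave at least that much headroom.
  assign rx_st_ready = ~fifo_almost_full;

  // Accept data on every valid beat, even while ready is low, because
  // beats presented during the readyLatency window still carry data.
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      beat_valid <= 1'b0;
    end else begin
      beat_valid <= rx_st_valid;
      if (rx_st_valid)
        beat_data <= rx_st_data;
    end
  end
endmodule
```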
Note: Platform Designer issues the following warnings for these interfaces: "pcie_a10.pcie_a10_hip_0.tx.st: Interface must have an associated reset" and "pcie_a10.pcie_a10_hip_0.rx.st: Interface must have an associated reset". You can safely ignore these warnings because the IP core has a dedicated hard reset pin that is not part of the Avalon-ST TX or RX interface.
5.1.1. Avalon-ST RX Component Specific Signals
| Signal | Direction | Description |
|---|---|---|
| rx_st_mask | Input | The Application Layer asserts this signal to tell the Hard IP to stop sending non-posted requests. This signal can be asserted at any time. The total number of non-posted requests that can be transferred to the Application Layer after rx_st_mask is asserted is not more than 10. This signal stalls only non-posted TLPs; all others continue to be forwarded to the Application Layer. The stalled non-posted TLPs are held in the RX buffer until the mask signal is deasserted; they are not discarded. In Root Port mode, asserting the rx_st_mask signal stops all I/O, MemRd, and configuration accesses, because these are all non-posted transactions. |
| rx_st_bar[7:0] | Output | The decoded BAR bits for the TLP. Valid for MRd, MWr, IORd, and IOWr TLPs; ignored for completion and message TLPs. Valid during the cycle in which rx_st_sop is asserted. Refer to 64-Bit Avalon-ST rx_st_data<n> Cycle Definitions for 4-Dword Header TLPs with Non-Qword Addresses and 128-Bit Avalon-ST rx_st_data<n> Cycle Definition for 3-Dword Header TLPs with Qword Aligned Addresses for the timing of this signal for 64- and 128-bit data, respectively. For Endpoints, bits [5:0] correspond to BAR0 through BAR5 and bit [6] corresponds to the Expansion ROM; separate encodings apply to Root Ports. For multiple packets per cycle, this signal is undefined. If you turn on Enable multiple packets per cycle, do not use this signal to identify the address BAR hit. |
| rx_st_parity[<n>-1:0] | Output | The IP core generates byte parity when you turn on Enable byte parity ports on Avalon-ST interface on the System Settings tab of the parameter editor. Each bit represents odd parity of the associated byte of the rx_st_data bus. For example, bit [0] corresponds to rx_st_data[7:0] and bit [1] corresponds to rx_st_data[15:8]. (A parity-recomputation sketch follows this table.) |
| rxfc_cplbuf_ovf | Output | When asserted, indicates that the internal RX completion buffer has overflowed. |
For more information about the Avalon-ST protocol, refer to the Avalon Interface Specifications.
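As a cross-check of the parity definition above, the Application Layer can recompute the expected odd parity per byte and compare it against rx_st_parity. This is a minimal sketch assuming a 64-bit datapath; the module name and the parity_err flag are illustrative, not IP core ports.

```verilog
// Recompute the odd byte parity carried on rx_st_parity (64-bit datapath).
// Odd parity: each parity bit makes the total number of ones in
// {parity bit, data byte} odd, so the expected bit is ~(^byte).
module rx_parity_check_sketch (
  input  wire        clk,
  input  wire [63:0] rx_st_data,
  input  wire [7:0]  rx_st_parity,
  input  wire        rx_st_valid,
  output reg         parity_err
);
  reg [7:0] expected;
  integer i;

  // Expected odd parity for each of the 8 bytes of rx_st_data.
  always @* begin
    for (i = 0; i < 8; i = i + 1)
      expected[i] = ~(^rx_st_data[8*i +: 8]);
  end

  // Flag a mismatch only on valid beats.
  always @(posedge clk)
    parity_err <= rx_st_valid && (expected != rx_st_parity);
endmodule
```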
5.1.2. Data Alignment and Timing for the 64‑Bit Avalon-ST RX Interface
To facilitate the interface to 64-bit memories, the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express aligns data to the qword or 64 bits by default. Consequently, if the header presents an address that is not qword aligned, the Hard IP block shifts the data within the qword to achieve the correct alignment.
Qword alignment applies to all types of request TLPs with data, including the following TLPs:
- Memory writes
- Configuration writes
- I/O writes
The alignment of the request TLP depends on bit 2 of the request address. For completion TLPs with data, alignment depends on bit 2 of the lower address field. This bit is always 0 (aligned to qword boundary) for completion with data TLPs that are for configuration read or I/O read requests.
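In receive logic, this rule reduces to a simple lane selection: bit 2 of the request address (or bit 2 of the lower address field for completions) picks the dword lane that carries the first payload dword. The following fragment is a sketch with illustrative names; which beat of the packet carries Data0 still depends on the header length, as the figures below show.

```verilog
// Dword lane selection for the first payload dword on the 64-bit RX
// interface (illustrative names). addr_bit2 is bit 2 of the TLP address,
// decoded from the header by surrounding logic.
module qword_lane_select_sketch (
  input  wire        addr_bit2,
  input  wire [63:0] rx_st_data,
  output wire [31:0] data0
);
  // Non-qword-aligned address (addr[2] = 1): Data0 is in the upper dword.
  // Qword-aligned address (addr[2] = 0): Data0 is in the lower dword.
  assign data0 = addr_bit2 ? rx_st_data[63:32] : rx_st_data[31:0];
endmodule
```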
The following table shows the byte ordering for header and data packets; a short sketch after the table illustrates the two orderings.
| Packet | TLP |
|---|---|
| Header0 | pcie_hdr_byte0, pcie_hdr_byte1, pcie_hdr_byte2, pcie_hdr_byte3 |
| Header1 | pcie_hdr_byte4, pcie_hdr_byte5, pcie_hdr_byte6, pcie_hdr_byte7 |
| Header2 | pcie_hdr_byte8, pcie_hdr_byte9, pcie_hdr_byte10, pcie_hdr_byte11 |
| Header3 | pcie_hdr_byte12, pcie_hdr_byte13, pcie_hdr_byte14, pcie_hdr_byte15 |
| Data0 | pcie_data_byte3, pcie_data_byte2, pcie_data_byte1, pcie_data_byte0 |
| Data1 | pcie_data_byte7, pcie_data_byte6, pcie_data_byte5, pcie_data_byte4 |
| Data2 | pcie_data_byte11, pcie_data_byte10, pcie_data_byte9, pcie_data_byte8 |
| Data<n> | pcie_data_byte<4n+3>, pcie_data_byte<4n+2>, pcie_data_byte<4n+1>, pcie_data_byte<4n> |
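The two orderings above (header dwords with byte 0 in the most significant lane, data dwords with byte 0 in the least significant lane) can be captured in a pair of helper functions. This is an illustrative sketch; the function names are not part of the IP core.

```verilog
// Sketch of the header and data byte ordering from the table above.
module byte_order_sketch;
  // Header dword: pcie_hdr_byte0 occupies bits [31:24].
  function [31:0] hdr_dword;
    input [7:0] b0, b1, b2, b3;
    hdr_dword = {b0, b1, b2, b3};
  endfunction

  // Data dword: pcie_data_byte0 occupies bits [7:0].
  function [31:0] data_dword;
    input [7:0] b0, b1, b2, b3;
    data_dword = {b3, b2, b1, b0};
  endfunction

  initial begin
    $display("hdr  = %h", hdr_dword (8'h00, 8'h01, 8'h02, 8'h03)); // 00010203
    $display("data = %h", data_dword(8'h00, 8'h01, 8'h02, 8'h03)); // 03020100
  end
endmodule
```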
The following figure illustrates the mapping of Avalon‑ST RX packets to PCI Express TLPs for a three dword header with non-qword aligned addresses on a 64-bit bus. In this example, the byte address is unaligned and ends with 0x4, causing the first data dword to correspond to rx_st_data[63:32].
The following figure illustrates the mapping of Avalon-ST RX packets to PCI Express TLPs for a three dword header with qword aligned addresses. Note that the byte enables indicate the first byte of data is not valid and the last dword of data has a single valid byte.
The following figure illustrates back‑to‑back transmission on the 64‑bit Avalon‑ST RX interface with no idle cycles between the assertion of rx_st_eop and rx_st_sop.
5.1.3. Data Alignment and Timing for the 128‑Bit Avalon‑ST RX Interface
The following figure shows the mapping of 128-bit Avalon-ST RX packets to PCI Express TLPs for TLPs with a three dword header and qword aligned addresses. The assertion of rx_st_empty in a rx_st_eop cycle, indicates valid data on the lower 64 bits of rx_st _data.
The following figure shows the mapping of 128-bit Avalon-ST RX packets to PCI Express TLPs for TLPs with a 3 dword header and non-qword aligned addresses. In this case, bits[127:96] represent Data0 because address[2] in the TLP header is set. The assertion of rx_st_empty in a rx_st_eop cycle indicates valid data on the lower 64 bits of rx_st_data.
The following figure shows the mapping of 128-bit Avalon-ST RX packets to PCI Express TLPs for a four dword header with non-qword aligned addresses. In this example, rx_st_empty is low because the data is valid for all 128 bits in the rx_st_eop cycle.
The following figure shows the mapping of 128-bit Avalon-ST RX packets to PCI Express TLPs for a four dword header with qword aligned addresses. In this example, rx_st_empty is low because data is valid for all 128-bits in the rx_st_eop cycle.
The following figure illustrates the timing of the RX interface when the Application Layer backpressures the Hard IP by deasserting rx_st_ready. The rx_st_valid signal deasserts within three cycles after rx_st_ready is deasserted. In this example, rx_st_valid is deasserted in the next cycle. rx_st_data is held until the Application Layer is able to accept it.
The following figure illustrates back‑to‑back transmission on the 128‑bit Avalon‑ST RX interface with no idle cycles between the assertion of rx_st_eop and rx_st_sop.
The following figure illustrates a two‑cycle packet with valid data in the lower qword (rx_st_data[63:0]) and a one‑cycle packet where the rx_st_sop and rx_st_eop occur in the same cycle.
5.1.4. Data Alignment and Timing for 256‑Bit Avalon‑ST RX Interface
The following figure shows the location of headers and data for the 256‑bit Avalon‑ST packets. This layout of data applies to both the TX and RX buses.
The following figure illustrates two single-cycle 256‑bit packets. The first packet has two empty dwords; rx_st_data[191:0] is valid. The second packet has four empty dwords; rx_st_data[127:0] is valid.
5.1.5. Tradeoffs to Consider when Enabling Multiple Packets per Cycle
If you enable Multiple Packets Per Cycle under the System Settings heading, a TLP can start on a 128‑bit boundary. This mode supports multiple start of packet and end of packet signals in a single cycle when the Avalon‑ST interface is 256 bits wide. It reduces the wasted bandwidth for small packets.
A comparison of the largest and smallest packet sizes illustrates this point. Large packets using the full 256 bits achieve the following throughput:
256/256*8 = 8 GBytes/sec
The smallest PCIe packet, such as a 3-dword memory read, uses 96 bits of the 256-bit bus and achieves the following throughput:
96/256*8 = 3 GBytes/sec
If you enable Multiple Packets Per Cycle, when a TLP ends in the lower 128 bits of the Avalon‑ST bus, a new TLP can start in the upper 128 bits of the same cycle. Consequently, the bandwidth of small packets doubles:
96*2/256*8 = 6 GBytes/sec
This mode adds complexity to the Application Layer user decode logic. However, it could result in higher throughput.
The following figure illustrates this mode for a 256-bit Avalon‑ST RX interface. In this figure rx_st_eop[0] and rx_st_sop[1] are asserted in the same cycle.
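A minimal sketch of Application Layer decode logic for this mode follows. It assumes the 2-bit rx_st_sop and rx_st_eop encodings mirror those described for the TX interface, and the in_tlp tracking register is a hypothetical user signal.

// Track whether a TLP is still in flight across cycles on a 256-bit
// RX interface with Multiple Packets Per Cycle enabled. Bit 0 of
// sop/eop refers to the lower 128 bits, bit 1 to the upper 128 bits.
reg in_tlp;

always @(posedge pld_clk) begin
    if (reset_status)
        in_tlp <= 1'b0;
    else if (rx_st_valid) begin
        if (rx_st_sop[1])
            in_tlp <= ~rx_st_eop[1]; // packet starting in the upper half
        else if (rx_st_sop[0])
            in_tlp <= ~rx_st_eop[1] & ~rx_st_eop[0]; // starts in lower half
        else if (rx_st_eop[0] | rx_st_eop[1])
            in_tlp <= 1'b0;          // continuation packet ends this cycle
    end
end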
5.2. Avalon-ST TX Interface
The following table describes the signals that comprise the Avalon-ST TX Datapath. The TX data signal can be 64, 128, or 256 bits.
Signal |
Direction |
Description |
---|---|---|
tx_st_data[<n>-1:0] |
Input |
Data for transmission. Transmit data bus. Refer to the following sections on data alignment for the 64-, 128-, and 256-bit interfaces for the mapping of TLP packets to tx_st_data and examples of the timing of this interface. When using a 64-bit Avalon-ST bus, the width of tx_st_data is 64 bits. When using a 128-bit Avalon-ST bus, the width of tx_st_data is 128 bits. When using a 256‑bit Avalon‑ST bus, the width of tx_st_data is 256 bits. The Application Layer must provide a properly formatted TLP on the TX interface. The mapping of message TLPs is the same as the mapping of Transaction Layer TLPs with 4 dword headers. The number of data cycles must be correct for the length and address fields in the header. Issuing a packet with an incorrect number of data cycles results in the TX interface hanging and becoming unable to accept further requests. <n> = 64, 128, or 256. |
tx_st_sop[<n>-1:0] |
Input |
Indicates first cycle of a TLP when asserted together with tx_st_valid. <n> = 1 or 2. When using a 256-bit Avalon-ST bus with Multiple packets per cycle, bit 0 indicates that a TLP begins in tx_st_data[127:0], bit 1 indicates that a TLP begins in tx_st_data[255:128]. |
tx_st_eop[<n>-1:0] |
Input |
Indicates last cycle of a TLP when asserted together with tx_st_valid. <n> = 1 or 2. When using a 256-bit Avalon-ST bus with Multiple packets per cycle, bit 0 indicates that a TLP ends with tx_st_data[127:0], bit 1 indicates that a TLP ends with tx_st_data[255:128]. |
tx_st_ready |
Output |
Indicates that the Transaction Layer is ready to accept data for transmission. The core deasserts this signal to throttle the data stream. tx_st_ready may be asserted during reset. The Application Layer should wait at least 2 clock cycles after the reset is released before issuing packets on the Avalon‑ST TX interface. The reset_status signal can also be used to monitor when the IP core has come out of reset. If tx_st_ready is asserted by the Transaction Layer on cycle <n>, then <n + readyLatency> is a ready cycle, during which the Application Layer may assert valid and transfer data. When tx_st_ready, tx_st_valid and tx_st_data are registered (the typical case), Intel recommends a readyLatency of 2 cycles to facilitate timing closure; however, a readyLatency of 1 cycle is possible. If no other delays are added to the ready-valid latency, the resulting delay corresponds to a readyLatency of 2. |
tx_st_valid |
Input |
Clocks tx_st_data to the core when tx_st_ready is also asserted. Between tx_st_sop and tx_st_eop, tx_st_valid must not be deasserted in the middle of a TLP except in response to tx_st_ready deassertion. When tx_st_ready deasserts, this signal must deassert within 1 or 2 clock cycles. When tx_st_ready reasserts and tx_st_data is in mid-TLP, this signal must reassert within 2 cycles. The figure entitled 64-Bit Transaction Layer Backpressures the Application Layer illustrates the timing of this signal. For 256-bit data, when you turn on Enable multiple packets per cycle, bit 0 applies to the entire bus tx_st_data[255:0]. Bit 1 is not used. To facilitate timing closure, Intel recommends that you register both the tx_st_ready and tx_st_valid signals. If no other delays are added to the ready-valid latency, the resulting delay corresponds to a readyLatency of 2. |
tx_st_empty[1:0] |
Input |
Indicates the number of qwords that are empty during cycles that contain the end of a packet. When asserted, the empty qwords are in the high‑order bits. Valid only when tx_st_eop is asserted. Not used when tx_st_data is 64 bits. For 128‑bit data, only bit 0 applies and indicates whether the upper qword contains data. For 256‑bit data, both bits indicate the number of upper qwords that are empty, resulting in the following encodings for the 128- and 256-bit interfaces: 128-bit interface: tx_st_empty = 0, tx_st_data[127:0] contains valid data; tx_st_empty = 1, tx_st_data[63:0] contains valid data. 256-bit interface: tx_st_empty = 0, tx_st_data[255:0] contains valid data; tx_st_empty = 1, tx_st_data[191:0] contains valid data; tx_st_empty = 2, tx_st_data[127:0] contains valid data; tx_st_empty = 3, tx_st_data[63:0] contains valid data. For 256-bit data, when you turn on Enable multiple packets per cycle, the following correspondences apply:
When the TLP ends in the lower 128 bits, the following equations apply:
When the TLP ends in the upper 128 bits, the following equations apply:
|
tx_st_err |
Input |
Indicates an error on transmitted TLP. This signal is used to nullify a packet. It should only be applied to posted and completion TLPs with payload. To nullify a packet, assert this signal for 1 cycle after the SOP and before the EOP. When a packet is nullified, the following packet should not be transmitted until the next clock cycle. tx_st_err is not available for packets that are 1 or 2 cycles long. For 256-bit data, when you turn on Enable multiple packets per cycle, bit 0 applies to the entire bus tx_st_data[255:0]. Bit 1 is not used. Refer to the figure entitled 128-Bit Avalon-ST tx_st_data Cycle Definition for 3-Dword Header TLP with non-Qword Aligned Address for a timing diagram that illustrates the use of the error signal. Note that it must be asserted while the valid signal is asserted. |
tx_st_parity[<n>-1:0] |
Input |
Byte parity is generated when you turn on Enable byte parity ports on Avalon ST interface on the System Settings tab of the parameter editor. Each bit represents odd parity of the associated byte of the tx_st_data bus. For example, bit[0] corresponds to tx_st_data[7:0], bit[1] corresponds to tx_st_data[15:8], and so on. <n> = 8, 16, or 32. |
Component Specific Signals |
||
tx_cred_data_fc[11:0] |
Output |
Data credit limit for the credit type specified by tx_cred_fc_sel. Each credit is 16 bytes. There is a latency of two pld_clk clocks between a change on tx_cred_fc_sel and the corresponding data appearing on tx_cred_data_fc and tx_cred_hdr_fc. |
tx_cred_fc_hip_cons[5:0] |
Output |
Asserted for 1 cycle each time the Hard IP consumes a credit. These credits are from messages that the Hard IP for PCIe generates for the following reasons:
This signal is not asserted when an Application Layer credit is consumed. For optimum performance, the Application Layer can keep track of its own consumed credits. (The hard IP also tracks credits and deasserts tx_st_ready if it runs out of credits of any type.) To calculate the total credits consumed, the Application Layer can add its own credits consumed to those consumed by the Hard IP for PCIe. The credit signals are valid after the link is up. The 6 bits of this vector correspond to the following 6 credit types:
During a single cycle, the IP core can consume either a single header credit or both a header and a data credit. |
tx_cred_fc_infinite[5:0] |
Output |
When asserted, indicates that the corresponding credit type has infinite credits available, so the Application Layer does not need to calculate credit limits. The 6 bits of this vector correspond to the following 6 credit types:
|
tx_cred_fc_sel[1:0] |
Input |
Selects which credit type is displayed on the tx_cred_hdr_fc and tx_cred_data_fc outputs. There is a latency of two pld_clk clocks between a change on tx_cred_fc_sel and the corresponding data appearing on tx_cred_data_fc and tx_cred_hdr_fc. The following encodings are defined:
|
tx_cred_hdr_fc[7:0] |
Output |
Header credit limit for the credit type selected by tx_cred_fc_sel. Each credit is 20 bytes. There is a latency of two pld_clk clocks between a change on tx_cred_fc_sel and the corresponding data appearing on tx_cred_data_fc and tx_cred_hdr_fc. |
tx_cons_cred_sel | Input | When 1, the output tx_cred_data* and tx_cred_hdr* signals specify the value of the hard IP internal credits-consumed counter. When 0, the tx_cred_data* and tx_cred_hdr* signals specify the limit value. |
ko_cpl_spc_header[7:0] |
Output |
The Application Layer can use this signal to build circuitry to prevent RX buffer overflow for completion headers. Endpoints must advertise infinite space for completion headers; however, RX buffer space is finite. ko_cpl_spc_header is a static signal that indicates the total number of completion headers that can be stored in the RX buffer. |
ko_cpl_spc_data[11:0] |
Output |
The Application Layer can use this signal to build circuitry to prevent RX buffer overflow for completion data. Endpoints must advertise infinite space for completion data; however, RX buffer space is finite. ko_cpl_spc_data is a static signal that reflects the total number of 16 byte completion data units that can be stored in the completion RX buffer. |
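The tx_st_ready and tx_st_valid descriptions above imply a small amount of glue logic in the Application Layer. The following is a minimal sketch assuming a readyLatency of 2; app_data_available is a hypothetical Application Layer signal indicating that a TLP word is ready to send.

// Delay tx_st_ready by two cycles and gate tx_st_valid with the
// delayed copy so that valid responds to ready with readyLatency = 2.
reg tx_st_ready_r, tx_st_ready_r2;

always @(posedge pld_clk) begin
    tx_st_ready_r  <= tx_st_ready;
    tx_st_ready_r2 <= tx_st_ready_r;
end

// tx_st_data must be held stable whenever tx_st_valid is deasserted
// in the middle of a TLP.
assign tx_st_valid = app_data_available & tx_st_ready_r2;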
5.2.1. Avalon-ST Packets to PCI Express TLPs
The following figures illustrate the mappings between Avalon-ST packets and PCI Express TLPs. These mappings apply to all types of TLPs, including posted, non‑posted, and completion TLPs. Message TLPs use the mappings shown for four dword headers. TLP data is always address-aligned on the Avalon-ST interface whether or not the lower dwords of the header contain a valid address, as may be the case for some TLP types (for example, a message request with data payload).
For additional information about TLP packet headers, refer to Section 2.2.1 Common Packet Header Fields in the PCI Express Base Specification .
5.2.2. Data Alignment and Timing for the 64‑Bit Avalon-ST TX Interface
The following figure illustrates the mapping between Avalon-ST TX packets and PCI Express TLPs for three dword header TLPs with non-qword aligned addresses on a 64-bit bus.
This figure illustrates the storage of non‑qword aligned data. Non‑qword aligned addresses occur when address[2] is set. When address[2] is set, tx_st_data[63:32] contains Data0 and tx_st_data[31:0] contains Header2. In this figure, the headers are formed by the following bytes:
H0 = {pcie_hdr_byte0, pcie_hdr_byte1, pcie_hdr_byte2, pcie_hdr_byte3}
H1 = {pcie_hdr_byte4, pcie_hdr_byte5, pcie_hdr_byte6, pcie_hdr_byte7}
H2 = {pcie_hdr_byte8, pcie_hdr_byte9, pcie_hdr_byte10, pcie_hdr_byte11}
Data0 = {pcie_data_byte3, pcie_data_byte2, pcie_data_byte1, pcie_data_byte0}
Data1 = {pcie_data_byte7, pcie_data_byte6, pcie_data_byte5, pcie_data_byte4}
Data2 = {pcie_data_byte11, pcie_data_byte10, pcie_data_byte9, pcie_data_byte8}
The following figure illustrates the mapping between Avalon-ST TX packets and PCI Express TLPs for three dword header TLPs with qword aligned addresses on a 64-bit bus.
The following figure illustrates the mapping between Avalon-ST TX packets and PCI Express TLPs for a four dword header with qword aligned addresses on a 64-bit bus.
In this figure, the headers are formed by the following bytes.
H0 = {pcie_hdr_byte0, pcie_hdr_byte1, pcie_hdr_byte2, pcie_hdr_byte3}
H1 = {pcie_hdr_byte4, pcie_hdr_byte5, pcie_hdr_byte6, pcie_hdr_byte7}
H2 = {pcie_hdr_byte8, pcie_hdr_byte9, pcie_hdr_byte10, pcie_hdr_byte11}
H3 = {pcie_hdr_byte12, pcie_hdr_byte13, pcie_hdr_byte14, pcie_hdr_byte15} (4 dword header only)
Data0 = {pcie_data_byte3, pcie_data_byte2, pcie_data_byte1, pcie_data_byte0}
Data1 = {pcie_data_byte7, pcie_data_byte6, pcie_data_byte5, pcie_data_byte4}
The following figure illustrates the timing of the TX interface when the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express pauses transmission from the Application Layer by deasserting tx_st_ready. Because the readyLatency is two cycles, the Application Layer deasserts tx_st_valid after two cycles and holds tx_st_data until two cycles after tx_st_ready is reasserted.
5.2.3. Data Alignment and Timing for the 128‑Bit Avalon‑ST TX Interface
The following figure shows the mapping of 128-bit Avalon-ST TX packets to PCI Express TLPs for a three dword header with qword aligned addresses. Assertion of tx_st_empty in a tx_st_eop cycle indicates valid data in the lower 64 bits of tx_st_data.
The following figure shows the mapping of 128-bit Avalon-ST TX packets to PCI Express TLPs for a 3 dword header with non-qword aligned addresses. It also shows tx_st_err assertion.
The following figure shows the mapping of 128-bit Avalon-ST TX packets to PCI Express TLPs for a four dword header TLP with non-qword aligned addresses. In this example, tx_st_empty is low because the data ends in the upper 64 bits of tx_st_data.
The following figure illustrates back‑to‑back transmission of 128‑bit packets with no idle cycles between the assertion of tx_st_eop and tx_st_sop.
The following figure illustrates the timing of the TX interface when the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express pauses the Application Layer by deasserting tx_st_ready. Because the readyLatency is two cycles, the Application Layer deasserts tx_st_valid after two cycles and holds tx_st_data until two cycles after tx_st_ready is reasserted.
5.2.4. Data Alignment and Timing for the 256‑Bit Avalon‑ST TX Interface
Refer to Figure 8–16 on page 8–15 for the layout of headers and data for the 256‑bit Avalon‑ST packets with qword aligned and qword unaligned addresses.
5.2.4.1. Single Packet Per Cycle
In single packet per cycle mode, all received TLPs start at the lower 128-bit boundary on a 256-bit Avalon-ST interface. Turn on Enable Multiple Packets per Cycle on the System Settings tab of the parameter editor to enable multiple packets per cycle.
Single packet per cycle mode requires simpler Application Layer packet decode logic on the TX and RX paths because packets always start in the lower 128 bits of the Avalon-ST interface. Although this mode simplifies the Application Layer logic, not using the full width of the 256-bit Avalon-ST bus may slightly reduce the throughput of a design.
The following figure illustrates the layout of header and data for a three dword header on a 256‑bit bus with aligned and unaligned data.
The following figure illustrates the layout of header and data for a four dword header on a 256‑bit bus with aligned and unaligned data.
5.2.4.2. Multiple Packets per Cycle on the Avalon-ST TX 256-Bit Interface
If you enable Multiple Packets Per Cycle under the System Settings heading, a TLP can start on a 128‑bit boundary. This mode supports multiple start of packet and end of packet signals in a single cycle when the Avalon‑ST interface is 256 bits wide. The following figure illustrates this mode for a 256-bit Avalon‑ST TX interface. In this figure tx_st_eop[0] and tx_st_sop[1] are asserted in the same cycle. Using this mode increases the complexity of the Application Layer logic but results in higher throughput, depending on the TX traffic. Refer to Tradeoffs to Consider when Enabling Multiple Packets per Cycle for an example of the bandwidth when Multiple Packets Per Cycle is enabled and disabled.
5.2.5. Root Port Mode Configuration Requests
If your Application Layer implements ECRC forwarding, it should not apply ECRC forwarding to Configuration Type 0 packets that it issues on the Avalon-ST interface. There should be no ECRC appended to the TLP, and the TD bit in the TLP header should be set to 0. These packets are processed internally by the Hard IP block and are not transmitted on the PCI Express link.
To ensure proper operation when sending Configuration Type 0 transactions in Root Port mode, the application should wait for the Configuration Type 0 transaction to be transferred to the Hard IP for PCI Express Configuration Space before issuing another packet on the Avalon-ST TX port. You can do this by waiting for the core to respond with a completion on the Avalon-ST RX port before issuing the next Configuration Type 0 transaction.
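A minimal sketch of this throttling follows; the cfg0_tlp_sent and cpl_received_on_rx qualifiers are hypothetical decode signals in the Application Layer, not IP core ports.

// Allow only one outstanding Configuration Type 0 request in Root
// Port mode: wait for the completion on the Avalon-ST RX port before
// permitting the next request on the TX port.
localparam IDLE = 1'b0, WAIT_CPL = 1'b1;
reg state;

always @(posedge pld_clk) begin
    if (reset_status)
        state <= IDLE;
    else case (state)
        IDLE:     if (cfg0_tlp_sent)      state <= WAIT_CPL;
        WAIT_CPL: if (cpl_received_on_rx) state <= IDLE;
    endcase
end

assign cfg0_issue_allowed = (state == IDLE);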
5.3. Clock Signals
Signal |
Direction |
Description |
---|---|---|
refclk |
Input |
Reference clock for the IP core. It must have the frequency specified under the System Settings heading in the parameter editor. This is a dedicated free running input clock to the dedicated REFCLK pin. |
pld_clk |
Input |
Clocks the Application Layer. You can drive this clock with coreclkout_hip. If you drive pld_clk with another clock source, it must be equal to or faster than coreclkout_hip, but cannot be faster than 250 MHz. Choose a clock source with a 0 ppm accuracy if pld_clk is operating at the same frequency as coreclkout_hip. |
coreclkout_hip |
Output |
This is a fixed frequency clock used by the Data Link and Transaction Layers. |
5.4. Reset, Status, and Link Training Signals
Refer to Reset and Clocks for more information about the reset sequence and a block diagram of the reset logic.
Signal |
Direction |
Description |
---|---|---|
npor |
Input |
Active low reset signal. In the Intel hardware example designs, npor is the OR of pin_perst and local_rstn coming from the software Application Layer. If you do not drive a soft reset signal from the Application Layer, this signal must be derived from pin_perst. You cannot disable this signal. Resets the entire IP Core and transceiver. Asynchronous. This signal is edge, not level sensitive; consequently, you cannot use a low value on this signal to hold custom logic in reset. For more information about the reset controller, refer to Reset. |
clr_st | Output | This optional reset signal has the same effect as reset_status. You enable this signal by turning On the Enable Avalon-ST reset output port in the parameter editor. |
reset_status |
Output |
Active high reset status signal. When asserted, this signal indicates that the Hard IP clock is in reset. The reset_status signal is synchronous to the pld_clk clock and is deasserted only when the npor is deasserted and the Hard IP for PCI Express is not in reset (reset_status_hip = 0). You should use reset_status to drive the reset of your application. This reset is used for the Hard IP for PCI Express IP Core with the Avalon-ST interface. |
pin_perst |
Input |
Active low reset from the PCIe reset pin of the device. pin_perst resets the datapath and control registers. Configuration over PCI Express (CvP) requires this signal. For more information about CvP, refer to the Intel® Arria® 10 CvP Initialization and Partial Reconfiguration over PCI Express User Guide. Intel® Arria® 10 devices can have up to 4 instances of the Hard IP for PCI Express IP core. Each instance has its own pin_perst signal. Intel® Cyclone® 10 GX devices have a single instance of the Hard IP for PCI Express IP core. You must connect the pin_perst of each Hard IP instance to the corresponding nPERST pin of the device. These pins have the following locations:
For example, if you are using the Hard IP instance in the bottom left corner of the device, you must connect pin_perst to NPERSL0. For maximum use of the Intel® Arria® 10 or Intel® Cyclone® 10 GX device, Intel recommends that you use the bottom left Hard IP first. This is the only location that supports CvP over a PCIe link. If your design does not require CvP, you may select other Hard IP blocks. Refer to the Arria 10 GX, GT, and SX Device Family Pin Connection Guidelines or Intel® Cyclone® 10 GX Device Family Pin Connection Guidelines for more detailed information about these pins. |
The following figure illustrates the timing relationship between npor and the LTSSM L0 state.
Signal |
Direction |
Description |
---|---|---|
serdes_pll_locked |
Output |
When asserted, indicates that the PLL that generates the coreclkout_hip clock signal is locked. In pipe simulation mode this signal is always asserted. |
pld_core_ready |
Input |
When asserted, indicates that the Application Layer is ready for operation and is providing a stable clock to the pld_clk input. If the coreclkout_hip Hard IP output clock is sourcing the pld_clk Hard IP input, this input can be connected to the serdes_pll_locked output. |
pld_clk_inuse |
Output |
When asserted, indicates that the Hard IP Transaction Layer is using the pld_clk as its clock and is ready for operation with the Application Layer. For reliable operation, hold the Application Layer in reset until pld_clk_inuse is asserted. |
dlup |
Output |
When asserted, indicates that the Hard IP block is in the Data Link Control and Management State Machine (DLCMSM) DL_Up state. |
dlup_exit |
Output |
This signal is asserted low for one pld_clk cycle when the IP core exits the DLCMSM DL_Up state, indicating that the Data Link Layer has lost communication with the other end of the PCIe link and left the Up state. When this pulse is asserted, the Application Layer should generate an internal reset signal that is asserted for at least 32 cycles. |
ev128ns |
Output |
Asserted every 128 ns to create a time base aligned activity. |
ev1us |
Output |
Asserted every 1µs to create a time base aligned activity. |
hotrst_exit |
Output |
Hot reset exit. This signal is asserted for 1 clock cycle when the LTSSM exits the hot reset state. This signal should cause the Application Layer to be reset. This signal is active low. When this pulse is asserted, the Application Layer should generate an internal reset signal that is asserted for at least 32 cycles. |
l2_exit |
Output |
L2 exit. This signal is active low and otherwise remains high. It is asserted for one cycle (changing value from 1 to 0 and back to 1) after the LTSSM transitions from l2.idle to detect. When this pulse is asserted, the Application Layer should generate an internal reset signal that is asserted for at least 32 cycles. |
lane_act[3:0] |
Output |
Lane Active Mode: This signal indicates the number of lanes that are configured during link training. The following encodings are defined:
|
currentspeed[1:0] |
Output |
Indicates the current speed of the PCIe link. The following encodings are defined:
|
ltssmstate[4:0] |
Output |
LTSSM state: The LTSSM state machine encoding defines the following states:
|
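Several of the pulses in the table above (dlup_exit, hotrst_exit, and l2_exit) require the Application Layer to generate an internal reset that is asserted for at least 32 cycles. A minimal pulse-stretcher sketch follows; app_rst_n is a hypothetical Application Layer reset output.

// Stretch the single-cycle, active-low exit pulses into an
// Application Layer reset of at least 32 pld_clk cycles.
reg [5:0] rst_cnt;

wire rst_trigger = ~dlup_exit | ~hotrst_exit | ~l2_exit; // active-low pulses

always @(posedge pld_clk) begin
    if (rst_trigger)
        rst_cnt <= 6'd32;          // reload the window on any exit pulse
    else if (rst_cnt != 6'd0)
        rst_cnt <= rst_cnt - 6'd1; // count down the 32-cycle window
end

assign app_rst_n = (rst_cnt == 6'd0); // held low while the window runs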
5.5. ECRC Forwarding
On the Avalon-ST interface, the ECRC field follows the same alignment rules as payload data. For packets with payload, the ECRC is appended to the data as an extra dword of payload. For packets without payload, the ECRC field follows the address alignment as if it were a one dword payload. The position of the ECRC data depends on the address alignment. For packets with no payload data, the ECRC position corresponds to the position of Data0.
5.6. Error Signals
The following table describes the ECC error signals. These signals are all valid for one clock cycle. They are synchronous to coreclkout_hip.
ECC for the RX and retry buffers is implemented with MRAM. These error signals are flags: if a specific MRAM location has an error, the flag indicates the error for as long as that data is in the ECC decoder.
When a correctable ECC error occurs, the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express recovers without any loss of information. No Application Layer intervention is required. In the case of uncorrectable ECC error, Intel recommends that you reset the core.
Signal |
I/O |
Description |
---|---|---|
derr_cor_ext_rcv0 |
Output |
Indicates a corrected error in the RX buffer. This signal is for debug only. It is not valid until the RX buffer is filled with data. This is a pulse, not a level, signal. Internally, the pulse is generated with the 500 MHz clock. A pulse extender extends the signal so that the FPGA fabric running at 250 MHz can capture it. Because the error was corrected by the IP core, no Application Layer intervention is required. (1) |
derr_rpl |
Output |
Indicates an uncorrectable error in the retry buffer. This signal is for debug only. (1) |
derr_cor_ext_rpl0 |
Output |
Indicates a corrected ECC error in the retry buffer. This signal is for debug only. Because the error was corrected by the IP core, no Application Layer intervention is required. (1) |
Notes:
|
5.7. Interrupts for Endpoints
Refer to Interrupts for detailed information about all interrupt mechanisms.
Signal |
Direction |
Description |
---|---|---|
app_msi_req |
Input |
Application Layer MSI request. Assertion causes an MSI posted write TLP to be generated based on the MSI configuration register values and the app_msi_tc and app_msi_num input ports. |
app_msi_ack |
Output |
Application Layer MSI acknowledge. This signal acknowledges the Application Layer's request for an MSI interrupt. |
app_msi_tc[2:0] |
Input |
Application Layer MSI traffic class. This signal indicates the traffic class used to send the MSI (unlike INTX interrupts, any traffic class can be used to send MSIs). |
app_msi_num[4:0] |
Input |
Application Layer MSI number. This signal provides the low-order message data bits to be sent in the message data field of MSI messages requested by app_msi_req. Only bits that are enabled by the MSI Message Control register apply. |
app_int_sts |
Input |
Controls legacy interrupts. Assertion of app_int_sts causes an Assert_INTA message TLP to be generated and sent upstream. Deassertion of app_int_sts causes a Deassert_INTA message TLP to be generated and sent upstream. |
app_int_ack |
Output |
This signal is the acknowledge for app_int_sts. It is asserted for at least one cycle when either of the following events occurs:
|
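A minimal sketch of the MSI request handshake follows; msi_needed is a hypothetical pulse from user logic, and the sketch simply holds app_msi_req until the core acknowledges with app_msi_ack.

// Request an MSI and hold the request until acknowledged.
reg app_msi_req;

always @(posedge pld_clk) begin
    if (reset_status)
        app_msi_req <= 1'b0;
    else if (app_msi_ack)
        app_msi_req <= 1'b0; // drop the request once the core acknowledges
    else if (msi_needed)
        app_msi_req <= 1'b1; // triggers an MSI posted write TLP
end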
5.8. Interrupts for Root Ports
Signal |
Direction |
Description |
---|---|---|
int_status[3:0] |
Output |
These signals drive legacy interrupts to the Application Layer as follows:
|
serr_out |
Output |
System Error: This signal only applies to Root Port designs that report each system error detected, assuming the proper enabling bits are asserted in the Root Control and Device Control registers. If enabled, serr_out is asserted for a single clock cycle when a system error occurs. System errors are described in the PCI Express Base Specification 2.1 or 3.0 in the Root Control register. |
5.9. Completion Side Band Signals
The following table describes the signals that comprise the completion side band signals for the Avalon-ST interface. The Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express provides a completion error interface that the Application Layer can use to report errors, such as programming model errors. When the Application Layer detects an error, it can assert the appropriate cpl_err bit to indicate what kind of error to log. If separate requests result in two errors, both are logged. The Hard IP sets the appropriate status bits for the errors in the Configuration Space, and automatically sends error messages in accordance with the PCI Express Base Specification. Note that the Application Layer is responsible for sending the completion with the appropriate completion status value for non-posted requests. Refer to Error Handling for information on errors that are automatically detected and handled by the Hard IP.
For a description of the completion rules, the completion header format, and completion status field values, refer to Section 2.2.9 of the PCI Express Base Specification.
Signal |
Direction |
Description |
---|---|---|
cpl_err[6:0] |
Input |
Completion error. This signal reports completion errors to the Configuration Space. When an error occurs, the appropriate signal is asserted for one cycle.
|
cpl_pending |
Input |
Completion pending. The Application Layer must assert this signal when a master block is waiting for completion, for example, when a Non-Posted Request is pending. The state of this input is reflected by the Transactions Pending bit of the Device Status Register as defined in Section 7.8.5 of the PCI Express Base Specification. |
5.10. Parity Signals
Parity protection provides some data protection in systems that do not use ECRC checking. Parity is odd. This option is not available for the Avalon‑MM Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express.
On the RX datapath, parity is computed on the incoming TLP prior to the LCRC check in the Data Link Layer. Up to 32 parity bits are propagated to the Application Layer along with the RX Avalon-ST data. The RX datapath also propagates up to 32 parity bits to the Transaction Layer for Configuration TLPs. On the TX datapath, parity generated in the Application Layer is checked in the Transaction Layer and the Data Link Layer.
The following table lists the signals that indicate parity errors. When an error is detected, parity error signals are asserted for one cycle.
Signal Name |
Direction |
Description |
---|---|---|
tx_par_err[1:0] |
Output |
When asserted for a single cycle, indicates a parity error during TX TLP transmission. These errors are logged in the VSEC register. The following encodings are defined:
Note that not all simulation models assert the Transaction Layer error bit in conjunction with the Data Link Layer error bit. |
rx_par_err |
Output |
When asserted for a single cycle, indicates that a parity error was detected in a TLP at the input of the RX buffer. This error is logged as an uncorrectable internal error in the VSEC registers. For more information, refer to Uncorrectable Internal Error Status Register. If this error occurs, you must reset the Hard IP because parity errors can leave the Hard IP in an unknown state. |
cfg_par_err |
Output |
When asserted for a single cycle, indicates that a parity error was detected in a TLP that was routed to internal Configuration Space or to the Configuration Space Shadow Extension Bus. This error is logged as an uncorrectable internal error in the VSEC registers. For more information, refer to Uncorrectable Internal Error Status Register. If this error occurs, you must reset the core because parity errors can put the Hard IP in an unknown state. |
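Because parity is odd and computed per byte, TX parity generation reduces to an XNOR reduction across each byte of tx_st_data. The following is a minimal sketch; DWIDTH is a hypothetical parameter set to 64, 128, or 256 to match the configured interface.

// Generate odd byte parity for the TX datapath: each parity bit makes
// the total ones count of its byte plus the parity bit odd.
genvar i;
generate
    for (i = 0; i < DWIDTH/8; i = i + 1) begin : gen_tx_parity
        assign tx_st_parity[i] = ~^tx_st_data[8*i +: 8];
    end
endgenerate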
5.11. LMI Signals
The LMI interface is used to write error descriptor information into the TLP header log registers. LMI access to other registers is intended for debugging, not normal operation.
The LMI interface is synchronized to pld_clk and runs at frequencies up to 250 MHz. The LMI address is the same as the Configuration Space address. The LMI interface provides the same access to Configuration Space registers as Configuration TLP requests. Register bits have the same attributes (read only, read/write, and so on) for accesses from the LMI interface and from Configuration TLP requests. The 32-bit read and write data is driven LSB to MSB over 4 consecutive cycles.
When an LMI write has a timing conflict with a configuration TLP access, the configuration TLP access has higher priority. LMI writes are held and executed when configuration TLP accesses are no longer pending. An acknowledge signal is sent back to the Application Layer when execution is complete.
All LMI reads are also held and executed when no configuration TLP requests are pending. The LMI interface supports two operations: local read and local write. The timing for these operations complies with the Avalon-MM protocol described in the Avalon Interface Specifications. LMI reads can be issued at any time to obtain the contents of any Configuration Space register. LMI write operations are not recommended for use during normal operation. The Configuration Space registers are written by requests received from the PCI Express link and there may be unintended consequences of conflicting updates from the link and the LMI interface. LMI Write operations are provided for AER header logging, and debugging purposes only.
- In Root Port mode, do not access the Configuration Space using TLPs and the LMI bus simultaneously.
Signal |
Direction |
Description |
---|---|---|
lmi_dout[7:0] |
Output |
Data outputs. Data is driven from LSB, [7:0], to MSB, [31:24]. The LSB coincides with lmi_ack. |
lmi_rden |
Input |
Read enable input. |
lmi_wren |
Input |
Write enable input. |
lmi_ack |
Output |
Write execution done/read data valid. |
lmi_addr[11:0] |
Input |
Address inputs, [1:0] not used. |
lmi_din[7:0] |
Input |
Data inputs. Data is driven from LSB, [7:0], to MSB, [31:24]. The LSB coincides with lmi_wren. |
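Because the 32-bit LMI read data arrives as four bytes, LSB first, beginning on the lmi_ack cycle, the Application Layer must reassemble it. A minimal sketch follows; byte_cnt, collecting, and lmi_word are hypothetical user registers.

// Reassemble the 32-bit LMI read data from four consecutive bytes on
// lmi_dout[7:0]; the LSB coincides with lmi_ack per the table above.
reg [1:0]  byte_cnt;
reg        collecting;
reg [31:0] lmi_word;

always @(posedge pld_clk) begin
    if (lmi_ack) begin
        lmi_word[7:0] <= lmi_dout; // byte 0 on the lmi_ack cycle
        byte_cnt      <= 2'd1;
        collecting    <= 1'b1;
    end else if (collecting) begin
        lmi_word[8*byte_cnt +: 8] <= lmi_dout; // bytes 1 through 3
        byte_cnt <= byte_cnt + 2'd1;
        if (byte_cnt == 2'd3)
            collecting <= 1'b0; // word complete
    end
end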
5.12. Transaction Layer Configuration Space Signals
Signal |
Direction |
Description |
---|---|---|
tl_cfg_add[3:0] |
Output |
Address of the register that has been updated. This signal is an index indicating which Configuration Space register information is being driven onto tl_cfg_ctl. The indexing is defined in Multiplexed Configuration Register Information Available on tl_cfg_ctl. The index increments every 8 coreclkout_hip cycles. |
tl_cfg_ctl[31:0] |
Output |
The tl_cfg_ctl signal is multiplexed and contains the contents of the Configuration Space registers. The indexing is defined in Multiplexed Configuration Register Information Available on tl_cfg_ctl. |
tl_cfg_sts[52:0] |
Output |
Configuration status bits. This information updates every coreclkout_hip cycle. The following table provides detailed descriptions of the status bits. |
hpg_ctrler[4:0] |
Input |
The hpg_ctrler signals are only available in Root Port mode and when the Slot capability register is enabled. Refer to the Slot register and Slot capability register parameters in Table 6–9 on page 6–10. For Endpoint variations the hpg_ctrler input should be hardwired to 0s. The bits have the following meanings: |
tl_cfg_sts |
Configuration Space Register |
Description |
---|---|---|
[52:49] |
Device Status Register[3:0] |
Records the following errors:
|
[48] |
Slot Status Register[8] |
Data Link Layer state changed |
[47] |
Slot Status Register[4] |
Command completed. (The hot plug controller completed a command.) |
[46:31] |
Link Status Register[15:0] |
Records the following link status information:
|
[30] |
Link Status 2 Register[0] |
Current de-emphasis level. |
[29:25] |
Status Register[15:11] |
Records the following 5 primary command status errors:
|
[24] |
Status Register[8] |
Master data parity error |
[23:6] |
Root Status Register[17:0] |
Records the following PME status information:
|
[5:1] |
Secondary Status Register[15:11] |
Records the following 5 secondary command status errors:
|
[0] |
Secondary Status Register[8] |
Master Data Parity Error |
5.12.1. Configuration Space Register Access Timing
The tl_cfg_add and tl_cfg_ctl signals have multi-cycle paths. They update every eight coreclkout_hip cycles.
To ensure correct values are captured, your Application Layer RTL must include code to force sampling to the middle of this window. The following example RTL captures the correct values of the tl_cfg buses in the case of an eight-cycle window. A generated strobe signal, cfgctl_addr_strobe, captures the address and data values by sampling them in the middle of the window.
reg        tl_cfg_add_reg, tl_cfg_add_reg2;
reg        cfgctl_addr_change, cfgctl_addr_change2, cfgctl_addr_strobe;
reg [3:0]  captured_cfg_addr_reg;
reg [31:0] captured_cfg_data_reg;

// register LSB bit of tl_cfg_add
always @(posedge coreclkout_hip) begin
    tl_cfg_add_reg  <= tl_cfg_add[0];
    tl_cfg_add_reg2 <= tl_cfg_add_reg;
end

// detect the address change to generate a strobe to sample the input 32-bit data
always @(posedge coreclkout_hip) begin
    cfgctl_addr_change  <= tl_cfg_add_reg2 != tl_cfg_add_reg;
    cfgctl_addr_change2 <= cfgctl_addr_change;
    cfgctl_addr_strobe  <= cfgctl_addr_change2;
end

// capture cfg ctl addr/data bus with the strobe
always @(posedge coreclkout_hip)
    if (cfgctl_addr_strobe) begin
        captured_cfg_addr_reg[3:0]  <= tl_cfg_add[3:0];
        captured_cfg_data_reg[31:0] <= tl_cfg_ctl[31:0];
    end
5.12.2. Configuration Space Register Access
The tl_cfg_ctl signal is a multiplexed bus that contains the contents of Configuration Space registers as shown in the figure below. Information stored in the Configuration Space is accessed in round robin order where tl_cfg_add indicates which register is being accessed. The following table shows the layout of configuration information that is multiplexed on tl_cfg_ctl.
Register |
Width |
Direction |
Description |
---|---|---|---|
cfg_dev_ctrl |
16 |
Output |
cfg_devctrl[15:0] is Device Control for the PCI Express capability structure. |
cfg_dev_ctrl2 |
16 |
Output |
cfg_dev2ctrl[15:0] is Device Control 2 for the PCI Express capability structure. |
cfg_slot_ctrl |
16 |
Output |
cfg_slot_ctrl[15:0] is the Slot Control register of the PCI Express capability structure. This register is only available in Root Port mode. |
cfg_link_ctrl |
16 |
Output |
cfg_link_ctrl[15:0] is the primary Link Control of the PCI Express capability structure. For Gen2 or Gen3 operation, you must write a 1'b1 to the Retrain Link bit (Bit[5] of the cfg_link_ctrl) of the Root Port to initiate retraining to a higher data rate after the initial link training to Gen1 L0 state. Retraining directs the Link Training and Status State Machine (LTSSM) to the Recovery state. Retraining to a higher data rate is not automatic for the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express IP Core even if both devices on the link are capable of a higher data rate. |
cfg_link_ctrl2 |
16 |
Output |
cfg_link_ctrl2[31:16] is the secondary Link Control register of the PCI Express capability structure for Gen2 operation. When tl_cfg_addr = 4'b0010, tl_cfg_ctl returns the primary and secondary Link Control registers, {cfg_link_ctrl2[15:0], cfg_link_ctrl[15:0]}. The primary Link Status register contents are available on tl_cfg_sts[46:31]. For Gen1 variants, the link bandwidth notification bit is always set to 0. For Gen2 variants, this bit is set to 1. |
cfg_prm_cmd |
16 |
Output |
Base/Primary Command register for the PCI Configuration Space. |
cfg_root_ctrl |
8 |
Output |
Root control and status register of the PCI Express capability. This register is only available in Root Port mode. |
cfg_sec_ctrl |
16 |
Output |
Secondary bus Control and Status register of the PCI Express capability. This register is available only in Root Port mode. |
cfg_secbus |
8 |
Output |
Secondary bus number. This register is available only in Root Port mode. |
cfg_subbus |
8 |
Output |
Subordinate bus number. This register is available only in Root Port mode. |
cfg_msi_addr |
64 |
Output |
cfg_msi_add[63:32] is the message signaled interrupt (MSI) upper message address. cfg_msi_add[31:0] is the MSI message address. |
cfg_io_bas |
20 |
Output |
The upper 20 bits of the I/O base register of the Type1 Configuration Space. This register is only available in Root Port mode. |
cfg_io_lim |
20 |
Output |
The upper 20 bits of the I/O limit register of the Type1 Configuration Space. This register is only available in Root Port mode. |
cfg_np_bas |
12 |
Output |
The upper 12 bits of the memory base register of the Type1 Configuration Space. This register is only available in Root Port mode. |
cfg_np_lim |
12 |
Output |
The upper 12 bits of the memory limit register of the Type1 Configuration Space. This register is only available in Root Port mode. |
cfg_pr_bas |
44 |
Output |
The upper 44 bits of the prefetchable base registers of the Type1 Configuration Space. This register is only available in Root Port mode. |
cfg_pr_lim |
44 |
Output |
The upper 44 bits of the prefetchable limit registers of the Type1 Configuration Space. Available in Root Port mode. |
cfg_pmcsr |
32 |
Output |
cfg_pmcsr[31:16] is Power Management Control and cfg_pmcsr[15:0]is the Power Management Status register. |
cfg_msixcsr |
16 |
Output |
MSI-X message control. |
cfg_msicsr |
16 |
Output |
MSI message control. Refer to the following table for the fields of this register. |
cfg_tcvcmap |
24 |
Output |
Configuration traffic class (TC)/virtual channel (VC) mapping. The Application Layer uses this signal to generate a TLP mapped to the appropriate channel based on the traffic class of the packet.
|
cfg_msi_data |
16 |
Output |
cfg_msi_data[15:0] is message data for MSI. |
cfg_busdev |
13 |
Output |
Bus/Device Number captured by or programmed in the Hard IP. |
Bit(s) |
Field |
Description |
---|---|---|
[15:9] |
Reserved |
N/A |
[8] |
mask capability |
Per-vector masking capable. This bit is hardwired to 0 because the function does not support the optional MSI per-vector masking using the Mask_Bits and Pending_Bits registers defined in the PCI Local Bus Specification. Per-vector masking can be implemented using Application Layer registers. |
[7] |
64-bit address capability |
64-bit address capable.
|
[6:4] |
multiple message enable |
This field indicates permitted values for MSI signals. For example, if “100” is written to this field, 16 MSI signals are allocated.
|
[3:1] |
multiple message capable |
This field is read by system software to determine the number of requested MSI messages.
|
[0] |
MSI Enable |
If set to 0, this component is not permitted to use MSI. |
5.13. Hard IP Reconfiguration Interface
The Hard IP reconfiguration interface is an Avalon-MM slave interface with a 10‑bit address and 16‑bit data bus. You can use this bus to dynamically modify the value of configuration registers that are read-only at run time. To ensure proper system operation, reset or repeat device enumeration of the PCI Express link after changing the value of read‑only configuration registers of the Hard IP.
Signal |
Direction |
Description |
---|---|---|
hip_reconfig_clk |
Input |
Reconfiguration clock. The frequency range for this clock is 100–125 MHz. |
hip_reconfig_rst_n |
Input |
Active-low Avalon-MM reset. Resets all of the dynamic reconfiguration registers to their default values as described in Hard IP Reconfiguration Registers. |
hip_reconfig_address[9:0] |
Input |
The 10‑bit reconfiguration address. |
hip_reconfig_read |
Input |
Read signal. This interface is not pipelined. You must wait for the return of the hip_reconfig_readdata[15:0] from the current read before starting another read operation. |
hip_reconfig_readdata[15:0] |
Output |
16‑bit read data. hip_reconfig_readdata[15:0] is valid on the third cycle after the assertion of hip_reconfig_read. |
hip_reconfig_write |
Input |
Write signal. |
hip_reconfig_writedata[15:0] |
Input |
16‑bit write data. |
hip_reconfig_byte_en[1:0] |
Input |
Byte enables, currently unused. |
ser_shift_load |
Input |
You must toggle this signal once after changing to user mode before the first access to read‑only registers. This signal should remain asserted for a minimum of 324 ns after switching to user mode. |
interface_sel |
Input |
A selector which must be asserted when performing dynamic reconfiguration. Drive this signal low 4 clock cycles after the release of ser_shift_load. |
For a detailed description of the Avalon-MM protocol, refer to the Avalon Memory Mapped Interfaces chapter in the Avalon Interface Specifications.
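A minimal sketch of a single read on this interface follows; start_rd is a hypothetical request pulse, and the three-cycle read latency matches the hip_reconfig_readdata description above.

// Issue one non-pipelined read on the Hard IP reconfiguration
// interface and capture hip_reconfig_readdata after the 3-cycle
// read latency.
reg [2:0]  rd_wait;
reg        hip_reconfig_read;
reg [15:0] rd_data;

always @(posedge hip_reconfig_clk or negedge hip_reconfig_rst_n) begin
    if (!hip_reconfig_rst_n) begin
        hip_reconfig_read <= 1'b0;
        rd_wait           <= 3'd0;
    end else if (start_rd && rd_wait == 3'd0) begin
        hip_reconfig_read <= 1'b1; // launch the read
        rd_wait           <= 3'd4; // cover the 3-cycle read latency
    end else begin
        hip_reconfig_read <= 1'b0;
        if (rd_wait != 3'd0) begin
            rd_wait <= rd_wait - 3'd1;
            if (rd_wait == 3'd1)
                rd_data <= hip_reconfig_readdata; // valid on the third cycle
        end
    end
end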
5.14. Power Management Signals
Signal |
Direction |
Description |
---|---|---|
pme_to_cr |
Input |
Power management turn off control register. Root Port—When this signal is asserted, the Root Port sends the PME_turn_off message. Endpoint—This signal is asserted to acknowledge the PME_turn_off message by sending pme_to_ack to the Root Port. |
pme_to_sr |
Output |
Power management turn off status register. Root Port—This signal is asserted for 1 clock cycle when the Root Port receives the pme_turn_off acknowledge message. Endpoint—This signal is asserted for 1 cycle when the Endpoint receives the PME_turn_off message from the Root Port. |
pm_event |
Input |
Power Management Event. This signal is only available for Endpoints. The Endpoint initiates a power_management_event message (PM_PME) that is sent to the Root Port. If the Hard IP is in a low power state, the link exits from the low-power state to send the message. This signal is positive edge-sensitive. |
pm_data[9:0] |
Input |
Power Management Data. This bus indicates power consumption of the component. This bus can only be implemented if all three bits of AUX_power (part of the Power Management Capabilities structure) are set to 0. This bus includes the following bits:
For example, the two registers might have the following values:
To find the maximum power consumed by this component, multiply the data value by the data Scale (114 × .01 = 1.14). 1.14 watts is the maximum power allocated to this component in the power state selected by the data_select field. |
pm_auxpwr |
Input |
Power Management Auxiliary Power: This signal can be tied to 0 because the L2 power state is not supported. |
Bits |
Field |
Description |
---|---|---|
[31:24] |
Data register |
This field indicates in which power states a function can assert the PME# message. |
[23:16] |
reserved |
— |
[15] |
PME_status |
When set to 1, indicates that the function would normally assert the PME# message independently of the state of the PME_en bit. |
[14:13] |
data_scale |
This field indicates the scaling factor when interpreting the value retrieved from the data register. This field is read-only. |
[12:9] |
data_select |
This field indicates which data should be reported through the data register and the data_scale field. |
[8] |
PME_EN |
1: indicates that the function can assert PME#
0: indicates that the function cannot assert PME# |
[7:2] |
reserved |
— |
[1:0] |
PM_state |
Specifies the power management state of the operating condition being described. The following encodings are defined:
A device returns 2b’11 in this field and Aux or PME Aux in the type register to specify the D3-Cold PM state. An encoding of 2b’11 along with any other type register value specifies the D3-Hot state. |
5.15. Physical Layer Interface Signals
Intel provides an integrated solution with the Transaction, Data Link and Physical Layers. The IP Parameter Editor generates a SERDES variation file, <variation>_serdes.v or .vhd, in addition to the Hard IP variation file, <variation>.v or .vhd. The SERDES entity is included in the library files for PCI Express.
5.15.1. Serial Data Signals
The Intel® Cyclone® 10 GX PCIe IP Core supports 1, 2, or 4 lanes. Each lane includes a TX and RX differential pair. Data is striped across all available lanes.
The Intel® Arria® 10 PCIe IP Core supports 1, 2, 4 or 8 lanes. Each lane includes a TX and RX differential pair. Data is striped across all available lanes.
Signal |
Direction |
Description |
---|---|---|
tx_out[<n>-1:0] |
Output |
Transmit output. These signals are the serial outputs of lanes <n>-1–0. |
rx_in[<n>-1:0] |
Input |
Receive input. These signals are the serial inputs of lanes <n>-1–0. |
Refer to Pin-out Files for Intel Devices for pin-out tables for all Intel devices in .pdf, .txt, and .xls formats.
Transceiver channels are arranged in groups of six. For GX devices, the lowest six channels on the left side of the device are labeled GXB_L0, the next group is GXB_L1, and so on. Channels on the right side of the device are labeled GXB_R0, GXB_R1, and so on. Be sure to connect the Hard IP for PCI Express on the left side of the device to appropriate channels on the left side of the device, as specified in the Pin-out Files for Intel Devices.
5.15.2. PIPE Interface Signals
These PIPE signals are available for Gen1, Gen2, and Gen3 variants so that you can simulate using either the serial or the PIPE interface. Simulation is much faster using the PIPE interface because the PIPE simulation bypasses the SERDES model. By default, the PIPE interface data width is 8 bits for Gen1 and Gen2 and 32 bits for Gen3. You can use the PIPE interface for simulation even though your actual design includes a serial interface to the internal transceivers. However, it is not possible to use the Hard IP PIPE interface in hardware, including probing these signals using Signal Tap.
Intel® Cyclone® 10 GX devices do not support the Gen3 data rate.
In the following table, signals that include lane number 0 also exist for the remaining lanes of multi-lane variants. For Gen1 and Gen2 operation, the Gen3-only outputs can be left floating.
Signal |
Direction |
Description |
---|---|---|
txdata0[31:0] |
Output |
Transmit data <n>. This bus transmits data on lane <n>. |
txdatak0[3:0] |
Output |
Transmit data control <n>. This signal serves as the control bit for txdata <n>. Bit 0 corresponds to the lowest-order byte of txdata, and so on. A value of 0 indicates a data byte. A value of 1 indicates a control byte. For Gen1 and Gen2 only. |
txblkst0 |
Output |
For Gen3 operation, indicates the start of a block in the transmit direction. |
txcompl0 |
Output |
Transmit compliance <n>. This signal forces the running disparity to negative in Compliance Mode (negative COM character). |
txdataskip0 |
Output |
For Gen3 operation. Allows the MAC to instruct the TX interface to ignore the TX data interface for one clock cycle. The following encodings are defined:
|
txdeemph0 |
Output |
Transmit de-emphasis selection. The Intel® Arria® 10 Hard IP for PCI Express sets the value for this signal based on the indication received from the other end of the link during the Training Sequences (TS). You do not need to change this value. |
txdetectrx0 |
Output |
Transmit detect receive <n>. This signal tells the PHY layer to start a receive detection operation or to begin loopback. |
txelecidle0 |
Output |
Transmit electrical idle <n>. This signal forces the TX output to electrical idle. |
txswing |
Output |
When asserted, indicates full swing for the transmitter voltage. When deasserted indicates half swing. |
txmargin[2:0] |
Output |
Transmit VOD margin selection. The value for this signal is based on the value from the Link Control 2 Register. Available for simulation only. |
txsynchd0[1:0] |
Output |
For Gen3 operation, specifies the transmit block type. The following encodings are defined:
|
rxdata0[31:0] |
Input |
Receive data <n>. This bus receives data on lane <n>. |
rxdatak0[3:0] |
Input |
Receive data control <n>. This signal serves as the control bit for rxdata <n>. Bit 0 corresponds to the lowest-order byte of rxdata, and so on. A value of 0 indicates a data byte. A value of 1 indicates a control byte. For Gen1 and Gen2 only. |
rxblkst0 |
Input |
For Gen3 operation, indicates the start of a block in the receive direction. |
rxdataskip0 |
Input |
For Gen3 operation. Allows the PCS to instruct the RX interface to ignore the RX data interface for one clock cycle. The following encodings are defined:
|
rxelecidle0 |
Input |
Receive electrical idle <n>. When asserted, indicates detection of an electrical idle. |
rxpolarity0 |
Output |
Receive polarity <n>. This signal instructs the PHY layer to invert the polarity of the 8B/10B receiver decoding block. |
rxstatus0[2:0] |
Input |
Receive status <n>. This signal encodes receive status, including error codes for the receive data stream and receiver detection. |
rxsynchd0[1:0] |
Input |
For Gen3 operation, specifies the receive block type. The following encodings are defined:
|
rxvalid0 |
Input |
Receive valid <n>. This signal indicates symbol lock and valid data on rxdata <n> and rxdatak <n>. |
phystatus0 |
Input |
PHY status <n>. This signal communicates completion of several PHY requests. |
powerdown0[1:0] |
Output |
Power down <n>. This signal requests the PHY to change its power state to the specified state (P0, P0s, P1, or P2). |
currentcoeff0[17:0] |
Output |
For Gen3, specifies the coefficients to be used by the transmitter. The 18 bits specify the following coefficients:
|
currentrxpreset0[2:0] |
Output |
For Gen3 designs, specifies the current preset. |
simu_mode_pipe |
Input |
When set to 1, the PIPE interface is in simulation mode. |
sim_pipe_rate[1:0] |
Output |
The 2‑bit encodings have the following meanings:
|
rate[1:0] |
Output |
The 2‑bit encodings have the following meanings:
|
sim_pipe_pclk_in |
Input |
This clock is used for PIPE mode simulation only and is derived from refclk. It is the PIPE interface clock. |
sim_pipe_ltssmstate0[4:0] |
Input and Output |
LTSSM state: The LTSSM state machine encoding defines the following states:
|
rxfreqlocked0 |
Input |
When asserted, indicates that the pclk_in used for PIPE simulation is valid. |
eidleinfersel0[2:0] |
Output |
Electrical idle entry inference mechanism selection. The following encodings are defined:
|
5.15.3. Test Signals
Signal |
Direction |
Description |
---|---|---|
test_in[31:0] |
Input |
The bits of the test_in bus have the following definitions. Set this bus to 0x00000188.
|
testin_zero |
Output |
When asserted, indicates accelerated initialization for simulation is active. |
lane_act[3:0] |
Output |
Lane Active Mode: This signal indicates the number of lanes that are configured during link training. The following encodings are defined:
|
5.15.4. Intel Arria 10 Development Kit Conduit Interface
Signal Name | Direction | Description |
---|---|---|
devkit_status[255:0] | Output | The devkit_status[255:0] bus comprises the following status signals:
|
devkit_ctrl[255:0] | Input | The devkit_ctrl[255:0] bus comprises the following control signals. You can optionally connect these pins to an on-board switch for PCI-SIG compliance testing, such as bypass compliance testing.
|
6. Registers
6.1. Correspondence between Configuration Space Registers and the PCIe Specification
Byte Address |
Hard IP Configuration Space Register |
Corresponding Section in PCIe Specification |
---|---|---|
0x000:0x03C |
PCI Header Type 0 Configuration Registers |
Type 0 Configuration Space Header |
0x000:0x03C |
PCI Header Type 1 Configuration Registers |
Type 1 Configuration Space Header |
0x040:0x04C |
Reserved |
N/A |
0x050:0x05C |
MSI Capability Structure |
MSI Capability Structure |
0x068:0x070 |
MSI-X Capability Structure |
MSI-X Capability Structure |
0x070:0x074 |
Reserved |
N/A |
0x078:0x07C |
Power Management Capability Structure |
PCI Power Management Capability Structure |
0x080:0x0BC |
PCI Express Capability Structure |
PCI Express Capability Structure |
0x0C0:0x0FC |
Reserved |
N/A |
0x100:0x16C |
Virtual Channel Capability Structure |
Virtual Channel Capability |
0x170:0x17C |
Reserved |
N/A |
0x180:0x1FC |
Virtual channel arbitration table |
VC Arbitration Table |
0x200:0x23C |
Port VC0 arbitration table |
Port Arbitration Table |
0x240:0x27C |
Port VC1 arbitration table |
Port Arbitration Table |
0x280:0x2BC |
Port VC2 arbitration table |
Port Arbitration Table |
0x2C0:0x2FC |
Port VC3 arbitration table |
Port Arbitration Table |
0x300:0x33C |
Port VC4 arbitration table |
Port Arbitration Table |
0x340:0x37C |
Port VC5 arbitration table |
Port Arbitration Table |
0x380:0x3BC |
Port VC6 arbitration table |
Port Arbitration Table |
0x3C0:0x3FC |
Port VC7 arbitration table |
Port Arbitration Table |
0x400:0x7FC |
Reserved |
N/A |
0x800:0x834 |
Advanced Error Reporting AER (optional) |
Advanced Error Reporting Capability |
0x838:0xFFF |
Reserved |
N/A |
Overview of Configuration Space Register Fields | ||
0x000 | Device ID, Vendor ID | Type 0 Configuration Space Header; Type 1 Configuration Space Header |
0x004 | Status, Command | Type 0 Configuration Space Header; Type 1 Configuration Space Header |
0x008 | Class Code, Revision ID | Type 0 Configuration Space Header; Type 1 Configuration Space Header |
0x00C | BIST, Header Type, Primary Latency Timer, Cache Line Size | Type 0 Configuration Space Header; Type 1 Configuration Space Header |
0x010 | Base Address 0 | Base Address Registers |
0x014 | Base Address 1 | Base Address Registers |
0x018 | Base Address 2; Secondary Latency Timer, Subordinate Bus Number, Secondary Bus Number, Primary Bus Number | Base Address Registers; Type 1 Configuration Space Header |
0x01C | Base Address 3; Secondary Status, I/O Limit, I/O Base | Base Address Registers; Secondary Status Register, Type 1 Configuration Space Header |
0x020 | Base Address 4; Memory Limit, Memory Base | Base Address Registers; Type 1 Configuration Space Header |
0x024 | Base Address 5; Prefetchable Memory Limit, Prefetchable Memory Base | Base Address Registers; Prefetchable Memory Limit, Prefetchable Memory Base |
0x028 | Reserved; Prefetchable Base Upper 32 Bits | N/A; Type 1 Configuration Space Header |
0x02C | Subsystem ID, Subsystem Vendor ID; Prefetchable Limit Upper 32 Bits | Type 0 Configuration Space Header; Type 1 Configuration Space Header |
0x030 | Expansion ROM Base Address; I/O Limit Upper 16 Bits, I/O Base Upper 16 Bits | Type 0 Configuration Space Header; Type 1 Configuration Space Header |
0x034 | Reserved, Capabilities PTR | Type 0 Configuration Space Header; Type 1 Configuration Space Header |
0x038 | Reserved; Expansion ROM Base Address | N/A; Type 1 Configuration Space Header |
0x03C | Interrupt Pin, Interrupt Line; Bridge Control, Interrupt Pin, Interrupt Line | Type 0 Configuration Space Header; Type 1 Configuration Space Header |
0x050 | MSI Message Control, Next Cap Ptr, Capability ID | MSI and MSI-X Capability Structures |
0x054 | Message Address | MSI and MSI-X Capability Structures |
0x058 | Message Upper Address | MSI and MSI-X Capability Structures |
0x05C | Reserved, Message Data | MSI and MSI-X Capability Structures |
0x068 | MSI-X Message Control, Next Cap Ptr, Capability ID | MSI and MSI-X Capability Structures |
0x06C | MSI-X Table Offset, BIR | MSI and MSI-X Capability Structures |
0x070 | Pending Bit Array (PBA) Offset, BIR | MSI and MSI-X Capability Structures |
0x078 | Capabilities Register, Next Cap PTR, Cap ID | PCI Power Management Capability Structure |
0x07C | Data, PM Control/Status, Bridge Extensions, Power Management Status & Control | PCI Power Management Capability Structure |
0x080 | PCI Express Capabilities Register, Next Cap Ptr, PCI Express Cap ID | PCI Express Capability Structure |
0x084 | Device Capabilities Register | PCI Express Capability Structure |
0x088 | Device Status Register, Device Control Register | PCI Express Capability Structure |
0x08C | Link Capabilities Register | PCI Express Capability Structure |
0x090 | Link Status Register, Link Control Register | PCI Express Capability Structure |
0x094 | Slot Capabilities Register | PCI Express Capability Structure |
0x098 | Slot Status Register, Slot Control Register | PCI Express Capability Structure |
0x09C | Root Capabilities Register, Root Control Register | PCI Express Capability Structure |
0x0A0 | Root Status Register | PCI Express Capability Structure |
0x0A4 | Device Capabilities 2 Register | PCI Express Capability Structure |
0x0A8 | Device Status 2 Register, Device Control 2 Register | PCI Express Capability Structure |
0x0AC | Link Capabilities 2 Register | PCI Express Capability Structure |
0x0B0 | Link Status 2 Register, Link Control 2 Register | PCI Express Capability Structure |
0x0B4:0x0BC | Reserved | PCI Express Capability Structure |
0x800 | Advanced Error Reporting Enhanced Capability Header | Advanced Error Reporting Enhanced Capability Header |
0x804 | Uncorrectable Error Status Register | Uncorrectable Error Status Register |
0x808 | Uncorrectable Error Mask Register | Uncorrectable Error Mask Register |
0x80C | Uncorrectable Error Severity Register | Uncorrectable Error Severity Register |
0x810 | Correctable Error Status Register | Correctable Error Status Register |
0x814 | Correctable Error Mask Register | Correctable Error Mask Register |
0x818 | Advanced Error Capabilities and Control Register | Advanced Error Capabilities and Control Register |
0x81C | Header Log Register | Header Log Register |
0x82C | Root Error Command | Root Error Command Register |
0x830 | Root Error Status | Root Error Status Register |
0x834 | Error Source Identification Register, Correctable Error Source ID Register | Error Source Identification Register |
6.2. Type 0 Configuration Space Registers
6.3. Type 1 Configuration Space Registers
6.4. PCI Express Capability Structures
6.5. Intel-Defined VSEC Registers
Bits | Register Description | Value | Access |
---|---|---|---|
[15:0] | PCI Express Extended Capability ID. Intel-defined value for VSEC Capability ID. | 0x000B | RO |
[19:16] | Version. Intel-defined value for VSEC version. | 0x1 | RO |
[31:20] | Next Capability Offset. Starting address of the next Capability Structure implemented, if any. | Variable | RO |

Bits | Register Description | Value | Access |
---|---|---|---|
[15:0] | VSEC ID. A user configurable VSEC ID. | User entered | RO |
[19:16] | VSEC Revision. A user configurable VSEC revision. | Variable | RO |
[31:20] | VSEC Length. Total length of this structure in bytes. | 0x044 | RO |

Bits | Register Description | Value | Access |
---|---|---|---|
[31:0] | Intel Marker. This read-only register is an additional marker. If you use the standard Intel Programmer software to configure the device with CvP, this marker provides a value that the programming software reads to ensure that it is operating with the correct VSEC. | A Device Value | RO |

Bits | Register Description | Value | Access |
---|---|---|---|
[127:96] | JTAG Silicon ID DW3 | Application Specific | RO |
[95:64] | JTAG Silicon ID DW2 | Application Specific | RO |
[63:32] | JTAG Silicon ID DW1 | Application Specific | RO |
[31:0] | JTAG Silicon ID DW0. This is the JTAG Silicon ID that CvP programming software reads to determine that the correct SRAM object file (.sof) is being used. | Application Specific | RO |

Bits | Register Description | Value | Access |
---|---|---|---|
[15:0] | Configurable device or board type ID to specify to CvP the correct .sof. | Variable | RO |
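The first dword above follows the standard PCIe extended capability header layout, so host software can locate the Intel-defined VSEC by walking the extended capability list starting at offset 0x100. The following C sketch illustrates the walk; cfg_read32() is a hypothetical helper for reading a Configuration Space dword, not part of any Intel API.

```c
#include <stdint.h>

/* Hypothetical helper: reads a 32-bit dword from the function's
 * PCIe Configuration Space at the given byte offset. */
extern uint32_t cfg_read32(uint16_t offset);

#define INTEL_VSEC_CAP_ID  0x000Bu  /* Intel-defined VSEC Capability ID */
#define EXT_CAP_START      0x100u   /* Extended capabilities begin at 0x100 */

/* Walk the PCIe extended capability list and return the byte offset of
 * the Intel VSEC, or 0 if it is not found. Each extended capability
 * header packs: [15:0] Capability ID, [19:16] Version, [31:20] Next
 * Capability Offset (0 terminates the list). */
static uint16_t find_intel_vsec(void)
{
    uint16_t offset = EXT_CAP_START;

    while (offset != 0) {
        uint32_t hdr = cfg_read32(offset);
        uint16_t cap_id = hdr & 0xFFFFu;

        if (cap_id == INTEL_VSEC_CAP_ID)
            return offset;

        offset = (uint16_t)(hdr >> 20);  /* Next Capability Offset */
    }
    return 0;
}
```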
6.5.1. CvP Registers
Bits | Register Description | Reset Value | Access |
---|---|---|---|
[31:26] | Reserved | 0x00 | RO |
[25] | PLD_CORE_READY. From FPGA fabric. This status bit is provided for debug. | Variable | RO |
[24] | PLD_CLK_IN_USE. From clock switch module to fabric. This status bit is provided for debug. | Variable | RO |
[23] | CVP_CONFIG_DONE. Indicates that the FPGA control block has completed the device configuration via CvP and there were no errors. | Variable | RO |
[22] | Reserved | Variable | RO |
[21] | USERMODE. Indicates if the configurable FPGA fabric is in user mode. | Variable | RO |
[20] | CVP_EN. Indicates if the FPGA control block has enabled CvP mode. | Variable | RO |
[19] | CVP_CONFIG_ERROR. Reflects the value of this signal from the FPGA control block, checked by software to determine if there was an error during configuration. | Variable | RO |
[18] | CVP_CONFIG_READY. Reflects the value of this signal from the FPGA control block, checked by software during programming algorithm. | Variable | RO |
[17:0] | Reserved | Variable | RO |
Bits | Register Description | Reset Value | Access |
---|---|---|---|
[31:16] | Reserved. | 0x0000 | RO |
[15:8] | CVP_NUMCLKS. The number of clocks to send for every CvP data write. Set this field to one of the values below depending on your configuration image: | 0x00 | RW |
[7:3] | Reserved. | 0x0 | RO |
[2] | CVP_FULLCONFIG. Request that the FPGA control block reconfigure the entire FPGA including the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express, bringing the PCIe link down. | 1'b0 | RW |
[1] | HIP_CLK_SEL. Selects between PMA and fabric clock when USER_MODE = 1 and PLD_CORE_READY = 1. The following encodings are defined: To ensure that there is no clock switching during CvP, change this value only when the Hard IP for PCI Express has been idle for 10 µs, and wait 10 µs after changing this value before resuming activity. | 1'b0 | RW |
[0] | CVP_MODE. Controls whether the IP core is in CVP_MODE or normal mode. The following encodings are defined: | 1'b0 | RW |
Bits | Register Description | Reset Value | Access |
---|---|---|---|
[31:0] | Upper 32 bits of configuration data to be transferred to the FPGA control block to configure the device. You can choose 32- or 64-bit data. | 0x00000000 | RW |
[31:0] | Lower 32 bits of configuration data to be transferred to the FPGA control block to configure the device. | 0x00000000 | RW |
Bits | Register Description | Reset Value | Access |
---|---|---|---|
[31:2] | Reserved. | 0x0000 | RO |
[1] | START_XFER. Sets the CvP output to the FPGA control block indicating the start of a transfer. | 1'b0 | RW |
[0] | CVP_CONFIG. When asserted, instructs the FPGA control block to begin a transfer via CvP. | 1'b0 | RW |
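The bit positions from the CvP tables above can be collected as C masks for host software. This is a minimal sketch derived directly from the tables; the cvp_read_status() accessor is a hypothetical placeholder, since how the registers are reached depends on how your system exposes the VSEC.

```c
#include <stdint.h>

/* CvP Status register bits (see the first CvP table above). */
#define CVP_STATUS_PLD_CORE_READY   (1u << 25)
#define CVP_STATUS_PLD_CLK_IN_USE   (1u << 24)
#define CVP_STATUS_CVP_CONFIG_DONE  (1u << 23)
#define CVP_STATUS_USERMODE         (1u << 21)
#define CVP_STATUS_CVP_EN           (1u << 20)
#define CVP_STATUS_CVP_CONFIG_ERROR (1u << 19)
#define CVP_STATUS_CVP_CONFIG_READY (1u << 18)

/* CvP Mode Control register bits. */
#define CVP_CTRL_NUMCLKS_SHIFT      8      /* [15:8] CVP_NUMCLKS */
#define CVP_CTRL_NUMCLKS_MASK       (0xFFu << CVP_CTRL_NUMCLKS_SHIFT)
#define CVP_CTRL_FULLCONFIG         (1u << 2)
#define CVP_CTRL_HIP_CLK_SEL        (1u << 1)
#define CVP_CTRL_CVP_MODE           (1u << 0)

/* CvP Programming Control register bits. */
#define CVP_PROG_START_XFER         (1u << 1)
#define CVP_PROG_CVP_CONFIG         (1u << 0)

/* Hypothetical accessor that returns the CvP Status register value. */
extern uint32_t cvp_read_status(void);

/* Example: poll until the FPGA control block reports it is ready to
 * accept CvP configuration data. */
static void wait_for_cvp_ready(void)
{
    while ((cvp_read_status() & CVP_STATUS_CVP_CONFIG_READY) == 0)
        ;  /* spin; production code would bound this with a timeout */
}
```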
6.6. Advanced Error Reporting Capability
6.6.1. Uncorrectable Internal Error Mask Register
Bits | Register Description | Reset Value | Access |
---|---|---|---|
[31:12] | Reserved. | 1'b0 | RO |
[11] | Mask for RX buffer posted and completion overflow error. | 1'b0 | RWS |
[10] | Reserved. | 1'b1 | RO |
[9] | Mask for parity error detected on the Configuration Space to TX bus interface. | 1'b1 | RWS |
[8] | Mask for parity error detected on the TX to Configuration Space bus interface. | 1'b1 | RWS |
[7] | Mask for parity error detected at the TX Transaction Layer. | 1'b1 | RWS |
[6] | Reserved. | 1'b1 | RO |
[5] | Mask for configuration errors detected in CvP mode. | 1'b0 | RWS |
[4] | Mask for data parity errors detected during TX Data Link LCRC generation. | 1'b1 | RWS |
[3] | Mask for data parity errors detected on the RX to Configuration Space bus interface. | 1'b1 | RWS |
[2] | Mask for data parity error detected at the input to the RX Buffer. | 1'b1 | RWS |
[1] | Mask for the retry buffer uncorrectable ECC error. | 1'b1 | RWS |
[0] | Mask for the RX buffer uncorrectable ECC error. | 1'b1 | RWS |
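Because these mask bits are RWS, software changes them with a read-modify-write so that the other masks are preserved. A minimal sketch follows; the register byte offset and the cfg_read32()/cfg_write32() accessors are hypothetical placeholders, since the offset of this Intel-specific register is not listed in the table above.

```c
#include <stdint.h>

extern uint32_t cfg_read32(uint16_t offset);           /* hypothetical */
extern void cfg_write32(uint16_t offset, uint32_t v);  /* hypothetical */

/* Hypothetical placeholder: substitute the actual byte offset of the
 * Uncorrectable Internal Error Mask register for your variant. */
#define UNCORR_INT_ERR_MASK_OFF      0x000u

#define MASK_RX_POSTED_CPL_OVERFLOW  (1u << 11)

/* Suppress reporting of the RX buffer posted/completion overflow error
 * by setting its mask bit (1 = masked); other RWS bits are preserved. */
static void mask_rx_overflow_error(void)
{
    uint32_t mask = cfg_read32(UNCORR_INT_ERR_MASK_OFF);
    cfg_write32(UNCORR_INT_ERR_MASK_OFF, mask | MASK_RX_POSTED_CPL_OVERFLOW);
}
```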
6.6.2. Uncorrectable Internal Error Status Register
Bits | Register Description | Reset Value | Access |
---|---|---|---|
[31:12] | Reserved. | 0 | RO |
[11] | When set, indicates an RX buffer overflow condition in a posted request or Completion. | 0 | RW1CS |
[10] | Reserved. | 0 | RO |
[9] | When set, indicates a parity error was detected on the Configuration Space to TX bus interface. | 0 | RW1CS |
[8] | When set, indicates a parity error was detected on the TX to Configuration Space bus interface. | 0 | RW1CS |
[7] | When set, indicates a parity error was detected in a TX TLP and the TLP is not sent. | 0 | RW1CS |
[6] | When set, indicates that the Application Layer has detected an uncorrectable internal error. | 0 | RW1CS |
[5] | When set, indicates a configuration error has been detected in CvP mode which is reported as uncorrectable. This bit is set whenever CVP_CONFIG_ERROR rises while in CVP_MODE. | 0 | RW1CS |
[4] | When set, indicates a parity error was detected by the TX Data Link Layer. | 0 | RW1CS |
[3] | When set, indicates a parity error has been detected on the RX to Configuration Space bus interface. | 0 | RW1CS |
[2] | When set, indicates a parity error was detected at the input to the RX Buffer. | 0 | RW1CS |
[1] | When set, indicates a retry buffer uncorrectable ECC error. | 0 | RW1CS |
[0] | When set, indicates an RX buffer uncorrectable ECC error. | 0 | RW1CS |
6.6.3. Correctable Internal Error Mask Register
Bits | Register Description | Reset Value | Access |
---|---|---|---|
[31:8] | Reserved. | 0 | RO |
[7] | Reserved. | 1 | RO |
[6] | Mask for Corrected Internal Error reported by the Application Layer. | 1 | RWS |
[5] | Mask for configuration error detected in CvP mode. | 1 | RWS |
[4:2] | Reserved. | 0 | RO |
[1] | Mask for retry buffer correctable ECC error. | 1 | RWS |
[0] | Mask for RX Buffer correctable ECC error. | 1 | RWS |
6.6.4. Correctable Internal Error Status Register
Bits | Register Description | Reset Value | Access |
---|---|---|---|
[31:7] | Reserved. | 0 | RO |
[6] | Corrected Internal Error reported by the Application Layer. | 0 | RW1CS |
[5] | When set, indicates a configuration error has been detected in CvP mode which is reported as correctable. This bit is set whenever a CVP_CONFIG_ERROR occurs while in CVP_MODE. | 0 | RW1CS |
[4:2] | Reserved. | 0 | RO |
[1] | When set, the retry buffer correctable ECC error status indicates an error. | 0 | RW1CS |
[0] | When set, the RX buffer correctable ECC error status indicates an error. | 0 | RW1CS |
7. Reset and Clocks
The following figure shows the hard reset controller that is embedded inside the Hard IP for PCI Express. This controller takes in the npor and pin_perst inputs and generates the internal reset signals for other modules in the Hard IP.
7.1. Reset Sequence for Hard IP for PCI Express IP Core and Application Layer
Use the reset_status output of the Hard IP to drive the reset of your Application Layer logic.
After pin_perst or npor is released, the Hard IP reset controller deasserts reset_status. Your Application Layer logic can then come out of reset and become operational.
The RX transceiver reset sequence includes the following steps:
- After rx_pll_locked is asserted, the LTSSM state machine transitions from the Detect.Quiet to the Detect.Active state.
- When the pipe_phystatus pulse is asserted and pipe_rxstatus[2:0] = 3, the receiver detect operation has completed.
- The LTSSM state machine transitions from the Detect.Active state to the Polling.Active state.
- The Hard IP for PCI Express asserts rx_digitalreset. The rx_digitalreset signal is deasserted after rx_signaldetect is stable for a minimum of 3 ms.
The TX transceiver reset sequence includes the following steps:
- After npor is deasserted, the IP core deasserts the npor_serdes input to the TX transceiver.
- The SERDES reset controller waits for pll_locked to be stable for a minimum of 127 pld_clk cycles before deasserting tx_digitalreset.
For descriptions of the available reset signals, refer to Reset Signals, Status, and Link Training Signals.
7.2. Clocks
The Hard IP contains a clock domain crossing (CDC) synchronizer at the interface between the PHY/MAC and the DLL layers. The synchronizer allows the Data Link and Transaction Layers to run at frequencies independent of the PHY/MAC. The CDC synchronizer provides more flexibility for the user clock interface. Depending on parameters you specify, the core selects the appropriate coreclkout_hip. You can use these parameters to enhance performance by running at a higher frequency for latency optimization or at a lower frequency to save power.
In accordance with the PCI Express Base Specification, you must provide a 100 MHz reference clock that is connected directly to the transceiver.
7.2.1. Clock Domains
As this figure indicates, the IP core includes the following clock domains: pclk, coreclkout_hip and pld_clk.
7.2.1.1. coreclkout_hip
Link Width | Maximum Link Rate | Avalon Interface Width | coreclkout_hip |
---|---|---|---|
×1 | Gen1 | 64 | 62.5 MHz (5) |
×1 | Gen1 | 64 | 125 MHz |
×2 | Gen1 | 64 | 125 MHz |
×4 | Gen1 | 64 | 125 MHz |
×2 | Gen2 | 64 | 125 MHz |
×4 | Gen2 | 128 | 125 MHz |
×1 | Gen1 | 64 | 62.5 MHz (6) |
×1 | Gen1 | 64 | 125 MHz |
×2 | Gen1 | 64 | 125 MHz |
×4 | Gen1 | 64 | 125 MHz |
×8 | Gen1 | 64 | 250 MHz |
×8 | Gen1 | 128 | 125 MHz |
×1 | Gen2 | 64 | 125 MHz |
×2 | Gen2 | 64 | 125 MHz |
×4 | Gen2 | 64 | 250 MHz |
×4 | Gen2 | 128 | 125 MHz |
×8 | Gen2 | 128 | 250 MHz |
×8 | Gen2 | 256 | 125 MHz |
×1 | Gen3 | 64 | 125 MHz |
×2 | Gen3 | 64 | 125 MHz |
×2 | Gen3 | 128 | 125 MHz |
×2 | Gen3 | 64 | 250 MHz |
×4 | Gen3 | 128 | 250 MHz |
×4 | Gen3 | 256 | 125 MHz |
×8 | Gen3 | 256 | 250 MHz |
7.2.1.2. pld_clk
coreclkout_hip can drive both the Application Layer clock and the pld_clk input to the IP core. Optionally, a different clock source can drive pld_clk. The pld_clk frequency cannot be lower than the coreclkout_hip frequency. Based on specific Application Layer constraints, you can use a PLL to derive the desired frequency.
7.2.2. Clock Summary
Name | Frequency | Clock Domain |
---|---|---|
coreclkout_hip | 62.5, 125, or 250 MHz | Avalon-ST interface between the Transaction and Application Layers. |
pld_clk | Maximum frequency of 250 MHz; the minimum frequency must be equal to or greater than the coreclkout_hip frequency, which depends on the link width, link rate, and Avalon interface width as indicated in the coreclkout_hip table above. | Application and Transaction Layers. |
refclk | 100 MHz | SERDES (transceiver). Dedicated free-running input clock to the SERDES block. |
hip_reconfig_clk | — | Avalon-MM interface for the Hard IP dynamic reconfiguration interface, which you can use to change the value of read-only configuration registers at run-time. This interface is optional and is not required for Intel® Arria® 10 or Intel® Cyclone® 10 GX devices. |
8. Interrupts
8.1. Interrupts for Endpoints
The IP provides support for PCI Express MSI, MSI-X, and legacy interrupts when configured in Endpoint mode. The MSI and legacy interrupts are mutually exclusive. After power up, the Hard IP block starts in legacy interrupt mode. Software then decides whether to switch to MSI or MSI-X mode. To switch to MSI mode, software programs the msi_enable bit of the MSI Message Control Register to 1 (bit[16] of 0x050; a host-side sketch follows below). You enable MSI-X mode by turning on Implement MSI-X under the PCI Express/PCI Capabilities tab using the parameter editor. If you turn on the Implement MSI-X option, you should implement the MSI-X table structures at the memory space pointed to by the BARs.
Refer to section 6.1 of PCI Express Base Specification for a general description of PCI Express interrupt support for Endpoints.
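As an illustration of the mode switch described above, the following C sketch sets the msi_enable bit. The cfg_read32()/cfg_write32() helpers are hypothetical stand-ins for whatever configuration-access mechanism your host software uses.

```c
#include <stdint.h>

extern uint32_t cfg_read32(uint16_t offset);           /* hypothetical */
extern void cfg_write32(uint16_t offset, uint32_t v);  /* hypothetical */

#define MSI_CAP_OFF  0x050u       /* MSI capability, per the register map */
#define MSI_ENABLE   (1u << 16)   /* msi_enable: bit [16] of dword 0x050  */

/* Switch the Endpoint from legacy interrupt mode to MSI mode. */
static void enable_msi(void)
{
    uint32_t dw = cfg_read32(MSI_CAP_OFF);
    cfg_write32(MSI_CAP_OFF, dw | MSI_ENABLE);
}
```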
8.1.1. MSI and Legacy Interrupts
MSI interrupts are generated using information from the following sources:
- The MSI Capability registers
- The traffic class (app_msi_tc)
- The message data specified by app_msi_num
The Application Layer Interrupt Handler Module also generates legacy interrupts. The app_int_sts signal controls legacy interrupt assertion and deassertion.
The following figure illustrates a possible implementation of the Interrupt Handler Module with a per-vector enable bit. Alternatively, the Application Layer could implement a global interrupt enable instead of the per-vector enables shown.
There are 32 possible MSI messages. The number of messages requested by a particular component does not necessarily correspond to the number of messages allocated. For example, in the following figure, the Endpoint requests eight MSIs but is only allocated two. In this case, you must design the Application Layer to use only two allocated messages.
The following table describes three example implementations. The first example allocates all 32 MSI messages. The second and third examples only allocate 4 interrupts.
MSI | Example 1: 32 Allocated | Example 2: 4 Allocated | Example 3: 4 Allocated |
---|---|---|---|
System Error | 31 | 3 | 3 |
Hot Plug and Power Management Event | 30 | 2 | 3 |
Application Layer | 29:0 | 1:0 | 2:0 |
MSI interrupts generated for Hot Plug, Power Management Events, and System Errors always use Traffic Class 0. MSI interrupts generated by the Application Layer can use any Traffic Class. For example, a DMA that generates an MSI at the end of a transmission can use the same traffic class as was used to transfer data.
The following figure illustrates the interactions among MSI interrupt signals for the Root Port. The minimum latency possible between app_msi_req and app_msi_ack is one clock cycle. In this timing diagram app_msi_req can extend beyond app_msi_ack before deasserting. In other words, the earliest that app_msi_req can deassert is on the rising edge of clock cycle 5 (one cycle after app_msi_ack is asserted) as shown, but it can deassert in later clock cycles as well.
8.1.2. MSI-X
You can enable MSI-X interrupts by turning on Implement MSI-X under the PCI Express/PCI Capabilities heading using the parameter editor. If you turn on the Implement MSI-X option, you should implement the MSI-X table structures at the memory space pointed to by the BARs as part of your Application Layer.
The Application Layer transmits MSI-X interrupts on the Avalon®-ST TX interface. MSI-X interrupts are single dword Memory Write TLPs. Consequently, the Last DW Byte Enable in the TLP header must be set to 4'b0000. MSI-X TLPs should be sent only when enabled by the MSI-X enable and the function mask bits in the Message Control for the MSI-X Configuration register. These bits are available on the tl_cfg_ctl output bus. A sketch of the header fields follows.
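For reference, the following sketch assembles the three header dwords of such a single-dword Memory Write TLP (32-bit addressing) with the Last DW Byte Enable cleared, following the header field layout in the PCI Express Base Specification. The function and parameter names are illustrative only; the data dword comes from the corresponding MSI-X table entry.

```c
#include <stdint.h>

/* Build the three header dwords of a single-dword Memory Write TLP
 * carrying an MSI-X message. requester_id and tag are supplied by the
 * caller; msg_addr comes from the MSI-X table entry. */
static void build_msix_memwr_hdr(uint32_t hdr[3],
                                 uint16_t requester_id,
                                 uint8_t tag,
                                 uint32_t msg_addr)
{
    hdr[0] = (0x2u << 29)        /* Fmt = 010b: 3-DW header with data */
           | (0x00u << 24)       /* Type = 00000b: Memory Write       */
           | 0x001u;             /* Length = 1 dword                  */
    hdr[1] = ((uint32_t)requester_id << 16)
           | ((uint32_t)tag << 8)
           | (0x0u << 4)         /* Last DW BE = 4'b0000 (single dword) */
           | 0xFu;               /* First DW BE = 4'b1111               */
    hdr[2] = msg_addr & ~0x3u;   /* Address[31:2], low bits zero        */
}
```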
8.1.3. Implementing MSI-X Interrupts
- Host software sets up the MSI-X interrupts in the Application Layer by completing the following steps:
- Host software reads the Message Control register at 0x050 to determine the MSI-X Table size. The number of table entries is the <value read> + 1. The maximum table size is 2048 entries. Each 16-byte entry is divided into 4 fields, as shown in the figure below. For multi-function variants, BAR4 accesses the MSI-X table. For all other variants, any BAR can access the MSI-X table. The base address of the MSI-X table must be aligned to a 4 KB boundary.
- The host sets up the MSI-X table. It programs the MSI-X address, data, and mask bits for each entry, as shown in the figure below.
Figure 89. Format of MSI-X Table
- The host calculates the address of the <n>th entry using the following formula (see the sketch following these steps):
nth_address = base_address[BAR] + 16 × <n>
- When the Application Layer has an interrupt, it drives an interrupt request to the IRQ Source module.
- The IRQ Source sets the appropriate bit in the MSI-X PBA table. The PBA can use qword or dword accesses. For qword accesses, the IRQ Source calculates the address of the <m>th bit using the following formulas (see the sketch following these steps):
qword address = <PBA base addr> + 8 × floor(<m>/64)
qword bit = <m> mod 64
Figure 90. MSI-X PBA Table
- The IRQ Processor reads the entry in the MSI-X table.
- If the interrupt is masked by the Vector_Control field of the MSI-X table, the interrupt remains in the pending state.
- If the interrupt is not masked, the IRQ Processor sends a Memory Write Request to the TX slave interface. It uses the address and data from the MSI-X table. If the Message Upper Address = 0, the IRQ Processor creates a three-dword header. If the Message Upper Address > 0, it creates a four-dword header.
- The host interrupt service routine detects the TLP as an interrupt and services it.
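The address arithmetic from the steps above translates directly into code. This is a minimal sketch of the two formulas; the function names are illustrative.

```c
#include <stdint.h>

/* Address of the <n>th 16-byte MSI-X table entry:
 *   nth_address = base_address[BAR] + 16 * n */
static uint64_t msix_entry_addr(uint64_t bar_base, uint32_t n)
{
    return bar_base + 16u * (uint64_t)n;
}

/* For qword PBA accesses, locate the pending bit for vector <m>:
 *   qword address = <PBA base addr> + 8 * floor(m / 64)
 *   qword bit     = m mod 64 */
static uint64_t msix_pba_qword_addr(uint64_t pba_base, uint32_t m)
{
    return pba_base + 8u * (uint64_t)(m / 64u);
}

static uint32_t msix_pba_qword_bit(uint32_t m)
{
    return m % 64u;
}
```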
8.1.4. Legacy Interrupts
Legacy interrupts mimic the original PCI level-sensitive interrupts using virtual wire messages. The Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express signals legacy interrupts on the PCIe link using Message TLPs. The term INTx refers collectively to the four legacy interrupts, INTA#, INTB#, INTC#, and INTD#. Asserting app_int_sts causes an Assert_INTx Message TLP to be generated and sent upstream. Deasserting app_int_sts causes a Deassert_INTx Message TLP to be generated and sent upstream. To use legacy interrupts, you must clear the Interrupt Disable bit, which is bit 10 of the Command register, and then turn off the MSI Enable bit (a host-side sketch follows).
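A minimal host-side sketch of this enable sequence follows, assuming hypothetical cfg_read32()/cfg_write32() configuration-access helpers. The Status half of the dword at 0x004 is masked off so that the read-modify-write does not inadvertently clear its RW1C bits.

```c
#include <stdint.h>

extern uint32_t cfg_read32(uint16_t offset);           /* hypothetical */
extern void cfg_write32(uint16_t offset, uint32_t v);  /* hypothetical */

#define CMD_STATUS_OFF    0x004u      /* Status (upper 16) / Command (lower 16) */
#define CMD_INTX_DISABLE  (1u << 10)  /* Interrupt Disable, Command bit 10      */
#define MSI_CAP_OFF       0x050u
#define MSI_ENABLE        (1u << 16)

/* Configure the function for legacy (INTx) interrupts: clear the
 * Interrupt Disable bit in the Command register, then clear MSI Enable. */
static void enable_legacy_intx(void)
{
    /* Keep only the Command half so RW1C Status bits are written as 0. */
    uint32_t dw = cfg_read32(CMD_STATUS_OFF) & 0x0000FFFFu;
    cfg_write32(CMD_STATUS_OFF, dw & ~CMD_INTX_DISABLE);

    cfg_write32(MSI_CAP_OFF, cfg_read32(MSI_CAP_OFF) & ~MSI_ENABLE);
}
```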
The following figure illustrates interrupt timing for the legacy interface. The legacy interrupt handler asserts app_int_sts to instruct the IP to send an Assert_INTx message TLP.
The following figure illustrates the timing for deassertion of legacy interrupts. The legacy interrupt handler deasserts app_int_sts, causing the IP to send a Deassert_INTx message.
8.2. Interrupts for Root Ports
In Root Port mode, the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express receives interrupts through two different mechanisms:
- MSI—Root Ports receive MSI interrupts through the Avalon-ST RX Memory Write TLP. This is a memory mapped mechanism.
- Legacy—Legacy interrupts are translated into Message Interrupt TLPs and sent to the Application Layer using the int_status pins.
Normally, the Root Port services rather than sends interrupts; however, in two circumstances the Root Port can send an interrupt to itself to record error conditions:
- When the AER option is enabled, the aer_msi_num[4:0] signal indicates which MSI is being sent to the root complex when an error is logged in the AER Capability structure. This mechanism is an alternative to using the serr_out signal. The aer_msi_num[4:0] signal is only used for Root Ports, and you must set it to a constant value. It cannot toggle during operation.
- If the Root Port detects a Power Management Event, the pex_msi_num[4:0] signal is used by Power Management or Hot Plug to determine the offset between the base message interrupt number and the message interrupt number to send through MSI. The user must set pex_msi_num[4:0] to a fixed value.
The Root Error Status register reports the status of error messages. The Root Error Status register is part of the PCI Express AER Extended Capability structure. It is located at offset 0x830 of the Configuration Space registers.
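Software typically reads this register and then writes the value back to clear it, because AER status bits are RW1C (write 1 to clear). A minimal sketch, assuming hypothetical configuration-access helpers:

```c
#include <stdint.h>

extern uint32_t cfg_read32(uint16_t offset);           /* hypothetical */
extern void cfg_write32(uint16_t offset, uint32_t v);  /* hypothetical */

#define ROOT_ERROR_STATUS_OFF  0x830u  /* AER Root Error Status register */

/* Read the Root Error Status register and clear any bits that were set.
 * Writing 1 clears a RW1C bit; writing 0 leaves it unchanged. */
static uint32_t read_and_clear_root_error_status(void)
{
    uint32_t status = cfg_read32(ROOT_ERROR_STATUS_OFF);
    if (status != 0)
        cfg_write32(ROOT_ERROR_STATUS_OFF, status);
    return status;
}
```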
9. Error Handling
Each PCI Express compliant device must implement a basic level of error management and can optionally implement advanced error management. The IP core implements both basic and advanced error reporting. Error handling for a Root Port is more complex than that of an Endpoint.
Type | Responsible Agent | Description |
---|---|---|
Correctable | Hardware | While correctable errors may affect system performance, data integrity is maintained. |
Uncorrectable, non-fatal | Device software | Uncorrectable, non-fatal errors are errors in which data is lost but system integrity is maintained. For example, the fabric may lose a particular TLP but continue to operate without problems. |
Uncorrectable, fatal | System software | Errors generated by a loss of data and system failure are considered uncorrectable and fatal. Software must determine how to handle such errors: whether to reset the link or implement other means to minimize the problem. |
9.1. Physical Layer Errors
Error | Type | Description |
---|---|---|
Receive port error | Correctable | This error has the following three potential causes: |
9.2. Data Link Layer Errors
Error | Type | Description |
---|---|---|
Bad TLP | Correctable | This error occurs when an LCRC verification fails or when a sequence number error occurs. |
Bad DLLP | Correctable | This error occurs when a CRC verification fails. |
Replay timer | Correctable | This error occurs when the replay timer times out. |
Replay num rollover | Correctable | This error occurs when the replay number rolls over. |
Data Link Layer protocol | Uncorrectable (fatal) | This error occurs when a sequence number specified by the Ack/Nak block in the Data Link Layer (AckNak_Seq_Num) does not correspond to an unacknowledged TLP. |
9.3. Transaction Layer Errors
Error | Type | Description |
---|---|---|
Poisoned TLP received | Uncorrectable (non-fatal) | This error occurs if a received Transaction Layer Packet has the EP poison bit set. The received TLP is passed to the Application Layer, and the Application Layer logic must take appropriate action in response to the poisoned TLP. Refer to "2.7.2.2 Rules for Use of Data Poisoning" in the PCI Express Base Specification for more information about poisoned TLPs. |
ECRC check failed (1) | Uncorrectable (non-fatal) | This error is caused by an ECRC check failing despite the fact that the TLP is not malformed and the LCRC check is valid. The Hard IP block handles this TLP automatically. If the TLP is a non-posted request, the Hard IP block generates a completion with completer abort status. In all cases the TLP is deleted in the Hard IP block and not presented to the Application Layer. |
Unsupported Request for Endpoints | Uncorrectable (non-fatal) | This error occurs whenever a component receives any of the following Unsupported Requests: In all cases the TLP is deleted in the Hard IP block and not presented to the Application Layer. If the TLP is a non-posted request, the Hard IP block generates a completion with Unsupported Request status. |
Unsupported Requests for Root Port | Uncorrectable (fatal) | This error occurs whenever a component receives an Unsupported Request, including: |
Completion timeout | Uncorrectable (non-fatal) | This error occurs when a request originating from the Application Layer does not generate a corresponding completion TLP within the established time. It is the responsibility of the Application Layer logic to provide the completion timeout mechanism. The completion timeout should be reported from the Transaction Layer using the cpl_err[0] signal. |
Completer abort (1) | Uncorrectable (non-fatal) | The Application Layer reports this error using the cpl_err[2] signal when it aborts receipt of a TLP. |
Unexpected completion | Uncorrectable (non-fatal) | This error is caused by an unexpected completion transaction. The Hard IP block handles the following conditions: In all of the above cases, the TLP is not presented to the Application Layer; the Hard IP block deletes it. The Application Layer can detect and report other unexpected completion conditions using the cpl_err[2] signal. For example, the Application Layer can report cases where the total length of the received successful completions does not match the original read request length. |
Receiver overflow (1) | Uncorrectable (fatal) | This error occurs when a component receives a TLP that violates the FC credits allocated for this type of TLP. In all cases the Hard IP block deletes the TLP and it is not presented to the Application Layer. |
Flow control protocol error (FCPE) (1) | Uncorrectable (fatal) | This error occurs when a component does not receive update flow control credits within the 200 µs limit. |
Malformed TLP | Uncorrectable (fatal) | This error is caused by any of the following conditions: The Hard IP block deletes the malformed TLP; it is not presented to the Application Layer. |
Note: |
9.4. Error Reporting and Data Poisoning
How the Endpoint handles a particular error depends on the configuration registers of the device.
Refer to the PCI Express Base Specification 3.0 for a description of the device signaling and logging for an Endpoint.
The Hard IP block implements data poisoning, a mechanism for indicating that the data associated with a transaction is corrupted. Poisoned TLPs have the error/poisoned bit of the header set to 1 and observe the following rules:
- Received poisoned TLPs are sent to the Application Layer and status bits are automatically updated in the Configuration Space.
- Received poisoned Configuration Write TLPs are not written in the Configuration Space.
- The Configuration Space never generates a poisoned TLP; the error/poisoned bit of the header is always set to 0.
Poisoned TLPs can also set the parity error bits in the PCI Configuration Space Status register.
Status Bit | Conditions |
---|---|
Detected parity error (Status register bit 15) | Set when any received TLP is poisoned. |
Master data parity error (Status register bit 8) | This bit is set when the command register parity enable bit is set and one of the following conditions is true: |
Poisoned packets received by the Hard IP block are passed to the Application Layer. Poisoned transmit TLPs are similarly sent to the link.
9.5. Uncorrectable and Correctable Error Status Bits
The following section is reprinted with the permission of PCI-SIG. Copyright 2010 PCI‑SIG.
10. PCI Express Protocol Stack
The Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express implements the complete PCI Express protocol stack as defined in the PCI Express Base Specification. The protocol stack includes the following layers:
- Transaction Layer—The Transaction Layer contains the Configuration Space, which manages communication with the Application Layer, the RX and TX channels, the RX buffer, and flow control credits.
- Data Link Layer—The Data Link Layer, located between the Physical Layer and the Transaction Layer, manages packet transmission and maintains data integrity at the link level. Specifically, the Data Link Layer performs the following tasks:
- Manages transmission and reception of Data Link Layer Packets (DLLPs)
- Generates all transmission cyclical redundancy code (CRC) values and checks all CRCs during reception
- Manages the retry buffer and retry mechanism according to received ACK/NAK Data Link Layer packets
- Initializes the flow control mechanism for DLLPs and routes flow control credits to and from the Transaction Layer
- Physical Layer—The Physical Layer initializes the speed, lane numbering, and lane width of the PCI Express link according to packets received from the link and directives received from higher layers.
The following figure provides a high‑level block diagram.
Lanes | Gen1 | Gen2 | Gen3 |
---|---|---|---|
×1 | 125 MHz @ 64 bits or 62.5 MHz @ 64 bits | 125 MHz @ 64 bits | 125 MHz @ 64 bits |
×2 | 125 MHz @ 64 bits | 125 MHz @ 128 bits | 250 MHz @ 64 bits or 125 MHz @ 128 bits |
×4 | 125 MHz @ 64 bits | 250 MHz @ 64 bits or 125 MHz @ 128 bits | 250 MHz @ 128 bits or 125 MHz @ 256 bits |
×8 | 250 MHz @ 64 bits or 125 MHz @ 128 bits | 250 MHz @ 128 bits or 125 MHz @ 256 bits | 250 MHz @ 256 bits |
The following interfaces provide access to the Application Layer’s Configuration Space Registers:
- The LMI interface
- The Avalon-MM PCIe reconfiguration interface, which can access any read-only Configuration Space Register
- In Root Port mode, you can also access the Configuration Space Registers with a Configuration TLP using the Avalon-ST interface. A Type 0 Configuration TLP is used to access the Root Port Configuration Space Registers, and a Type 1 Configuration TLP is used to access the Configuration Space Registers of downstream components, typically Endpoints on the other side of the link.
The Hard IP includes dedicated clock domain crossing logic (CDC) between the PHYMAC and Data Link Layers.
10.1. Top-Level Interfaces
10.1.1. Avalon-ST Interface
An Avalon‑ST interface connects the Application Layer and the Transaction Layer. This is a point‑to‑point, streaming interface designed for high throughput applications. The Avalon‑ST interface includes the RX and TX datapaths.
For more information about the Avalon‑ST interface, including timing diagrams, refer to the Avalon Interface Specifications.
RX Datapath
The RX datapath transports data from the Transaction Layer to the Application Layer’s Avalon‑ST interface. Masking of non-posted requests is partially supported. Refer to the description of the rx_st_mask signal for further information about masking.
TX Datapath
The TX datapath transports data from the Application Layer's Avalon-ST interface to the Transaction Layer. The Hard IP provides credit information to the Application Layer for posted headers, posted data, non-posted headers, non-posted data, completion headers, and completion data.
The Application Layer may track credits consumed and use the credit limit information to calculate the number of credits available (see the sketch below). However, to enforce the PCI Express Flow Control (FC) protocol, the Hard IP also checks the available credits before sending a request to the link. If the Application Layer violates the available credits for a TLP it transmits, the Hard IP blocks that TLP and all future TLPs until credits become available. By tracking credits consumed and calculating the credits available, the Application Layer can optimize performance by selecting for transmission only the TLPs that have credits available.
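As a sketch of the credit bookkeeping described above: flow-control counters wrap modulo a power of two (8-bit header and 12-bit data credit fields in the PCIe flow-control protocol), so the available-credit computation must use wraparound arithmetic. The helper below is illustrative, not part of the IP interface.

```c
#include <stdint.h>
#include <stdbool.h>

/* Return true if at least credits_needed credits are available, given
 * the advertised credit limit and the credits consumed so far. The
 * subtraction is masked to the credit field width so the comparison is
 * correct across counter wraparound. */
static bool credits_available(uint16_t credit_limit,
                              uint16_t credits_consumed,
                              uint16_t credits_needed,
                              uint16_t field_bits)
{
    uint16_t mask = (uint16_t)((1u << field_bits) - 1u);
    uint16_t available = (uint16_t)((credit_limit - credits_consumed) & mask);
    return credits_needed <= available;
}

/* Example: check for one posted-header credit (8-bit field):
 *   credits_available(limit, consumed, 1, 8);                     */
```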
10.1.2. Clocks and Reset
The PCI Express Base Specification requires an input reference clock, which is called refclk in this design. The PCI Express Base Specification stipulates that the frequency of this clock be 100 MHz.
The PCI Express Base Specification also requires a system configuration time of 100 ms. To meet this specification, the IP core includes an embedded hard reset controller. This reset controller exits the reset state after the periphery of the device is initialized.
10.1.3. Local Management Interface (LMI Interface)
The LMI bus provides access to the PCI Express Configuration Space in the Transaction Layer.
10.1.4. Hard IP Reconfiguration
The PCI Express reconfiguration bus allows you to dynamically change the read-only values stored in the Configuration Registers.
10.1.5. Interrupts
The Hard IP for PCI Express offers the following interrupt mechanisms:
- Message Signaled Interrupts (MSI)—MSI uses single dword Memory Write TLPs to implement interrupts. This interrupt mechanism conserves pins because it does not use separate wires for interrupts. In addition, the single dword provides flexibility in the data presented in the interrupt message. The MSI Capability structure is stored in the Configuration Space and is programmed using Configuration Space accesses.
- MSI-X—The Transaction Layer generates MSI-X messages which are single dword memory writes. The MSI-X Capability structure points to an MSI-X table structure and MSI-X PBA structure which are stored in memory. This scheme is in contrast to the MSI capability structure, which contains all of the control and status information for the interrupt vectors.
- Legacy interrupts—The app_int_sts port controls legacy interrupt generation. When app_int_sts is asserted, the Hard IP generates an Assert_INT<n> message TLP.
10.1.6. PIPE
The PIPE interface implements the Intel‑designed PIPE interface specification. You can use this parallel interface to speed simulation; however, you cannot use the PIPE interface in actual hardware.
- The simulation models support PIPE and serial simulation.
- For Gen3, the Intel BFM bypasses Gen3 Phase 2 and Phase 3 Equalization. However, Gen3 variants can perform Phase 2 and Phase 3 equalization if instructed by a third-party BFM.
10.2. Transaction Layer
The Transaction Layer is located between the Application Layer and the Data Link Layer. It generates and receives Transaction Layer Packets. The following figure illustrates the Transaction Layer, which includes three sub-blocks: the TX datapath, Configuration Space, and RX datapath.
Tracing a transaction through the RX datapath includes the following steps:
- The Transaction Layer receives a TLP from the Data Link Layer.
- The Configuration Space determines whether the TLP is well formed and directs the packet based on traffic class (TC).
- TLPs are stored in a specific part of the RX buffer depending on the type of transaction (posted, non-posted, and completion).
- The TLP FIFO block stores the address of the buffered TLP.
- The receive reordering block reorders the queue of TLPs as needed, fetches the address of the highest priority TLP from the TLP FIFO block, and initiates the transfer of the TLP to the Application Layer.
- When ECRC generation and forwarding are enabled, the Transaction Layer forwards the ECRC DWORD to the Application Layer.
Tracing a transaction through the TX datapath involves the following steps:
- The Transaction Layer informs the Application Layer that sufficient flow control credits exist for a particular type of transaction using the TX credit signals. The Application Layer may choose to ignore this information.
- The Application Layer requests permission to transmit a TLP. The Application Layer must provide the transaction and must be prepared to provide the entire data payload in consecutive cycles.
- The Transaction Layer verifies that sufficient flow control credits exist and acknowledges or postpones the request. If there is insufficient space in the retry buffer, the Transaction Layer does not accept the TLP.
- The Transaction Layer forwards the TLP to the Data Link Layer.
10.2.1. Configuration Space
The Configuration Space implements the following configuration registers and associated functions:
- Header Type 0 Configuration Space for Endpoints
- Header Type 1 Configuration Space for Root Ports
- PCI Power Management Capability Structure
- Virtual Channel Capability Structure
- Message Signaled Interrupt (MSI) Capability Structure
- Message Signaled Interrupt–X (MSI–X) Capability Structure
- PCI Express Capability Structure
- Advanced Error Reporting (AER) Capability Structure
- Vendor Specific Extended Capability (VSEC)
The Configuration Space also generates all messages (PME#, INT, error, slot power limit), MSI requests, and completion packets from configuration requests that flow in the direction of the root complex, except slot power limit messages, which are generated by a downstream port. All such transactions are dependent upon the content of the PCI Express Configuration Space as described in the PCI Express Base Specification.
10.3. Data Link Layer
The Data Link Layer is located between the Transaction Layer and the Physical Layer. It maintains packet integrity and communicates (by DLL packet transmission) at the PCI Express link level.
The DLL implements the following functions:
- Link management through the reception and transmission of DLL Packets (DLLPs), which are used for the following functions:
- Power management of DLLP reception and transmission
- To transmit and receive ACK/NAK packets
- Data integrity through generation and checking of CRCs for TLPs and DLLPs
- TLP retransmission in case of NAK DLLP reception or replay timeout, using the retry (replay) buffer
- Management of the retry buffer
- Link retraining requests in case of error through the Link Training and Status State Machine (LTSSM) of the Physical Layer
The DLL has the following sub-blocks:
- Data Link Control and Management State Machine—This state machine connects to both the Physical Layer’s LTSSM state machine and the Transaction Layer. It initializes the link and flow control credits and reports status to the Transaction Layer.
- Power Management—This function handles the handshake to enter low power mode. Such a transition is based on register values in the Configuration Space and received Power Management (PM) DLLPs. None of the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCIe IP core variants support low power modes.
- Data Link Layer Packet Generator and Checker—This block is associated with the DLLP’s 16-bit CRC and maintains the integrity of transmitted packets.
- Transaction Layer Packet Generator—This block generates transmit packets, including a sequence number and a 32-bit Link CRC (LCRC). The packets are also sent to the retry buffer for internal storage. In retry mode, the TLP generator receives the packets from the retry buffer and generates the CRC for the transmit packet.
- Retry Buffer—The retry buffer stores TLPs and retransmits all unacknowledged packets in the case of NAK DLLP reception. In case of ACK DLLP reception, the retry buffer discards all acknowledged packets.
- ACK/NAK Packets—The ACK/NAK block handles ACK/NAK DLLPs and generates the sequence number of transmitted packets.
- Transaction Layer Packet Checker—This block checks the integrity of the received TLP and generates a request for transmission of an ACK/NAK DLLP.
- TX Arbitration—This block arbitrates transactions, prioritizing in the following order:
- Initialize FC Data Link Layer packet
- ACK/NAK DLLP (high priority)
- Update FC DLLP (high priority)
- PM DLLP
- Retry buffer TLP
- TLP
- Update FC DLLP (low priority)
- ACK/NAK FC DLLP (low priority)
10.4. Physical Layer
The Physical Layer is the lowest level of the PCI Express protocol stack. It is the layer closest to the serial link. It encodes and transmits packets across a link and accepts and decodes received packets. The Physical Layer connects to the link through a high‑speed SERDES interface running at 2.5 Gbps for Gen1 implementations, at 2.5 or 5.0 Gbps for Gen2 implementations, and at 2.5, 5.0 or 8.0 Gbps for Gen3 implementations.
The Physical Layer is responsible for the following actions:
- Training the link
- Scrambling/descrambling and 8B/10B encoding/decoding for 2.5 Gbps (Gen1), 5.0 Gbps (Gen2), or 128b/130b encoding/decoding of 8.0 Gbps (Gen3) per lane
- Serializing and deserializing data
- Equalization (Gen3)
- Operating the PIPE 3.0 Interface
- Implementing auto speed negotiation (Gen2 and Gen3)
- Transmitting and decoding the training sequence
- Providing hardware autonomous speed control
- Implementing auto lane reversal
The Physical Layer is subdivided by the PIPE Interface Specification into two layers (bracketed horizontally in the figure above):
- Media Access Controller (MAC) Layer—The MAC layer, also called the PHYMAC, includes the LTSSM and the scrambling/descrambling, byte reordering, and multilane deskew functions.
- PHY Layer—The PHY layer includes the 8B/10B encode and decode functions for Gen1 and Gen2, and the 128b/130b encode and decode functions for Gen3. The PHY also includes elastic buffering and serialization/deserialization functions.
The PHYMAC block comprises four main sub-blocks:
- MAC Lane—Both the RX and the TX paths use this block.
- On the RX side, the block decodes the Physical Layer packet and reports to the LTSSM the type and number of TS1/TS2 ordered sets received.
- On the TX side, the block multiplexes data from the DLL and the Ordered Set and SKP sub-block (LTSTX). It also adds lane specific information, including the lane number and the force PAD value when the LTSSM disables the lane during initialization.
- LTSSM—This block implements the LTSSM and logic that tracks TX and RX training sequences on each lane.
- For transmission, it interacts with each MAC lane sub-block and with the LTSTX sub-block by asserting both global and per-lane control bits to generate specific Physical Layer packets.
- On the receive path, it receives the Physical Layer packets reported by each MAC lane sub-block. It also enables the multilane deskew block. This block reports the Physical Layer status to higher layers.
- LTSTX (Ordered Set and SKP Generation)—This sub-block generates the Physical Layer packet. It receives control signals from the LTSSM block and generates a Physical Layer packet for each lane. It generates the same Physical Layer Packet for all lanes, with PAD symbols for the link or lane number in the corresponding TS1/TS2 fields. The block also handles the receiver detection operation towards the PCS sub-layer by asserting predefined PIPE signals and waiting for the result. It also generates a SKP Ordered Set at every predefined timeslot and interacts with the TX alignment block to prevent the insertion of a SKP Ordered Set in the middle of a packet.
- Deskew—This sub-block performs the multilane deskew function and the RX alignment between the initialized lanes and the datapath. The multilane deskew implements an eight-word FIFO buffer for each lane to store symbols. Each symbol includes eight data bits, one disparity bit, and one control bit. The FIFO discards the FTS, COM, and SKP symbols and replaces PAD and IDL with D0.0 data. When all eight FIFOs contain data, a read can occur. When the multilane deskew block is first enabled, each FIFO begins writing after the first COM is detected. If all lanes have not detected a COM symbol after seven clock cycles, they are reset and the resynchronization process restarts; otherwise, the RX alignment function recreates a 64-bit data word, which is sent to the DLL.
11. Transaction Layer Protocol (TLP) Details
11.1. Supported Message Types
11.1.1. INTX Messages
Message | Root Port | Endpoint | Generated by: App Layer | Generated by: Core | Generated by: Core (with App Layer input) | Comments |
---|---|---|---|---|---|---|
Assert_INTA | Receive | Transmit | No | Yes | No | For Root Port, legacy interrupts are translated into message interrupt TLPs, which trigger the int_status[3:0] signals to the Application Layer. |
Assert_INTB | Receive | Transmit | No | No | No | |
Assert_INTC | Receive | Transmit | No | No | No | |
Assert_INTD | Receive | Transmit | No | No | No | |
Deassert_INTA | Receive | Transmit | No | Yes | No | |
Deassert_INTB | Receive | Transmit | No | No | No | |
Deassert_INTC | Receive | Transmit | No | No | No | |
Deassert_INTD | Receive | Transmit | No | No | No | |
11.1.2. Power Management Messages
Message | Root Port | Endpoint | Generated by: App Layer | Generated by: Core | Generated by: Core (with App Layer input) | Comments |
---|---|---|---|---|---|---|
PM_Active_State_Nak | TX | RX | No | Yes | No | — |
PM_PME | RX | TX | No | No | Yes | — |
PME_Turn_Off | TX | RX | No | No | Yes | The pme_to_cr signal sends and acknowledges this message: |
PME_TO_Ack | RX | TX | No | No | Yes | — |
11.1.3. Error Signaling Messages
Message | Root Port | Endpoint | Generated by: App Layer | Generated by: Core | Generated by: Core (with App Layer input) | Comments |
---|---|---|---|---|---|---|
ERR_COR | RX | TX | No | Yes | No | In addition to detecting errors, a Root Port also gathers and manages errors sent by downstream components through the ERR_COR, ERR_NONFATAL, and ERR_FATAL Error Messages. In Root Port mode, there are two mechanisms to report an error event to the Application Layer: |
ERR_NONFATAL | RX | TX | No | Yes | No | — |
ERR_FATAL | RX | TX | No | Yes | No | — |
11.1.4. Locked Transaction Message
Message | Root Port | Endpoint | Generated by: App Layer | Generated by: Core | Generated by: Core (with App Layer input) | Comments |
---|---|---|---|---|---|---|
Unlock Message | Transmit | Receive | Yes | No | No | |
11.1.5. Slot Power Limit Message
The PCI Express Base Specification states that this message is not mandatory after link training.
Message | Root Port | Endpoint | Generated by: App Layer | Generated by: Core | Generated by: Core (with App Layer input) | Comments |
---|---|---|---|---|---|---|
Set Slot Power Limit | Transmit | Receive | No | Yes | No | In Root Port mode, sent through software. |
11.1.6. Vendor-Defined Messages
Message | Root Port | Endpoint | Generated by: App Layer | Generated by: Core | Generated by: Core (with App Layer input) | Comments |
---|---|---|---|---|---|---|
Vendor Defined Type 0 | Transmit and Receive | Transmit and Receive | Yes | No | No | |
Vendor Defined Type 1 | Transmit and Receive | Transmit and Receive | Yes | No | No | |
11.1.7. Hot Plug Messages
Message | Root Port | Endpoint | Generated by: App Layer | Generated by: Core | Generated by: Core (with App Layer input) | Comments |
---|---|---|---|---|---|---|
Attention_Indicator On | Transmit | Receive | No | Yes | No | Per the recommendations in the PCI Express Base Specification, these messages are not transmitted to the Application Layer. |
Attention_Indicator Blink | Transmit | Receive | No | Yes | No | |
Attention_Indicator Off | Transmit | Receive | No | Yes | No | |
Power_Indicator On | Transmit | Receive | No | Yes | No | |
Power_Indicator Blink | Transmit | Receive | No | Yes | No | |
Power_Indicator Off | Transmit | Receive | No | Yes | No | |
Attention Button_Pressed (Endpoint only) | Receive | Transmit | No | No | Yes | N/A |
11.2. Transaction Layer Routing Rules
Transactions adhere to the following routing rules:
- In the receive direction (from the PCI Express link), memory and I/O requests that match the defined base address register (BAR) contents and vendor-defined messages with or without data route to the receive interface. The Application Layer logic processes the requests and generates the read completions, if needed.
- In Endpoint mode, received Type 0 Configuration requests from the PCI Express upstream port route to the internal Configuration Space and the Intel® Arria® 10 or Intel® Cyclone® 10 GX Hard IP for PCI Express generates and transmits the completion.
- The Hard IP handles supported received message transactions (Power Management and Slot Power Limit) internally. The Endpoint also supports the Unlock and Type 1 Messages. The Root Port supports Interrupt, Type 1, and error Messages.
- Vendor‑defined Type 0 and Type 1 Message TLPs are passed to the Application Layer.
- The Transaction Layer treats all other received transactions (including memory or I/O requests that do not match a defined BAR) as Unsupported Requests. The Transaction Layer sets the appropriate error bits and transmits a completion, if needed. These Unsupported Requests are not made visible to the Application Layer; the header and data are dropped.
- For memory read and write requests with addresses below 4 GB, requestors must use the 32-bit format. The Transaction Layer interprets requests using the 64-bit format for addresses below 4 GB as an Unsupported Request and does not send them to the Application Layer. If Error Messaging is enabled, an error Message TLP is sent to the Root Port. Refer to Transaction Layer Errors for a comprehensive list of TLPs the Hard IP does not forward to the Application Layer.
- The Transaction Layer sends all memory and I/O requests, as well as completions generated by the Application Layer and passed to the transmit interface, to the PCI Express link.
- The Hard IP can generate and transmit power management, interrupt, and error signaling messages automatically under the control of dedicated signals. Additionally, it can generate MSI requests under the control of the dedicated signals.
- In Root Port mode, the Application Layer can issue Type 0 or Type 1 Configuration TLPs on the Avalon-ST TX bus.
- The Type 0 Configuration TLPs are only routed to the Configuration Space of the Hard IP and are not sent downstream on the PCI Express link.
- The Type 1 Configuration TLPs are sent downstream on the PCI Express link. If the bus number of the Type 1 Configuration TLP matches the Secondary Bus Number register value in the Root Port Configuration Space, the TLP is converted to a Type 0 TLP.
- For more information about routing rules in Root Port mode, refer to Section 7.3.3 Configuration Request Routing Rules in the PCI Express Base Specification.
11.3. Receive Buffer Reordering
The PCI, PCI-X and PCI Express protocols include ordering rules for concurrent TLPs. Ordering rules are necessary for the following reasons:
- To guarantee that TLPs complete in the intended order
- To avoid deadlock
- To maintain compatibility with ordering used on legacy buses
- To maximize performance and throughput by minimizing read latencies and managing read/write ordering
- To avoid race conditions in systems that include legacy PCI buses by guaranteeing that reads to an address do not complete before an earlier write to the same address
PCI uses a strongly-ordered model with some exceptions to avoid potential deadlock conditions. PCI-X added a relaxed ordering (RO) bit to the TLP header. It is bit 5 of byte 2 in the TLP header, or the high-order bit of the attributes field in the TLP header. If this bit is set, relaxed ordering is permitted. If software can guarantee that no dependencies exist between pending transactions, you can safely set the relaxed ordering bit (see the sketch below).
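For illustration, the bit position described above maps onto the first header dword as follows; this sketch assumes the dword is held with header byte 0 in its most significant byte.

```c
#include <stdint.h>

/* The Relaxed Ordering attribute is bit 5 of byte 2 of the TLP header,
 * i.e. the high-order bit of the Attr field. With header byte 0 in
 * dword bits [31:24], byte 2 occupies bits [15:8], so RO is bit 13. */
#define TLP_ATTR_RELAXED_ORDERING  (1u << 13)

/* Return the first header dword with the RO attribute set. */
static uint32_t tlp_set_relaxed_ordering(uint32_t hdr_dw0)
{
    return hdr_dw0 | TLP_ATTR_RELAXED_ORDERING;
}
```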
The following table summarizes the ordering rules from the PCI specification. In this table, the entries have the following meanings:
- Columns represent the first transaction issued.
- Rows represent the next transaction.
- At each intersection, the implicit question is: should this row packet be allowed to pass the column packet? The following three answers are possible:
- Yes: the second transaction must be allowed to pass the first to avoid deadlock.
- Y/N: There are no requirements. A device may allow the second transaction to pass the first.
- No: The second transaction must not be allowed to pass the first.
The following transaction ordering rules apply to the table below.
- A Memory Write or Message Request with the Relaxed Ordering Attribute bit clear (b’0) must not pass any other Memory Write or Message Request.
- A Memory Write or Message Request with the Relaxed Ordering Attribute bit set (b’1) is permitted to pass any other Memory Write or Message Request.
- Endpoints, Switches, and Root Complex may allow Memory Write and Message Requests to pass Completions or be blocked by Completions.
- Memory Write and Message Requests can pass Completions traveling in the PCI Express to PCI directions to avoid deadlock.
- If the Relaxed Ordering attribute is not set, then a Read Completion cannot pass a previously enqueued Memory Write or Message Request.
- If the Relaxed Ordering attribute is set, then a Read Completion is permitted to pass a previously enqueued Memory Write or Message Request.
- Read Completions associated with different Read Requests are allowed to be blocked by or to pass each other.
- Read Completions for the same Read Request (same Transaction ID) must return in address order.
- Non-posted requests cannot pass other non-posted requests.
- CfgRd0 can pass IORd or MRd.
- CfgWr0 can pass IORd or MRd.
- CfgRd0 can pass IOWr.
- CfgWr0 can pass IOWr.
| Can the Row Pass the Column? | Posted Req: Memory Write or Message Req (Spec) | Posted Req (Hard IP) | Non-Posted Req: Read Request (Spec) | Read Request (Hard IP) | Non-Posted Req: I/O or Cfg Write Req (Spec) | I/O or Cfg Write Req (Hard IP) | Completion (Spec) | Completion (Hard IP) |
|---|---|---|---|---|---|---|---|---|
| Posted Req (P) | No / Y/N | No / No | Yes | Yes | Yes | Yes | Y/N / Yes | No / No |
| Read Req (NP) | No | No | Y/N | No | Y/N | No | Y/N | No |
| Non-Posted Req with Data: I/O or Cfg Write Req (NP) | No | No | Y/N | No | Y/N | No | Y/N | No |
| Read Cmpl (Cmpl) | No / Y/N | No / No | Yes | Yes | Yes | Yes | Y/N / No | No / No |
| I/O or Configuration Write Cmpl (Cmpl) | Y/N | No | Yes | Yes | Yes | Yes | Y/N | No |

In cells that show two values, the first value applies when the Relaxed Ordering attribute bit is clear and the second when it is set; in the Read Cmpl row's Completion column, the two values apply to Completions for different Read Requests and for the same Read Request (same Transaction ID), respectively.
As the table above indicates, the RX datapath implements an RX buffer reordering function that allows Posted and Completion transactions to pass Non-Posted transactions (as allowed by PCI Express ordering rules) when the Application Layer is unable to accept additional Non-Posted transactions.
The Application Layer dynamically enables RX buffer reordering by asserting the rx_mask signal. The rx_mask signal blocks Non-Posted transactions from being presented to the Application Layer interface, so that only Posted and Completion transactions are presented to the Application Layer.
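The following C fragment is a minimal behavioral model of this masking, not Hard IP logic; `tlp_class_t`, `rx_may_present`, and the representation of rx_mask as a Boolean argument are illustrative assumptions.

```c
#include <stdbool.h>

/* Illustrative TLP classes for the RX reordering model. */
typedef enum { TLP_POSTED, TLP_NON_POSTED, TLP_COMPLETION } tlp_class_t;

/* Behavioral model: returns true if a TLP of the given class may be
 * presented to the Application Layer. While rx_mask is asserted,
 * Non-Posted TLPs are held in the RX buffer, and Posted and Completion
 * TLPs are allowed to pass them as the ordering rules permit. */
bool rx_may_present(tlp_class_t cls, bool rx_mask)
{
    return !(rx_mask && cls == TLP_NON_POSTED);
}
```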
11.3.1. Using Relaxed Ordering
Transactions from unrelated threads are unlikely to have data dependencies. Consequently, you may be able to use relaxed ordering to improve system performance. The drawback is that only some transactions can be optimized for performance. Complete the following steps to decide whether to enable relaxed ordering in your design:
- Create a system diagram showing all PCI Express and legacy devices.
- Analyze the relationships between the components in your design to identify the following hazards:
- Race conditions: A race condition exists if a read to a location can occur before a previous write to that location completes. The following figure shows a data producer and a data consumer on opposite sides of a PCI-to-PCI bridge. The producer writes data to the memory through the PCI-to-PCI bridge. The consumer must read a flag to confirm that the producer has written the new data into the memory before reading the data. However, because the PCI-to-PCI bridge includes a write buffer, the flag may indicate that it is safe to read the data while the actual data remains in the PCI-to-PCI bridge posted write buffer.
Figure 100. Design Including Legacy PCI Buses Requiring Strong Ordering
- A shared memory architecture where more than one thread accesses the same locations in memory.
If either of these conditions exists, relaxed ordering can lead to incorrect results.
- If your analysis determines that relaxed ordering does not lead to possible race conditions or read or write hazards, you can enable relaxed ordering by setting the RO bit in the TLP header.
- The following figure shows two PCIe Endpoints and a Legacy Endpoint connected to a switch. The three Endpoints are unlikely to have data dependencies. Consequently, it is safe to set the relaxed ordering bit for devices connected to the switch. In this system, if relaxed ordering is not enabled, a memory read to the Legacy Endpoint can be blocked because an earlier posted write cannot complete while the write buffer is full.
Figure 101. PCI Express Design Using Relaxed Ordering
- If your analysis indicates that you can enable relaxed ordering, simulate your system with and without relaxed ordering enabled. Compare the results and performance.
- If relaxed ordering improves performance without introducing errors, you can enable it in your system.
12. Throughput Optimization
Each transmitter, the write requester in this case, maintains a credit limit register and a credits consumed register. The credit limit register holds the sum of all credits granted by the receiver, the write completer in this case. It is initialized during the flow control initialization phase of link initialization and then updated during operation by Flow Control (FC) Update DLLPs. The credits consumed register holds the sum of all credits consumed by transmitted packets. Separate credit limit and credits consumed registers exist for each of the six types of Flow Control:
- Posted Headers
- Posted Data
- Non-Posted Headers
- Non-Posted Data
- Completion Headers
- Completion Data
Each receiver also maintains a credit allocated counter which is initialized to the total available space in the RX buffer (for the specific Flow Control class) and then incremented as packets are pulled out of the RX buffer by the Application Layer. The value of this register is sent as the FC Update DLLP value.
The PCIe Hard IP maintains its own flow control logic, including a credit consumed register, and ensures that no TLP is sent that would use more credits than are available for that type of TLP. If you want optimum performance and granularity, you can maintain your own credit consumed register and flow control gating logic for each credit category (Header/Data, Posted/Non-posted/Completion). This allows you to halt the transmission of TLPs for a category that is out of credits, while still allowing TLP transmission for categories that have sufficient credits.
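The following C sketch shows what such per-category gating logic might look like, assuming the spec-defined convention that one data credit covers 16 bytes of payload, that each TLP consumes one header credit, and that header and data credit counters are 8 and 12 bits wide, respectively. All structure and function names are illustrative, and the availability check is a simplified form of the modulo comparison defined in the PCI Express Base Specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* One credit counter pair for a single Flow Control category
 * (e.g. Posted Headers or Completion Data). Names are illustrative. */
typedef struct {
    uint16_t limit;      /* credit limit from InitFC / FC Update DLLPs */
    uint16_t consumed;   /* credits consumed by transmitted TLPs */
    uint16_t field_bits; /* 8 for header credits, 12 for data credits */
} credit_ctr_t;

/* Credit counters wrap, so availability is computed modulo
 * 2^field_bits rather than by direct comparison. */
static bool credits_available(const credit_ctr_t *c, uint16_t required)
{
    uint16_t mask  = (uint16_t)((1u << c->field_bits) - 1u);
    uint16_t avail = (uint16_t)((c->limit - c->consumed) & mask);
    return required <= avail;
}

/* A TLP may be sent only if its header credit (always 1) and its data
 * credits (payload bytes / 16, rounded up; 0 for header-only TLPs) are
 * both available in its Flow Control class. On success, the consumed
 * registers are advanced. */
static bool tlp_may_send(credit_ctr_t *hdr, credit_ctr_t *data,
                         uint16_t payload_bytes)
{
    uint16_t data_credits = (uint16_t)((payload_bytes + 15u) / 16u);
    if (!credits_available(hdr, 1) ||
        !credits_available(data, data_credits))
        return false; /* hold the TLP until an FC Update DLLP arrives */
    hdr->consumed  = (uint16_t)(hdr->consumed + 1);
    data->consumed = (uint16_t)(data->consumed + data_credits);
    return true;
}
```

With one `credit_ctr_t` pair per category, a design can hold back only the traffic class that is out of credits while continuing to transmit the others.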
The following steps describe the Flow Control Update loop. The corresponding numbers in the figure show the general area to which they correspond.
- When the Application Layer has a packet to transmit, the number of credits required is calculated. If the required credits are less than or equal to the current value of available credits (credit limit - credits consumed so far), then the packet can be transmitted immediately. However, if the credit limit minus credits consumed is less than the required credits, then the packet must be held until the credit limit is increased to a sufficient value by an FC Update DLLP. This check is performed separately for the header and data credits; a single packet consumes only a single header credit.
- After the packet is selected for transmission the credits consumed register is incremented by the number of credits consumed by this packet. This increment happens for both the header and data credit consumed registers.
- The packet is received at the other end of the link and placed in the RX buffer.
- At some point the packet is read out of the RX buffer by the Application Layer. After the entire packet is read out of the RX buffer, the credit allocated register can be incremented by the number of credits the packet has used. There are separate credit allocated registers for the header and data credits.
- The value in the credit allocated register is used to create an FC Update DLLP.
- After an FC Update DLLP is created, it arbitrates for access to the PCI Express link. FC Update DLLPs are typically scheduled with a low priority; consequently, a continuous stream of Application Layer TLPs or other DLLPs (such as ACKs) can delay the FC Update DLLP for a long time. To prevent starving the attached transmitter, FC Update DLLPs are raised to a high priority under the following three circumstances:
- When the last sent credit allocated counter minus the amount of received data is less than MAX_PAYLOAD and the current credit allocated counter is greater than the last sent credit counter. Essentially, this means the data sink knows the data source has less than a full MAX_PAYLOAD worth of credits, and therefore is starving.
- When an internal timer, configured to 30 µs to meet the PCI Express Base Specification requirement for resending FC Update DLLPs, expires since the last FC Update DLLP was sent.
- When the credit allocated counter minus the last sent credit allocated counter is greater than or equal to 25% of the total credits available.