Intel® Acceleration Stack for Intel® Xeon® CPU with FPGAs 2.0.1
1. About this Document
1.1. About this Document
This reference manual introduces the IFPGA Rawdev Driver (ifpga_rawdev) that is available for the
Intel® FPGA PAC N3000. It introduces the data structures and API functions necessary to
Intel® FPGA PAC N3000 and the
Arria® 10 FPGA.
The intended audience for this reference manual is software engineers
interested in using and customizing this IFPGA Rawdev Driver. Refer to the existing DPDK
documentation for information about other Data Plane Development Kit (DPDK) functionality.
1.2. Acronym List
Intel® FPGA PAC
Intel FPGA Programmable Acceleration
Intel FPGA PAC N3000 is a full-duplex 100
Gbps in-system re-programmable acceleration card for
multiworkload networking application acceleration.
Accelerator Functional Unit
Hardware Accelerator implemented in FPGA
logic which offloads a computational operation for an
application from the CPU to improve performance.
Compiled Hardware Accelerator image
implemented in FPGA logic that accelerates an
Application Programming Interface
A set of subroutine definitions, protocols,
and tools for building software applications.
Data Plane Development Kit
The Data Plane Development Kit consists of libraries to
accelerate packet processing workloads running on many CPU
architectures, including x86, POWER and ARM processors. DPDK
runs mostly on Linux with a FreeBSD port available for a subset
of DPDK features. DPDK is licensed under the Open Source BSD License.
FPGA Interface Unit
is a platform interface layer that acts as a bridge between
platform interfaces like
and AFU-side interfaces such as CCI-P.
Open Programmable Acceleration
The OPAE is a set of drivers, utilities, and APIs
for managing and accessing AFs.
The Intel FPGA Programmable Acceleration Card D5005 consists of two
QSFP28 networking ports that can be configured for
operation per port.
Figure 1. Block Diagram: Network Port Feature
The FPGA Interface Manager (FIM) instantiates two
Stratix® 10 FPGA Transceiver Native PHY IP cores, one for each
QSFP28 network port. The Native PHY IP cores are configured with four transceiver channels,
enabling the Accelerator Function (AF) to instantiate an Accelerator Functional Unit (AFU) with up to 8x
PRBS Generators and Verifiers, and
Reset Controller IP
The Reset Controller IP core
analog and digital reset signaling for each transceiver channel, as required by the
Stratix® 10 Native PHY IP core. In a real use case, along with a Reset
Controller IP core, you
the 8x10G PCS and MAC IPs, as well as your user logic in the AF. The raw PHY parallel data interfaces are exposed to the Partial
Reconfiguration (PR) boundary through the PR HSSI Interface. The raw PHY interface consists of
80-bit parallel data per transmit or receive direction in each transceiver, along with some
sideband signals for handshaking with the Reset Controller IP core across the PR boundary.
The FIM also contains a set of PLLs
for each network port. The PLLs provide all the necessary clocks for the transceivers and the
AFU. The Memory-Mapped (MM) controllers instantiated in the FIM provide the ability for the software driver to have full access to the
Avalon-MM Reconfiguration Interface of the Native PHY IPs through the FPGA Management Engine
Note: The FIM contains only the Hard PHY in PCS-Direct mode (PMA-only). You implement
your own PCS and MAC IP core in the AFU.
Table 1. Correspondence Between Acceleration Stack, FIM, and OPAE Versions
The FIM instantiates two PLLs that
use a 644.53125 MHz external reference clock to generate the necessary clocks for the Native
PHY IP core and the AFU. The ATX PLL generates the
high-speed serial clock for the Native PHY IP core. The fPLL generates two clocks,
322.265625 MHz and 161.1328125 MHz. Both of the fPLL clocks and RX clocks from the Native PHY IP
core are provided to the AFU through the PR HSSI
Figure 2. Logical View of the HSSI PHY
2.2. Physical View
This section depicts the hardware view of a single transceiver channel and its
sub-components as part of the Native PHY IP core. The Native PHY IP core is optimized for the
lowest roundtrip latency. The Native PHY IP TX/RX PCS-Core Interface FIFOs are configured as
Both QSFP28 Port-0
and Port-1 TX FIFOs are in Phase Compensation mode such that TX clocks
can be shared across all 4 channels per QSFP28 interface.
QSFP28 Port-0 RX
FIFO is in Phase Compensation mode.
QSFP28 Port-1 RX
FIFO is in Register mode (bypassed).
Note: In the following
the PCS, MAC, and User Logic blocks under AF are
shown for illustration. These blocks are not provided by Intel as part of the AFU. Intel only provides an example AFU with 8xPRBS Generators and
Figure 3. Physical View with QSFP28 Port-0
Figure 4. Physical
View with QSFP28
2.3. Clock Architecture
This section describes the clocking architecture of the Native PHY IP
All four channels on the TX parallel data interface are clocked by
f2a_tx_parallel_clk_x2 clock, per QSFP28
interface. Each one of the four channels on the RX parallel data interface is
clocked by its own corresponding f2a_rx_clkout[n]
clock, per QSFP28 interface.
On both the QSFP28 ports, tx_clkout[n] interfaces of the Native PHY IP core have no connection
(NC) because the TX FIFO is in Phase Compensation mode and the f2a_tx_parallel_clk_x2 clock is used to drive the
On the QSFP28 Port-0, rx_coreclkin[n] interfaces of the Native PHY IP core are connected to
rx_clkout[n] interfaces because the RX FIFO is
in Phase Compensation mode.
On the QSFP28 Port-1, rx_coreclkin[n] interfaces of the Native PHY IP core are connected to
ground because the RX FIFO is in Register mode.
Figure 5. Clocking Architecture with QSFP28 Port-0
Figure 6. Clocking Architecture with QSFP28 Port-1
Table 2. Clock Frequencies
Frequency in MHz
You can access the following clocks from AF:
The refclk644 external reference clocks come from
different sources for each QSFP28 network port. Therefore, any given clock
on network port 0 is asynchronous to any given clock
on network port 1.
The f2a_tx_parallel_clk_x1 and
f2a_tx_parallel_clk_x2 are phase synchronous for a
given QSFP28 network port.
The rx_clkout[n] clocks are recovered by the Clock and Data
Recovery (CDR) unit in the receiver of each channel. All the
rx_clkout[n] clocks are asynchronous to one
3. Partial Reconfiguration HSSI Interface
The Partial Reconfiguration (PR) HSSI interface is
a unified data
interface that connects a network port to the PRBS Generators and Verifiers. The unified data
interface consists of a fixed set of physical ports that are mapped to specific signaling
functions. The PR HSSI interface also provides clocks for synchronization as well as control
and status signals for analog and digital reset sequence orchestration between the PHY in
FIM and the reset controller IP core in AF. The figure below
provides a high-level block diagram for one QSFP instance.
Figure 7. PR HSSI Block Diagram
3.1. Clock Signals
The clocks of the PR HSSI Interface synchronize the unified data interface
between the PRBS Generators and Verifiers, and the HSSI PHY. The signal directions
listed for HSSI ports are from the perspective of the FIM. The signals listed below are identical for both QSFP28
Table 3. Clock Signals
A 161.1328125 MHz clock generated by an fPLL
in the HSSI PHY from a 644.53125 MHz QSFP28 external reference
clock. This clock is intended to drive the user logic in the
A 322.265625 MHz clock generated by an fPLL
in the HSSI PHY from a 644.53125 MHz QSFP28 external reference
clock. This clock drives the tx_coreclkin inputs of all 4 channels of the
Native PHY IP core. All transmit data from AFU to HSSI PHY
should be synchronous to f2a_tx_parallel_clk_x2.
A 322.265625 MHz clock at the output of the
Native PHY IP core rx_clkout[n]
interface. All receive data to the PRBS Verifiers from the HSSI
PHY is synchronous to f2a_rx_clkout[n], per transceiver channel n.
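The frequencies listed above are related by simple integer ratios: 322.265625 MHz is the 644.53125 MHz reference divided by 2, and 161.1328125 MHz is the reference divided by 4. The following is a minimal sketch that checks this arithmetic with awk. The divisors 2 and 4 are inferred from the listed frequencies and are an assumption, not documented fPLL settings; the helper name derive_clock is hypothetical.

```shell
#!/bin/sh
# Reference clock frequency in MHz, as listed in the clock signal descriptions.
REFCLK_MHZ=644.53125

# Print refclk / divisor with seven decimal places.
# The divisors used below (2 and 4) are inferred from the listed
# frequencies; they are not documented fPLL settings.
derive_clock() {
    awk -v ref="$REFCLK_MHZ" -v div="$1" 'BEGIN { printf "%.7f\n", ref / div }'
}

X2_MHZ=$(derive_clock 2)   # compare with the 322.265625 MHz clock
X1_MHZ=$(derive_clock 4)   # compare with the 161.1328125 MHz clock

echo "refclk/2 = $X2_MHZ MHz"
echo "refclk/4 = $X1_MHZ MHz"
```

Both results match the frequencies given for the fPLL outputs, which is consistent with the two clocks being derived from the same reference on a given port.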
3.2. Data Interface and Signals
The HSSI unified data interface conforms to the
Stratix® 10 FPGA Transceiver Native PHY IP core configured in
32-bit PCS-Direct mode. It consists of generic parallel data and encoding control
interfaces for transmit and receive that are mapped to specific signaling behavior
as outlined in the
Stratix® 10 L-
and H-Tile Transceiver PHY User Guide. The unified data interface also
includes flow control ports to manage passing data to and from the HSSI PHY
The table below provides a cross reference from the hssi:raw_pr unified data interface signals to the
Stratix® 10 FPGA Transceiver Native PHY IP core with
enhanced PCS signal set. The HSSI PHY IP is configured in Configuration-32, PMA
width-32, FPGA Fabric width-32. The TX Core FIFO is configured in Phase Compensation
is configured in Phase Compensation mode and
QSFP1 is configured in Register mode. The Simplified Data
Interface is disabled. The Double-Rate Transfer is disabled. For detailed
information on these signals, refer to the Intel Stratix 10 L-
and H-Tile Transceiver PHY User Guide.
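As a worked example, the raw per-channel data rate follows from the 32-bit fabric width and the 322.265625 MHz parallel clock described above. The sketch below assumes one 32-bit word is transferred per clock cycle (consistent with Double-Rate Transfer being disabled) and four channels per QSFP28 port; the variable names are illustrative only.

```shell
#!/bin/sh
# Values taken from the text: 32-bit FPGA fabric width, 322.265625 MHz
# parallel clock, four transceiver channels per QSFP28 port.
WIDTH_BITS=32
PARALLEL_CLK_MHZ=322.265625
CHANNELS_PER_PORT=4

# Raw rate per channel in Gbps = width (bits) * clock (MHz) / 1000,
# assuming one word per clock cycle.
CHANNEL_GBPS=$(awk -v w="$WIDTH_BITS" -v f="$PARALLEL_CLK_MHZ" \
    'BEGIN { printf "%.4f\n", w * f / 1000 }')

# Aggregate raw rate across all channels of one port.
PORT_GBPS=$(awk -v r="$CHANNEL_GBPS" -v n="$CHANNELS_PER_PORT" \
    'BEGIN { printf "%.2f\n", r * n }')

echo "per channel: $CHANNEL_GBPS Gbps"
echo "per port:    $PORT_GBPS Gbps"
```

The per-channel result, 10.3125 Gbps, is the serial line rate of 10GbE with 64b/66b encoding, which fits the 10GbE PCS connection illustrated later in this section.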
Table 4. Data Signals
Native PHY IP
Transmit and Receive Data
and Encoding Control Ports
The PR HSSI Interface provides signals for HSSI PHY PCS status and transceiver
loopback control. The signal behavior conforms to the
Stratix® 10 FPGA Transceiver Native PHY IP core in 32-bit PCS-Direct mode.
The table below cross-references the HSSI port names to the Native PHY IP port
Figure 8. Connecting the PCS to the HSSI Interface
This figure illustrates how to connect a 10GbE PCS to
the HSSI PHY using the PR HSSI Interface.
4. Native PHY IP Core Parameters
During the FIM instantiation, the
following IP parameters were selected for generating the PHY IP core. These parameter settings
are informational; you cannot control or configure them. For more information about these
parameters, refer to the
Stratix® 10 L- and
H-Tile Transceiver PHY User Guide.
$ echo 31 > tx_vod
bash: echo: write error: Connection timed out
[ 7812.184357] intel-pac-hssi intel-pac-hssi.2.auto: timeout, HSSI ack not received
Check if the channel is held in reset
$ cat stat
Deassert the reset
$ echo 0x0 > ctrl
$ cat stat
0xf3c0f3c0f3c0f3c0
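The recovery steps above can be wrapped in a small helper. This is only a sketch under stated assumptions: HSSI_DIR is a hypothetical placeholder for the directory that holds the stat, ctrl, and tx_vod files (the real path is not given here), set_tx_vod is an illustrative name, and the script does not interpret the stat value because its bit layout is not described in this document.

```shell
#!/bin/sh
# HSSI_DIR is a hypothetical placeholder for the directory containing the
# stat, ctrl, and tx_vod files; the real path is not given in the text.
HSSI_DIR="${HSSI_DIR:-.}"

# Write a value to tx_vod. If the write fails (for example with a timeout),
# print stat, deassert the reset through ctrl, print stat again, and retry once.
set_tx_vod() {
    value="$1"
    if echo "$value" > "$HSSI_DIR/tx_vod" 2>/dev/null; then
        return 0
    fi
    echo "tx_vod write failed; stat before: $(cat "$HSSI_DIR/stat")" >&2
    echo 0x0 > "$HSSI_DIR/ctrl"            # deassert the reset
    echo "stat after: $(cat "$HSSI_DIR/stat")" >&2
    echo "$value" > "$HSSI_DIR/tx_vod"     # single retry
}
```

For example, `HSSI_DIR=/path/to/hssi set_tx_vod 31` attempts the same write as the session above, where the path is whatever directory exposes these files on your system. A single retry is used so that a persistent failure is reported instead of looping.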
6. Document Revision History for Networking Interface for OPAE
Acceleration Stack Version
Quartus® Prime Pro Edition
the entire document to reflect:
Addition of the PHY PCS-direct mode and
Removal of the 10GbE MAC AFU
Quartus® Prime Pro Edition 18.1.2)