Intel FPGA SDK for OpenCL: Intel Arria 10 GX FPGA Development Kit Reference Platform Porting Guide
Intel FPGA SDK for OpenCL Intel Arria 10 GX FPGA Development Kit Reference Platform Porting Guide
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission of the Khronos Group™.
The Intel® FPGA SDK for OpenCL™ is based on a published Khronos specification, and has passed the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance.
Intel Arria 10 GX FPGA Development Kit Reference Platform: Prerequisites
Prerequisites for the altera_a10pciedk Reference Platform:
- An
Intel®
Arria® 10-based accelerator card with
working
PCI Express*
(
PCIe*
) and memory interfaces
Test these interfaces together in the same design using the same version of the Intel® Quartus® Prime Pro Edition software that you will use to develop your Custom Platform.
Attention:The native Arria 10 GX FPGA Development Kit does not automatically work with the SDK. Before using the Arria 10 GX FPGA Development Kit with the SDK, you must first contact your field applications engineer or regional support center representative to configure the development kit for you.
Alternatively, contact Intel® Premier Support for assistance.
- Intel® Quartus® Prime Pro Edition software
- Designing with Logic Lock regions
General prerequisites:
- FPGA architecture, including clocking, global routing, and I/Os
- High-speed design
- Timing analysis
- Platform Designer design and Avalon® interfaces
- Tcl scripting
- PCIe
- DDR4 external memory
This document also assumes that you are familiar with the following Intel® FPGA SDK for OpenCL™ -specific tools and documentation:
- Custom Platform Toolkit and the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide
-
Stratix® V Network Reference Platform (s5_net)
and the Stratix V Network Reference Platform Porting Guide
The memory-mapped device (MMD) and driver software stack in the a10_ref Reference Platform is derived from the s5_net Reference Platform design.
Features of the Intel Arria 10 GX FPGA Development Kit Reference Platform
The Intel® Arria® 10 GX FPGA Development Kit Reference Platform targets a subset of the hardware features available in the Intel® Arria® 10 GX FPGA Development Kit.

Features of the a10_ref Reference Platform:
- OpenCL Host
The a10_ref Reference Platform uses a PCIe-based host that connects to the Intel® Arria® 10 PCIe Gen3x8 hard IP core.
- OpenCL Global Memory
The hardware provides one 2-gigabyte (GB) DDR4 SDRAM daughtercard that is mounted on the HiLo connector (J14 in Figure 1).
- FPGA Programming via one of the following methods:
- Partial Reconfiguration (PR) over PCIe* .
- External cable and the Intel® Arria® 10 GX FPGA Development Kit's on-board Intel® FPGA Download Cable II interface.
- External Intel FPGA Download Cable II interface connected to a 10-pin JTAG header.
- Guaranteed Timing
The a10_ref Reference Platform relies on the Intel® Quartus® Prime Pro Edition compilation flow to provide guaranteed timing closure. The timing-clean a10_ref Reference Platform is preserved in the form of a precompiled post-fit netlist (that is, the base.qdb Intel® Quartus® Prime Database Export File). The Intel® FPGA SDK for OpenCL™ Offline Compiler imports this preserved post-fit netlist into each OpenCL kernel compilation.
- OpenCL Host Pipe
Using direct memory access (DMA) in Intel® Arria® 10 PCIe* Gen3 x8 hard IP core a10gx_hostch board variant has a direct host to kernel and kernel to host pipe.
Intel Arria 10 GX FPGA Development Kit Reference Platform Board Variants
To compile your OpenCL kernel for a specific board variant, include the -board=<board_name> option in your aoc command (for example, aoc -board=a10gx myKernel.cl).
Contents of the Intel Arria 10 GX FPGA Development Kit Reference Platform
Windows File or Folder | Linux File or Directory | Description |
---|---|---|
board_env.xml | board_env.xml | eXtensible Markup Language (XML) file that describes the Reference Platform to the Intel® FPGA SDK for OpenCL™ . |
hardware | hardware | Contains the
Intel®
Quartus® Prime project templates for the a10gx board variant. See Table 2 for a list of files in this directory. |
windows64 | linux64 | Contains the MMD library, kernel mode driver, and executable files of the SDK utilities (that is, install, uninstall, flash, program, diagnose) for your 64-bit operating system. |
source | source |
For Windows, the source folder contains source codes for the MMD library and SDK utilities. The MMD library and the SDK utilities are in the windows64 folder. For Linux, the source directory contains source codes for the MMD library and SDK utilities. The MMD library and the SDK utilities are in the linux64 directory. |
File | Description |
---|---|
mem.qsys | Platform Designer system that, together with the .ip files in the ip/mem subdirectory, implements the mem component. |
ddr4.qsys | Platform Designer system that, together with the .ip files in the ip/ddr4 subdirectory, implements the ddr4 component. |
base.qsf |
Intel®
Quartus® Prime
Settings File for the base project revision. This file includes, by
reference, all the settings in the flat.qsf file. Use this revision when porting the a10_ref Reference Platform to your own Custom Platform. The Intel® Quartus® Prime Pro Edition software compiles this base project revision from source code. |
base.qar |
Intel®
Quartus® Prime
Archive File that contains base.qdb, pr_base.id, and base.sdc. This file is generated by the scripts/post_flow_pr.tcl file during
base revision compile, and is used during import revision compilation.
|
board.qsys | Platform Designer system that implements the board interfaces (that is, the static region) of the OpenCL hardware system. |
board_spec.xml | XML file that provides the definition of the board hardware interfaces to the SDK. |
device.tcl | Tcl file that is included in all revisions and contains all device-specific information (for example, device family, ordering part number (OPN), and voltage settings). |
flat.qsf |
Intel®
Quartus® Prime Settings File for the
flat project revision. This file includes all the common settings,
such as pin location assignments, that are used in the other
revisions of the project (that is, base, top, and
top_synth). The base.qsf, top.qsf, and top_synth.qsf files include, by reference, all the
settings in the flat.qsf
file. The Intel® Quartus® Prime software compiles the flat revision with minimal location constraints. The flat revision compilation does not generate a base.qar file that you can use for future import compilations and does not implement the guaranteed timing flow. |
import_compile.tcl | Tcl script for the SDK-user compilation flow (that is, import revision compilation). |
max5_150.pof | Programming file for the MAX® V device on the
Intel®
Arria® 10 GX FPGA Development Kit that sets the memory
reference clock to 150 MHz by default at power-up. You must program the max5_150.pof file onto your a10gx board. |
opencl_bsp_ip.qsf |
Intel®
Quartus® Prime
Settings File that collects all the required .ip files in a unique location. During flat and base revision compilations, the board.qsys, mem.qsys and ddr4.qsys Platform Designer files are added to the opencl_bsp_ip.qsf file. |
reset_2x.sdc | Synopsys Design Constraints file with timing constraints for reset synchronizers on the 2x kernel clock. |
quartus.ini | Contains any special Intel® Quartus® Prime software options that you need to compile OpenCL kernels for the a10_ref Reference Platform. |
top.qpf | Intel® Quartus® Prime Project File for the OpenCL hardware system. |
top.qsf | Intel® Quartus® Prime Settings File for the SDK-user compilation flow. |
top.sdc | Synopsys Design Constraints file that contains board-specific timing constraints. |
top.v | Top-level Verilog Design file for the OpenCL hardware system. |
top_post.sdc | Platform Designer and Intel® FPGA SDK for OpenCL™ IP-specific timing constraints. |
top_synth.qsf | Intel® Quartus® Prime Settings File for the Intel® Quartus® Prime revision in which the OpenCL kernel system is synthesized. |
ip/mem/<file_name> | Directory containing the .ip files that the
Intel®
Quartus® Prime Pro Edition software needs to
parameterize the mem component. You must provide both the mem.qsys file and the corresponding .ip files in this directory to the Intel® Quartus® Prime Pro Edition software. |
ip/ddr4/<file_name> | Directory containing the .ip files that the
Intel®
Quartus® Prime Pro Edition software needs to
parameterize the ddr4 component. You must provide both the ddr4.qsys file and the corresponding .ip files in this directory to the Intel® Quartus® Prime Pro Edition software. |
ip/board/<file_name> | Directory containing the .ip files that the
Intel®
Quartus® Prime Pro Edition software needs to
parameterize the board instance. You must provide both the board.qsys file and the corresponding .ip files in this directory to the Intel® Quartus® Prime Pro Edition software. |
ip/freeze_wrapper.v | Verilog Design File that implements the freeze logic placed at outputs of the Partial Reconfiguration region. |
ip/irq_controller/<file_name> | IP that receives interrupts from the OpenCL
kernel system and sends message signaled interrupts (MSI) to the
host. Refer to the Message Signaled Interrupts section for more information. |
ip/host_channel | IP that implements the DMA descriptor
controller as well as AVMM-to-AVST and AVST-to-AVMM between DMA and
kernel. Attention: This IP is available only in
the a10gx_hostch board variant.
|
scripts/base_write_sdc.tcl | Tcl script that the base revision compilation uses to generate the base.sdc file containing all the constraints collected in the base revision compilation. The Intel® Quartus® Prime Pro Edition software uses the base.sdc file when compiling the import (top) revision. |
scripts/create_fpga_bin_pr.tcl | Tcl script that generates the fpga.bin file. The fpga.bin file contains all the
necessary files for configuring the FPGA. For more information on the fpga.bin file, refer to the Define the Contents of the fpga.bin File for the Intel® Arria® 10 GX FPGA Development Kit Reference Platform section. |
scripts/post_flow_pr.tcl | Tcl script that implements the guaranteed timing closure flow, as described in the Guaranteed Timing Closure of the Intel® Arria® 10 GX FPGA Development Kit Reference Platform Design section. |
scripts/pre_flow_pr.tcl | Tcl script that executes before the invocation of the Intel® Quartus® Prime software compilation. Running the script generates the Platform Designer HDL for board.qsys and kernel_system.qsys. It also creates a unique ID for the PR base revision (that is, static region). This unique ID is stored in the pr_base.id file. |
scripts/regenerate_cache.tcl | Tcl script that regenerates the BAK cache file in your temporary directory. |
scripts/qar_ip_files.tcl | Tcl script that packages up base.qdb, pr_base.id and base.sdc during base revision compile. |
scripts/create_acds_ver_hex.tcl | Tcl script called by the pre_flow_pr.tcl script to create contents of the ACDS version ROM. |
Intel Arria 10 GX FPGA Development Kit Reference Platform BSP Changes Between Intel Quartus Prime Design Suite Releases
- BSP Changes from Intel Quartus Prime Design Suite Version 16.1 to Version 17.0
- BSP Changes from Intel Quartus Prime Design Suite Version 17.0 to Version 17.1
- BSP Changes from Intel Quartus Prime Design Suite Version 17.1 to Version 18.0
- BSP Changes from Intel Quartus Prime Design Suite Version 18.0 to Version 18.1
BSP Changes from Intel Quartus Prime Design Suite Version 16.1 to Version 17.0
If you use or have modified an Intel® Arria® 10 GX FPGA Development Kit Reference Platform BSP provided for Intel® Quartus® Prime Design Suite Version 16.1, review the following information to learn about changes implemented in the BSP for Version 17.0.
- Mandatory flow and assignment changes required by
Intel®
Quartus® Prime Version 17.0.
The changes include changed fitter requirements and DSPBA changes.
- Added features including JTAG cable detection and support for BSP flow variants.
- Cosmetic changes including packaging base revision compilation outputs, floorplanning changes, and cleaning up scripts.
File | Change |
---|---|
|
These files are now contained in base.qar. |
base.qsf | Contains changes to regions due to new floorplanning. |
board.qsys | Added JTAG cable detection IP. |
flat.qsf | Removed the following items:
|
import_compile.tcl | Removed BAK flow specific calls to the bak_flow.tcl script. |
max5_150.pof | Removed. This object file is now handled in the bringup folder in the BSP. |
create_fpga_bin_pr.tcl | Changes for the BSP flow arguments. |
post_flow_pr.tcl | Changes for packaging the IP files in the base.qar file. |
pre_flow_pr.tcl | Changes for unpackaging the base.qar file and the BSP flow argument. |
quartus.ini | Added new required INI (qhd_error_on_missing_TCL_ENTITY_FILE=off) |
|
New scripts added to the BSP. |
top_post.sdc | Updated the false path on the kernel 2x clock consumer. |
BSP Changes from Intel Quartus Prime Design Suite Version 17.0 to Version 17.1
If you use or have modified an Intel® Arria® 10 GX FPGA Development Kit Reference Platform BSP provided for Intel® Quartus® Prime Design Suite Version 17.0, review the following information to learn about changes implemented in the BSP for Version 17.1.
The files in the BSP have the following changes from Intel® Quartus® Prime Design Suite Version 17.0 to Version 17.1:
File | Change |
---|---|
acl_ddr4_a10_core.qsys | Renamed as ddr4.qsys to reduce long path issues in Windows. |
All .ip files in the ip/acl_ddr4_a10_core/directory | Renamed as ip/ddr4/ to reduce long path issues in Windows. |
acl_ddr4_a10.qsys | Renamed as mem.qsys to reduce long path issues in Windows. |
All .ip files in the ip/acl_ddr4_a10/directory | Renamed as ip/mem/ to reduce long path issues in Windows. |
board.qsys |
|
base.qsf | Changed the hierarchy for Logic Lock regions due to the renaming of .qsys files. |
flat.qsf |
|
top.qsf | Added GENERATE_PR_RBF_FILE ON and QDB_FILE_PARTITION assignments. |
top_post.sdc | Changed the hierarchy of asynchronous clock groups and false path due to the renaming of.qsys files. |
import_compiles.tcl |
|
board_spec.xml | Updated version from 17.0 to 17.1 |
quartus.ini |
|
base.qar | Updated the file with ACDS 17.1 static region. |
scripts/pre_flow_pr.tcl |
|
scripts/post_flow_pr.tcl |
|
scripts/create_fpga_bin_pr.tcl |
|
scripts/qar_ip_files.tcl |
|
scripts/regenerate_cache.tcl | Changes needed for moving bak_flow.tcl into Intel® FPGA SDK for OpenCL™ |
scripts/bak_flow.tcl | Moved the script into Intel® FPGA SDK for OpenCL™ . |
scripts/helpers.tcl | Moved the script into Intel® FPGA SDK for OpenCL™ . |
scripts/create_acds_ver_hex.tcl | Added the script to create the contents of the ACDS version ROM. |
ip/host_channel | Added the IP for a10gx_hostch board variant. |
BSP Changes from Intel Quartus Prime Design Suite Version 17.1 to Version 18.0
The files in the BSP have the following changes from Intel® Quartus® Prime Design Suite Version 17.1 to Version 18.0:
File | Change |
---|---|
|
New files added to support the change of kernel_system_inst to pr_region_inst (Platform Designer-less flow that removes the need of Platform Designer Generate for the kernel region). |
|
kernel_system_inst changed to pr_region_inst (Platform Designer-less flow that removes the need of Platform Designer Generate for the kernel region). |
|
Changes required to add the fast compile mode. |
flat.qsf | Removed OUTPUT_IO_TIMING QSF assignments. |
|
Switched to a simplified PR flow. |
BSP Changes from Intel Quartus Prime Design Suite Version 18.0 to Version 18.1
The files in the BSP have the following changes from Intel® Quartus® Prime Design Suite Version 18.0 to Version 18.1:
File | Change |
---|---|
base.qsf | Removed the Logic Lock region assignments. |
board_spec.xml | Changed OPN to production silicon. |
device.tcl | Decreased the core voltage from 950mV to 900mV. |
top_post.sdc | Added a clock uncertainty between 1x and 2x clocks to avoid hold time violations. |
scripts/helpers.tcl | Added a new script. |
scripts/pre_flow_pr.tcl | Added the incremental compile support. |
BSP Changes from Intel Quartus Prime Design Suite Version 18.1 to Version 19.1
In Intel® Quartus® Prime Design Suite Version 19.1, the multicycle constraint on kernel reset was removed. This change prevents IPs from coming out of reset on different cycles. Though generated OpenCL kernels should not have state transitions when coming out of reset, this change guarantees functionality for any IP on the kernel clock domain.
Files located in the INTELFPGAOCLSDKROOT/board/a10_ref/ directory in the BSP have the following changes from Intel® Quartus® Prime Design Suite Version 18.1 to Version 19.1:
File | Change |
---|---|
hardware/a10gx_hostch/* | Directory renamed as a10gx_hostpipe. |
hardware/a10gx_hostpipe/board_spec.xml | Changed the board name to a10gx_hostpipe. |
hardware/a10gx_hostpipe/base.qsf | Cleaned up the floorplan to be similar to a10gx board variant. |
hardware/a10gx_hostpipe/ip/host_channel/acl_hostchannel_top_hw.tcl | Changed the readyLatency value from 2 to 3. |
|
Removed multicycle reset constraint. For more information, refer to Removing the Multicycle Reset Constraint. |
reset_2x.sdc | Added this file to both board variants in a10_ref BSP. |
- Removing the Multicycle Reset Constraint
- Creating Copies of the Kernel Clock
- Instantiating the Reset Synchronizer
- Modifying the Floorplan
- Completing the Changes
Removing the Multicycle Reset Constraint
Follow these steps to modify the timing constraint file top_post.sdc (located in the hardware directory of your BSP) to remove the multicycle reset constraint:
- Remove four lines after the comment # Make the kernel reset multicycle (originally at line 91 in the top_post.sdc file).
- Add the lines following the comment # cut path from reset to synchronizer (now at line 90) and # over-constrain the delay from the synchronizer (at line 108) of the a10_ref/hardware/a10gx/top_post.sdc file in the 19.1 a10_ref BSP to the top_post.sdc file of your BSP.
# cut path from reset to synchronizer set_false_path -from {freeze_wrapper_inst|kernel_system_clock_reset_reset_reset_n} \ -to {freeze_wrapper_inst|acl_reset_handler_inst|*}
# overconstrain the delay from the synchronizer to the global buffer to # help recovery timing set_max_delay 1 -from \ freeze_wrapper_inst|acl_reset_handler_inst|GEN_SYNCHRONIZER.synchronizer_body[1] \ -to freeze_wrapper_inst|acl_reset_handler_inst|GEN_SYNCHRONIZER.synchronizer_body[1] \ ~CLKENA0|outclk
Creating Copies of the Kernel Clock
To improve kernel performance without the multicycle reset constraint, synchronize the reset going into the kernel by using a copy of the kernel clock distributed on a periphery clock network, which has a lower delay. Follow these steps to modify the board.qsys file located in the hardware directory of your BSP to create copies of the kernel clock on separate global and periphery networks:
- Replace hw_iface.iipx and sw_iface.iipx files with the ones from the 19.1 BSP folder.
- Open the board.qsys file in Platform Designer.
- For the acl_kernel_interface IP, change the Make kernel reset a global signal parameter value from 1 to 0.
- Instantiate an A10 Global Clock Enable component and an A10 Periph Clock Enable component (both components are in the ACL Internal Components category).
- Connect the 1x clock of the kernel clock generator to the input of both components.
- Export the output of the A10 Periph Clock Enable as reset_sync_clk.
- Connect all components in the board.qsys file (which are connected directly to the 1x clock of the kernel clock generator) to the clock from the output of the A10 Global Clock Enable.
- Regenerate the system, which updates .ip files in the ip/board/ directory.
Instantiating the Reset Synchronizer
Follow these steps to instantiate the reset synchronizer in the ip/freeze_wrapper.v file located in the hardware directory of your BSP:
- Add a new wire to the top.v file to connect the new output of board_inst to the new input port board_reset_sync_clk of the freeze_wrapper.v file.
- Add the new input port to the ip/freeze_wrapper.v file.
- Add a new signal for the synchronized reset.
- Instantiate the acl_reset_handler as indicated between lines 117 and 130 in the a10_ref/hardware/a10gx/ip/freeze_wrapper.v file in the 19.1 a10_ref BSP.
- Connect the new synchronized reset to the port clock_reset_reset_reset_n of pr_region_inst.
Modifying the Floorplan
You must update the floorplan in the base revision to allow the reset synchronizer to be placed as close as possible to the PLL. If the reference clock input pin is the same as in the 19.1 a10_ref BSP, the same regions and constraints can be used.
Follow these steps to modify the floorplan:
- Create a 1x1 region adjacent to the PLL that can be added to the static region. This can be performed by first creating a 1x1 floorplan region, and then subtracting it from the Partial Reconfiguration (PR) region.
- To improve timing closure of the base compile, provide a location constraint to global clock buffer, periphery clock buffer, and reset synchronizer as close as possible to the PLL, as indicated between lines 54 to 56 of the a10_ref/hardware/a10gx/ip/base.qsf file in the 19.1 a10_ref BSP:
# location constraint these global buffers and synchronization register to # help recovery timing. set_location_assignment CLKCTRL_3A_G_I21 \ -to freeze_wrapper_inst|acl_reset_handler_inst|GEN_SYNCHRONIZER.synchronizer_body[1]~CLKENA0 set_location_assignment FF_X143_Y17_N26 \ -to freeze_wrapper_inst|acl_reset_handler_inst|GEN_SYNCHRONIZER.synchronizer_body[1] set_location_assignment CLKCTRL_3A_P15_I6 \ -to board_inst|kclk_periph|periph_clkena_0|clkena
Completing the Changes
To complete the changes, perform the following steps in the hardware directory of your BSP:
- In the scripts/post_flow_pr.tcl
file, add the following line before the line source
"$sdk_root/ip/board/bsp/adjust_plls.tcl” (as in line 64 of the 19.1
a10_ref
BSP):
set recovery_multicycle 1.0
- In the flat.qsf file, perform the following changes:
- Remove the following line that was originally at line
61:
set_instance_assignment -name GLOBAL_SIGNAL GLOBAL_CLOCK \ -to board_inst|kernel_clk_gen|kernel_clk_gen|kernel_pll|altera_pll_i \ |twentynm_pll|outclk[0]
- Replace two references to freeze_wrapper_inst|kernel_system_clock_reset_reset_reset_n with freeze_wrapper_inst|acl_reset_handler_inst|GEN_SYNCHRONIZER.synchronizer_body[1].
- Remove the following line that was originally at line
61:
- Copy the reset_2x.sdc file from 19.1
a10_ref BSP to the hardware directory of your BSP
and add the following assignment after the assignment that adds
top_post.sdc to your project in flat.qsf.
# SDC file needed for 2x clock reset synchronizaion set_global_assignment -name SDC_FILE reset_2x.sdc
Now, you can perform the base compile to generate a new base.qar.
Developing Your Intel Arria 10 Custom Platform
Developing your Custom Platform requires in-depth knowledge of the contents in the following documents and tools:
- Intel® FPGA SDK for OpenCL™ Custom Platform User Guide
- Contents of the SDK Custom Platform Toolkit
- Stratix V Network Reference Platform Porting Guide
- Documentation for all the Intel® FPGA IP in your Custom Platform
- Intel® FPGA SDK for OpenCL™ Getting Started Guide
- Intel® FPGA SDK for OpenCL™ Programming Guide
In addition, you must independently verify all IP on your computing card (for example, PCIe® controllers and DDR4 external memory).
Initializing Your Intel Arria 10 Custom Platform
- Copy the INTELFPGAOCLSDKROOT/board/a10_ref directory, where INTELFPGAOCLSDKROOT is the location of the SDK installation.
- Paste the a10_ref directory into a directory that you own (that is, not a system directory) and then rename it ( <your_custom_platform> ).
- Choose the a10gx board variant in the <your_custom_platform>/hardware directory to match the production silicon for the Intel® Arria® 10 FPGA as the basis of your design.
- Rename a10gx board variant to match the name of your FPGA board ( <your_custom_platform>/hardware/<board_name> ).
- Modify the <your_custom_platform>/board_env.xml file so that the name and default fields match the changes you made in step 2 and step 4, respectively.
-
Modify the my_board name in
the inside
<your_custom_platform>/hardware/<board_name>/board_spec.xml
file to match the change you made in step 2.
> aoc -list-boards Board list: my_board
- In the SDK, invoke the command aoc -list-boards to confirm that the Intel® FPGA SDK for OpenCL™ Offline Compiler displays the board name in your Custom Platform.
Modifying the Intel Arria 10 GX FPGA Development Kit Reference Platform Design
You can add a component in Platform Designer and connect it to the existing system, or add a Verilog file to the available system. After adding the custom components, connect those components in Platform Designer.
- Instantiate your PCIe controller, as described in Host-to- Intel® Arria® 10 Communication over PCIe section.
-
Instantiate any memory controllers and I/O channels. You can
add the board interface hardware either as Platform Designer
components in the board.qsys
Platform Designer system or as HDL in the top.v file.
The board.qsys file and the top.v file are in the <your_custom_platform>/hardware/<board_name> directory.
- Modify the device.tcl file to match all the correct settings for the device on your board.
-
Modify the <your_custom_platform>/hardware/<board_name>/flat.qsf file
to use only the pin-outs and settings for your system. The base.qsf, top.qsf, and top_synth.qsf
files will include all the settings from the flat.qsf file.
The top.qsf and top_synth.qsf files are in the <your_custom_platform>/hardware/<board_name> directory.
Integrating Your Intel Arria 10 Custom Platform with the Intel FPGA SDK for OpenCL
- Update the <your_custom_platform>/hardware/<board_name>/board_spec.xml file. Ensure that there is at least one global memory interface, and all the global memory interfaces correspond to the exported interfaces from the board.qsys Platform Designer System File.
-
Set the environment variable ACL_DEFAULT_FLOW to flat.
Setting this environment variable instructs the SDK to compile the flat revision corresponding to <your_custom_platform>/hardware/<board_name>/flat.qsf file without the partitions or Logic Locks.Tip: Intel recommends that you get a timing clean flat revision compiled before proceeding to the base revision compiles. You can also invoke the following command with the -bsp-flow=<revision_type> attribute to run different revisions of your project (for example, flat or base compiles).
aoc -bsp-flow=flat boardtest.cl -o=bin/boardtest.aocx
-
Set the environment variable ACL_DEFAULT_FLOW to base.
Setting this environment variable instructs the SDK to compile the base revision corresponding to the <your_custom_platform>/hardware/<board_name>/base.qsf file.
-
Perform the steps outlined in the
INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/README.txt
file to compile the
INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/boardtest.cl
OpenCL kernel source file.
Note: The environment variable INTELFPGAOCLSDKROOT points to the location of the SDK installation.
-
If compilation fails because of timing failures, fix the errors, or
compile
INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest.cl
with different seeds. To compile the kernel with a different seed, include the -seed=<N> option in
the aoc command (for example,
aoc
-seed=2 boardtest.cl).
You might be able to fix minor timing issues by simply compiling your kernel with a different seed.
Setting up the Intel Arria 10 Custom Platform Software Development Environment
-
To compile the MMD layer for Windows, perform the following tasks:
- Install the GNU make utility on your development machine.
- Install a version of Microsoft Visual Studio that has the ability to compile 64-bit software (for example, Microsoft Visual Studio version 2010 Professional).
- Set the development environment so that SDK users can invoke commands and utilities at the command prompt.
- Modify the <your_custom_platform_name>/source/Makefile.common file so that TOP_DEST_DIR points to the top-level directory of your Custom Platform.
- In the Makefile.common file or the development environment, set the JUNGO_LICENSE variable to your Jungo WinDriver license.
- To check that you have set up the software development environment properly, invoke the gmake or gmake clean command.
-
To compile the MMD layer for Linux, perform the following tasks:
- Ensure that you use a Linux distribution that Intel® supports (for example, GNU Compiler Collection (GCC) version 4.47).
- Modify the <your_custom_platform>/source/Makefile.common file so that TOP_DEST_DIR points to the top-level directory of your Custom Platform.
- To check that you have set up the software environment properly, invoke the make or make clean command.
Establishing Intel Arria 10 Custom Platform Host Communication
-
Program your FPGA device with the
<your_custom_platform>/hardware/<board_name>/base.sof file
and then reboot your system.
You should have created the base.sof file when integrating your Custom Platform with the Intel® FPGA SDK for OpenCL™ . Refer to the Integrating Your Intel® Arria® 10 Custom Platform with the Intel® FPGA SDK for OpenCL™ section for more information.
-
Confirm that your operating system recognizes a
PCIe*
device with your vendor and device
IDs.
- For Windows, open the Device Manager and verify that the correct device and IDs appear in the listed information.
- For Linux, invoke the lspci command and verify that the correct device and IDs appear in the listed information.
- Run the aocl install <path_to_customplatform> utility command to install the kernel driver on your machine.
-
For Windows, set the PATH environment variable. For
Linux, set the LD_LIBRARY_PATH environment
variable.
For more information about the settings for PATH and LD_LIBRARY_PATH, refer to Setting the Intel® FPGA SDK for OpenCL™ User Environment Variables in the Intel® FPGA SDK for OpenCL™ Getting Started Guide.
- Modify the version_id_test function in your <your_custom_platform>/source/host/mmd/acl_pcie_device.cpp MMD source code file to exit after reading from the version ID register.
- Run the aocl diagnose utility command and confirm that the version ID register reads back the ID successfully. You may set the environment variables ACL_HAL_DEBUG and ACL_PCIE_DEBUG to a value of 1 to visualize the result of the diagnostic test on your terminal.
Branding Your Intel Arria 10 Custom Platform
- In the software development environment available with the a10_ref Reference Platform, replace all references of "a10_ref" with the name of your Custom Platform.
- Modify the PACKAGE_NAME and MMD_LIB_NAME fields in the <your_custom_platform>/source/Makefile.common file.
- Modify the name, linklib, and mmlibs elements in <your_custom_platform>/board_env.xml file to your custom MMD library name.
-
In your Custom Platform, modify the following lines of code in
the hw_pcie_constants.h file to include
information of your Custom Platform:
#define ACL_BOARD_PKG_NAME "a10_ref" #define ACL_VENDOR_NAME "Intel Corporation" #define ACL_BOARD_NAME "Arria 10 Reference Platform"
For Windows, the hw_pcie_constants.h file is in the <your_custom_platform>\source_windows64\include folder. For Linux, the hw_pcie_constants.h file is in the <your_custom_platform>/linux64/driver directory.
Note: The ACL_BOARD_PKG_NAME variable setting must match the name attribute of the board_env element that you specified in the board_env.xml file. -
Define the Device ID, Subsystem Vendor ID, Subsystem Device
ID, and Revision ID, as defined in the Device
Identification Registers for
Intel®
Arria® 10
PCIe Hard IP
section.
Note: The PCIe* IDs in the hw_pcie_constants.h file must match the parameters in the PCIe® controller hardware.
- Update your Custom Platform's board.qsys Platform Designer system and the hw_pcie_constants.h file with the IDs defined in step 5.
-
For Windows, update the DeviceList fields in the
<your_custom_platform>\windows64\driver\acl_boards_a10_ref.inf
file to match your PCIe ID values and then rename the file to acl_board_<your_custom_platform>.inf.
Note: The <your_custom_platform> string in acl_board_<your_custom_platform>.inf must match the string you specify for the name field in the board_env.xml file.
- Run make in the <your_custom_platform>/source directory to generate the driver.
Changing the Device Part Number
Update the device part number in the following files within the <your_custom_platform>/hardware/<board_name> directory:
-
In the device.tcl file,
change the device part number in the set global
assignment -name DEVICE 10AX115S2F45I1SG QSF assignment.
The updated device number will appear in the base.qsf, top.qsf, and top_synth.qsf files.
- In the board.qsys, mem.qsys, and ddr4.qsys files, change all occurrences of 10AX115S2F45I1SG.
Connecting the Memory in the Intel Arria 10 Custom Platform
- In your Custom Platform, instantiate your external memory IP based on the information in the DDR4 as Global Memory for OpenCL Applications section. Update the information pertaining to the global_mem element in the <your_custom_platform>/hardware/<board_name>/board_spec.xml file.
- Remove the boardtest hardware configuration file that you created during the integration of your Custom Platform with the Intel® FPGA SDK for OpenCL™ .
-
Recompile the
INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/boardtest.cl
kernel source file.
The environment variable INTELFPGAOCLSDKROOT points to the location of the SDK installation.
- Reprogram the FPGA with the new boardtest hardware configuration file and then reboot your machine.
-
Modify the wait_for_uniphy function in the
acl_pcie_device.cpp MMD source code
file to exit after checking the UniPHY status register. Rebuild the MMD
software.
For Windows/Linux, the acl_pcie_device.cpp file is in the <your_custom_platform>\source\host\mmd folder.
-
Run the
aocl
diagnose
SDK utility and confirm that the host
reads back both the version ID and the value 0 from the uniphy_status component.
The utility should return the message Uniphy are calibrated.
-
Consider analyzing your design in the Signal Tap logic analyzer to confirm the successful calibration of
all memory controllers.
Note: For more information on Signal Tap logic analyzer, download the Signal Tap II Logic Analyzer tutorial from the University Program Tutorial page.
Modifying the Kernel PLL Reference Clock
- In the <your_custom_platform>/hardware/<board_name>/board.qsys file, update the REF_CLK_RATE parameter value on the kernel_clk_gen IP module.
- In the <your_custom_platform>/hardware/<board_name>/top.sdc file, update the create_clock assignment for kernel_pll_refclk.
- Optional: In the <your_custom_platform>/hardware/<board_name>/top.v file, update the comment for the kernel_pll_refclk input port.
Integrating an OpenCL Kernel in Your Intel Arria 10 Custom Platform
-
Perform the steps outlined in
INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/README.txt
file to build the hardware configuration file from the
INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/boardtest.cl
kernel source file.
The environment variable INTELFPGAOCLSDKROOT points to the location of the Intel® FPGA SDK for OpenCL™ installation.
- Program your FPGA device with the hardware configuration file you created in step 1 and then reboot your machine.
-
Remove the early-exit modification in the version_id_test function in the acl_pcie_device.cpp file that you implemented when you
established communication between the board and the host interface.
For Windows/Linux, the acl_pcie_device.cpp file is in the <your_custom_platform>\source\host\mmd folder.
-
Invoke the
aocl
diagnose
<device_name>
command,
where <device_name> is the string you
define in your Custom Platform to identify each board.
By default, <device_name> is the acl number (for example, acl0 to acl31) that corresponds to your FPGA device. In this case, invoke the aocl diagnose acl0 command.
-
Build the boardtest host
application using the .sln file (Windows)
or Makefile (Linux) in the SDK's Custom Platform
Toolkit.
For Windows, the .sln file for Windows is in the INTELFPGAOCLSDKROOT\board\custom_platform_toolkit\tests\boardtest\host folder. For Linux, the Makefile is in the INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest directory.
-
Set the environment variable CL_CONTEXT_COMPILER_MODE_INTELFPGA to a value of 3 and run the boardtest host application.
For more information about CL_CONTEXT_COMPILER_MODE_INTELFPGA, refer to Troubleshooting Intel® Arria® 10 GX FPGA Development Kit Reference Platform Porting Issues.
Guaranteeing Timing Closure in the Intel Arria 10 Custom Platform
-
Establish the floorplan of your design.
Important: Consider all design criteria outlined in the FPGA System Design section of the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide.
-
Compile several seeds of the
INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/boardtest.cl
file until you generate a design that closes timing cleanly.
To specify the seed number, include the -seed=<N> option in your aoc command.
- Copy the base.qar file from the INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx directory into your Custom Platform.
-
Use the flat.qsf file in
the a10_ref Reference Platform as references to determine the type of
information you must include in the flat.qsf file for your Custom Platform.
The base.qsf, top.qsf, and top_synth.qsf files automatically inherit all settings in the flat.qsf file. However, if you need to modify Logic Lock Plus region or PR assignments, only make these changes in the base.qsf file.
- Confirm that you can use the .aocx file to reprogram the FPGA by invoking the aocl program acl0 boardtest.aocx command.
- Remove the ACL_DEFAULT_FLOW environment variable that you added when integrating your Custom Platform with the Intel® FPGA SDK for OpenCL™ .
- Ensure that the environment variable CL_CONTEXT_COMPILER_MODE_INTELFPGA is not set.
- Run the boardtest_host executable.
Generating the base.qar Post-Fit Netlist for Your Intel Arria 10 Custom Platform
- Port the system design and the flat.qsf file to your computing card.
-
Compile the
INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/boardtest.cl
kernel source file using the base revision. Fix any timing failures and
recompile the kernel until timing is clean. You can add the -bsp-flow=base argument to the aoc command to generate the base.qar file during the kernel
compilation.
INTELFPGAOCLSDKROOT points to the location of the Intel® FPGA SDK for OpenCL™ installation.
- Copy the generated base.qar file into your Custom Platform.
-
Using the default compilation flow, test the base.qar file across several OpenCL™ design examples and confirm that the following
criteria are satisfied:
- All compilations close timing.
- The OpenCL design examples achieve satisfactory Fmax.
- The OpenCL design examples function on the accelerator board.
Troubleshooting Intel Arria 10 GX FPGA Development Kit Reference Platform Porting Issues
Environment Variable | Description |
---|---|
ACL_HAL_DEBUG | Set this variable to a value of 1 to 5 to enable increasing debug output from the Hardware Abstraction Layer (HAL), which interfaces directly with the MMD layer. |
ACL_PCIE_DEBUG | Set this variable to a value of 1 to 10000 to enable increasing debug output from the MMD. This variable setting is useful for confirming that the version ID register was read correctly and the UniPHY IP cores are calibrated. |
ACL_PCIE_JTAG_CABLE | Set this variable to override the default quartus_pgm argument that specifies the cable number. The default is cable 1. If there are multiple Intel® FPGA Download Cable, you can specify a particular one here. |
ACL_PCIE_JTAG_DEVICE_INDEX | Set this variable to override the default quartus_pgm argument that specifies the FPGA device index. By default, this variable has a value of 1. If the FPGA is not the first device in the JTAG chain, you can customize the value. |
ACL_PCIE_USE_JTAG_PROGRAMMING | Set this variable to force the MMD to reprogram the FPGA using the JTAG cable instead of Partial Reconfiguration. |
ACL_PCIE_DMA_USE_MSI | Set this variable if you want to use MSI for DMA transfers on Windows. |
CL_CONTEXT_COMPILER_MODE_INTELFPGA | Unset this variable or set it to a value of 3. The OpenCL host runtime reprograms the FPGA as needed, which it does at least once during initialization. To prevent the host application from programming the FPGA, set this variable to a value of 3. |
Intel Arria 10 GX FPGA Development Kit Reference Platform Design Architecture
- Host-to- Intel Arria 10 FPGA Communication over PCIe
The Intel® Arria® 10 GX FPGA Development Kit Reference Platform instantiates the Intel® Arria® 10 PCIe* hard IP to implement a host-to-device connection over PCIe* . - DDR4 as Global Memory for OpenCL Applications
The Intel® Arria® 10 GX FPGA Development Kit has one bank of 2GB x72 DDR4-2400 SDRAM. - Host Connection to OpenCL Kernels
The PCIe® host needs to pass commands and arguments to the OpenCL™ kernels via the control register access (CRA) Avalon® slave port that each OpenCL kernel generates. - Intel Arria 10 FPGA System Design
To integrate all components, close timing, and deliver a post-fit netlist that functions in the hardware, you must first address several additional FPGA design complexities. - Dynamic PLL Reconfiguration
The Intel® FPGA SDK for OpenCL™ relies on the post_flow_pr.tcl Tcl script and the instantiation of the acl_kernel_clk_a10 Platform Designer component to modify kernel PLL. - Guaranteed Timing Closure of the Intel Arria 10 GX FPGA Development Kit Reference Platform Design
One of the key features of the Intel® FPGA SDK for OpenCL™ is that it abstracts away hardware details, such as timing closure, for software developers. - Intel Quartus Prime Compilation Flow and Scripts
The import_compile.tcl Tcl Script File in the Intel® Arria® 10 GX FPGA Development Kit Reference Platform controls the Intel® Quartus® Prime compilation flow. - Addition of Timing Constraints
In the Intel® Arria® 10 FPGA Development Kit Reference Platform, the top.sdc file contains all timing constraints applicable before IP instantiation in Platform Designer. The top_post.sdc file contains timing constraints applicable after Platform Designer generation is run. - Connection of the Intel Arria 10 GX FPGA Development Kit Reference Platform to the Intel FPGA SDK for OpenCL
A Custom Platform must include a board_env.xml file to describe its general contents to the Intel® FPGA SDK for OpenCL™ Offline Compiler. - Intel Arria 10 FPGA Programming Flow
There are three ways to program the Intel® Arria® 10 FPGA for the Intel® Arria® 10 GX FPGA Development Kit Reference Platform: Flash, quartus_pgm, and partial reconfiguration (PR). - Host-to-Device MMD Software Implementation
The Intel® Arria® 10 GX FPGA Development Kit Reference Platform's MMD layer is a thin software layer that is essential for communication between the host and the board. - Implementation of Intel FPGA SDK for OpenCL Utilities
The Intel® Arria® 10 GX FPGA Development Kit Reference Platform includes a set of Intel® FPGA SDK for OpenCL™ utilities for managing the FPGA board. - Intel Arria 10 FPGA Development Kit Reference Platform Scripts
The Intel® Arria® 10 FPGA Development Kit Reference Platform includes a number of Tcl scripts in its hardware/<board_name>/scripts directory. - Considerations in Intel Arria 10 GX FPGA Development Kit Reference Platform Implementation
The implementation of the Intel® Arria® 10 GX FPGA Development Kit Reference Platform includes some workarounds that address certain Intel® Quartus® Prime Pro Edition software known issues.
Host-to- Intel Arria 10 FPGA Communication over PCIe
Instantiation of Intel Arria 10 PCIe Hard IP with Direct Memory Access
Dependencies
- Intel® Arria® 10 PCIe* hard IP core
- Parameter Settings section of the Intel® Arria® 10 Avalon® -MM DMA Interface for PCIe Solutions User Guide
Parameter(s) | Setting |
---|---|
System Settings | |
Application interface type |
Avalon®
-MM with DMA
This Avalon® Memory-Mapped ( Avalon® -MM) interface instantiates the embedded DMA of the PCIe® hard IP core. |
Hard IP mode |
Gen3x8, Interface: 256-bit, 250 MHz Number of Lanes: x8 Lane Rate: Gen3 (8.0 Gbps) Note: This setting is the fastest
configuration that the
Avalon®
-MM DMA slave interface currently supports.
|
Rx Buffer credit allocation |
Low
Note: This setting is
derived experimentally.
|
Intel® Arria® 10 Avalon® -MM Settings | |
Export MSI/MSI-X conduit interfaces |
Enabled
Export the MSI interface in order to connect the interrupt sent from the kernel interface to the MSI. |
Instantiate Internal Descriptor Controller |
Enabled
Instantiates the descriptor controller in the Avalon® -MM DMA bridge. Use the 128-entry descriptor controller that the PCIe* hard IP core provides. Disabled for a10gx_hostch board variant. The descriptor controller is implemented in the ip/host_channel subdirectory. |
Address width of accessible PCIe memory space |
64 bits
This value is machine dependent. To avoid truncation of the MSI memory address, 64-bit machines should allot 64 bits to access the PCIe* address space. |
Base Address Register (BAR) Settings | |
Base Address Registers (BARs) | This design uses two BARs. For BAR 0, set Type to 64-bit prefetchable memory. The Size parameter setting is disabled because the Instantiate Internal Descriptor Controller parameter is enabled in the Avalon® -MM system settings. BAR 0 is only used to access the DMA Descriptor Controller, as described in the Intel® Arria® 10 Avalon® -MM DMA for PCI Express section of the Intel® Arria® 10 Avalon® -MM DMA Interface for PCIe* Solutions User Guide. For Bar 4, set Type to 64-bit prefetchable memory, and set Size to 256 KBytes - 18 bits. BAR 4 is used to connect PCIe to the OpenCL kernel systems and other board modules. |
Device Identification Registers for Intel Arria 10 PCIe Hard IP
ID Register Name | ID Provider | Description | Parameter Name in PCIe IP Core |
---|---|---|---|
Vendor ID | PCI-SIG® | Identifies the FPGA manufacturer.
Always set this register to 0x1172, which is the Intel® vendor ID. |
vendor_id_hwtcl |
Device ID | Intel® | Describes the PCIe configuration
on the FPGA according to
Intel®
's internal guideline. Set the device ID to the device code of the FPGA on your accelerator board. For the Intel® Arria® 10 GX FPGA Development Kit Reference Platform, set the Device ID register to 0x2494, which signifies Gen 3 speed, 8 lanes, Intel® Arria® 10 device family, and Avalon® -MM interface, respectively. Refer to Table 11 for more information. |
device_id_hwtcl |
Revision ID | When setting this ID, ensure that
it matches the following revision IDs:
|
— | |
Class Code | Intel® |
The Intel® FPGA SDK for OpenCL™ utility checks the base class value to verify whether the board is an OpenCL™ device. Do not modify the class code settings.
|
— |
Subsystem Vendor ID | Board vendor | Identifies the manufacturer of the
accelerator board. Set this register to the vendor ID of manufacturer of your accelerator board. For the a10_ref Reference Platform, the subsystem vendor ID is 0x1172. If you are a board vendor, set this register to your vendor ID. |
subsystem_vendor_id_hwtcl |
Subsystem Device ID | Board vendor | Identifies the accelerator
board. The SDK uses this ID to identify the board because the software might perform differently on different boards. If you create a Custom Platform that supports multiple boards, use this ID to distinguish between the boards. Alternatively, if you have multiple Custom Platforms, each supporting a single board, you can use this ID to distinguish between the Custom Platforms. Important: Make this ID unique to your
Custom Platform. For example, for the a10_ref Reference
Platform, the ID is 0xA151.
|
subsystem_device_id_hwtcl |
The kernel driver uses the Vendor ID, Subsystem Vendor ID and the Subsystem Device ID to identify the boards it supports. The SDK's programming flow checks the Device ID to ensure that it programs a device with a .aocx Intel® FPGA SDK for OpenCL™ Offline Compiler executable file targeting that specific device.
Location in ID | Definition |
---|---|
15:14 | RESERVED |
13:12 | Speed
|
11 | RESERVED |
10:8 | Number of lanes
|
7:4 | Device family
|
3 | 1 — Soft IP (SIP) This ID indicates that the PCIe protocol stack is implemented in soft logic. If unspecified, the IP is considered a hard IP. |
2:0 |
Platform Designer
PCIe interface
type
|
Instantiation of the version_id Component
The version ID for the a10_ref Reference Platform is A0C7C1E6 in 17.1 release.
Before communicating with any part of the FPGA system, the host first reads from this version_id register to confirm the following:
- The PCIe can access the FPGA fabric successfully
- The address map matches the map in the MMD software
Update the VERSION_ID parameter in the version_id component to a new value with every slave addition or removal from the PCIe BAR 4 bus, or whenever the address map changes.
Definitions of Intel Arria 10 FPGA Development Kit Reference Platform Hardware Constraints in Software Headers Files
The two header files that describe the hardware design to the software are in the following locations:
- For Windows systems, the header files are in the INTELFPGAOCLSDKROOT\board\a10_ref\source\include folder, where INTELFPGAOCLSDKROOT is the path to the SDK installation.
- For Linux systems, the header files are in the INTELFPGAOCLSDKROOT/board/a10_ref/linux64/driver directory.
Header File Name | Description |
---|---|
hw_pcie_constants.h |
Header file that defines most of the hardware constants for the board design. This file includes constants such as the IDs described in PCIe Device Identification Registers, BAR number, and offset for different components in your design. In addition, this header file also defines the name strings of ACL_BOARD_PKG_NAME, ACL_VENDOR_NAME, and ACL_BOARD_NAME. Update the information in this file whenever you change the board design. |
hw_pcie_dma.h |
Header file that defines DMA-related hardware constants.
Update these addresses whenever you change the board design. Refer to the Direct Memory Access section for more information.
|
PCIe Kernel Driver for the Intel Arria 10 GX FPGA Development Kit Reference Platform
This driver is installed using the Intel® FPGA SDK for OpenCL™ install utility.
The a10_ref Reference Platform
- For Windows systems, the driver is in the
<path_to_al0pciedk>\windows64\driver folder.
The kernel driver, the WinDriver application programming interface (API), is a third-party driver from Jungo Connectivity Ltd. For more information about the WinDriver, refer to the Jungo Connectivity Ltd. website or contact a Jungo Connectivity representative.
- For Linux, an open-source MMD-compatible kernel driver is in the <path_to_al0pciedk>/linux64/driver directory. The table below highlights some of the files that are available in this directory.
File | Description |
---|---|
pcie_linux_driver_exports.h | Header file that defines the special commands
that the kernel driver supports. The installed kernel driver works as a character device. The basic operations to the driver are open(), close(), read(), and write(). To execute a complicated command, create a variable as an acl_cmd struct type, specify the command with the proper parameters, and then send the command through a read() or write() operation. This header file defines the interface of the kernel driver, which the MMD layer uses to communicate with the device. |
aclpci.c | File that implements the Linux kernel driver's
basic structures and functions, such as the init, remove, and
probe functions, as well as
hardware design-specific functions that handle interrupts. For more information on the interrupt handler, refer to the Message Signaled Interrupts section. |
aclpci fileio.c | File that implements the kernel driver's file
I/O operations. The kernel driver that is available with the a10_ref Reference Platform supports four file I/O operations: open(), close(), read(), and write(). Implementing these file I/O operations allows the OpenCL* user program to access the kernel driver through the file I/O system calls (that is, open, read, write, or close). |
aclpci cmd.c | File that implements the specific commands
defined in the pcie_linux_driver_exports.h file. These special commands include SAVE_PCI_CONTROL_REGS, LOAD_PCI_CONTROL_REGS, DO_PR, GET_PCI_SLOT_INFO, and so on. |
aclpci dma.c | File that implements DMA and host
channel-related routines in the kernel driver. Refer to the Direct Memory Access section for more information. |
aclpci pr.c | File that implements PR-related routines in the
kernel driver. Refer to the Partial Reconfiguration section for more information. |
aclpci queue.c | File that implements a queue structure for use in the kernel driver to simplify programming. |
Direct Memory Access
Hardware Considerations
The instantiation process exports the DMA controller slave ports (that is, rd_dts_slave and wr_dts_slave) and master ports (that is, rd_dcm_master and wr_dcm_master) into the PCIe module. Two additional master ports, dma_rd_master and dma_wr_master, are exported for DMA read and write operations, respectively. For the DMA interface to function properly, all these ports must be connected correctly in the board.qsys Platform Designer system, where the PCIe hard IP is instantiated.
At the start of DMA transfer, the DMA Descriptor Controller reads from the DMA descriptor table in user memory, and stores the status and the descriptor table into a FIFO address. There are two FIFO addresses: Read Descriptor FIFO address and Write Descriptor FIFO address. After storing the descriptor table into a FIFO address, DMA transfers into the FIFO address can occur. The dma_rd_master port, which moves data from user memory to the device, must connect to the rd_dts_slave and wr_dts_slave ports. Because the dma_rd_master port connects to DDR4 memory also, the locations of the rd_dts_slave and wr_dts_slave ports in the address space must be defined in the hw_pcie_dma.h file.
The rd_dcm_master and wr_dcm_master ports must connect to the txs port. At the end of the DMA transfer, the DMA controller writes the MSI data and the done status into the user memory via the txs slave. The txs slave is part of the PCIe* hard IP in board.qsys.
All modules that use DMA must connect to the dma_rd_master and dma_wr_master ports. For DDR4 memory connection, Intel® recommends implementing an additional pipeline to connect the two 256-bit PCIe DMA ports to the 512-bit memory slave. For more information, refer to the DDR4 Connection to PCIe* Host section.
Software Considerations
The MMD layer uses DMA to transfer data if it receives a data transfer request that satisfies both of the following conditions:
- A transfer size that is greater than 1024 bytes
- The starting addresses for both the host buffer and the device offset are aligned to 64 bytes
Implementing a DMA Transfer
For Windows, the Jungo WinDriver imposes a 5000 to 10000 limit on the number of interrupts received per second in user mode. This limit translates to a 2.5 gigabytes per second (GBps) to 5 GBps DMA bandwidth when a full 128-entry table of 4 KB page is transferred per interrupt.
On Windows, polling is the default method for maximizing PCIe DMA bandwidth at the expense of CPU run time. To use interrupts instead of polling, assign a non-NULL value to the ACL_PCIE_DMA_USE_MSI environment variable.
To implement a DMA transfer:
- Verify that the previous DMA transfer sent all the requested bytes of data.
-
Map the virtual memories that are requested for DMA transfer to physical
addresses.
Note: The amount of virtual memory that can be mapped at a time is system dependent. Large DMA transfers will require multiple mapping or unmapping operations. For a higher bandwidth, map the virtual memory ahead in a separate thread that is in parallel to the transfer.
- Set up the DMA descriptor table on local memory.
- Write the location of the DMA descriptor table, which is in user memory, to the DMA control registers (that is, RC Read Status and Descriptor Base and RC Write Status and Descriptor Base).
- Write the Platform Designer address of descriptor FIFOs to the DMA control registers (that is EP Read Descriptor FIFO Base and EP Write Status and Descriptor FIFO Base).
- Write the start signal to the RD_DMA_LAST_PTR and WR_DMA_LAST_PTR DMA control registers.
- After the current DMA transfer finishes, repeat the procedure to implement the next DMA transfer.
Message Signaled Interrupt
Two different modules generate the signal for the MSI line. The DMA controller in the PCIe hard IP core generates the DMA's MSI. The PCI Express interrupt request (IRQ) module (that is, the INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/ip/irq_controller directory) generates the kernel interface's MSI.
For more information on the PCI Express IRQ module, refer to Handling PCIe Interrupts webpage.
Hardware Considerations
In INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/board.qsys, the DMA MSI is connected internally; however, you must connect the kernel interface interrupt manually. For the kernel interface interrupt, the PCI Express IRQ module is instantiated as pcie_irq_0 in board.qsys. The kernel interface interrupts connections are as follows:
- The kernel_irq_to_host port from the OpenCL Kernel Interface (kernel_interface) connects to the interrupt receiver, which allows the OpenCL kernels to signal the PCI Express IRQ module to send an MSI.
- The PCIe hard IP's msi_intfc port connects to the MSI_Interface port in the PCI Express IRQ module. The kernel interface interrupt receives the MSI address and the data necessary to generate the interrupt via msi_intfc.
- The IRQ_Gen_Master port on the PCI Express IRQ module, which is used to write the MSI, connects to the txs port on the PCIe hard IP.
- The IRQ_Read_Slave and IRQ_Mask_Slave ports connect to the pipe_stage_host_ctrl module on Bar 4. After receiving an MSI, the user driver can read the IRQ_Read_Slave port to check the status of the kernel interface interrupt, and read the IRQ_Mask_Slave port to mask the interrupt.
Software Considerations
The interrupt service routine in the Linux driver checks which module generates the interrupt. For the DMA's MSI, the driver reads the DMA descriptor table's status bit in local memory, as specified in the Read DMA Example section of the Intel® Arria® 10 Avalon-MM DMA Interface for PCIe Solutions User Guide. For kernel interface's MSI, the driver reads the interrupt line sent by the kernel interface.
The interrupt service routine involves the following tasks:
- Check DMA status on the DMA descriptor table.
- Read the kernel status from the IRQ_READ_SLAVE port on the PCI Express IRQ module.
- If a kernel interrupt was triggered, mask the interrupt by writing to the IRQ_MASK_SLAVE port on the PCI Express IRQ module. Then, execute the kernel interrupt service routine.
- If a DMA interrupt was triggered, reset the DMA descriptor table and execute the DMA interrupt service routine.
- If applicable, unmask a masked kernel interrupt.
Partial Reconfiguration
You can only use PR when the static board interface, generated during base compilations, matches the static region of the design that is used to compile the OpenCL kernel's PR region.
For Windows MMD implementation, the INTELFPGAOCLSDKROOT\board\a10_ref\source\host\mmd\acl_pcie_config.cpp file contains the MMD code that communicates with the PR configuration controller within the static region of the design. The program_core_with_PR_file function within the acl_pcie_config.cpp file requires a handle to the PR bitstream and the length of the PR bitstream in order to perform the PR operation.
For Linux driver implementation, the INTELFPGAOCLSDKROOT/board/a10_ref/linux64/driver/aclpci_pr.c file includes the main host driver routine that communicates with the PR configuration controller within the static region of the design. The aclpci_pr function within the acl_pci_pr.c file requires the following information in order to perform the PR operation:
- A handle to the board
- A handle to the PR bitstream
- The length of the PR bitstream
After verifying that the device is opened, the bitstream is of adequate length, and the PCIe endpoint of the device is reachable, the aclpci_pr function writes 0x1 to the PR IP status register. Then, the aclpci_pr function writes the complete bitstream, 32 bits at a time, to the PR IP. After the bitstream transfer is complete, the aclpci_pr function performs a read operation to the PR IP status register to verify whether PR is successful. A return value of 0x14 indicates a successful PR operation; any other return value indicates an error.
To override the default reconfiguration mechanism, set the ACL_PCIE_USE_JTAG_PROGRAMMING environment variable, as shown below:
- For Windows, type set ACL_PCIE_USE_JTAG_PROGRAMMING=1 at the command prompt.
- For Linux, type export ACL_PCIE_USE_JTAG_PROGRAMMING=1 at the command prompt.
Setting ACL_PCIE_USE_JTAG_PROGRAMMING specifies that JTAG full-chip configuration is the default mechanism for reconfiguring the device.
Cable Autodetect
The Intel® Arria® 10 GX FPGA Development Kit Reference Platform automatically tries to detect the cable by default when programming the FPGA via the Intel FPGA Download Cable.
You can set the ACL_PCIE_JTAG_CABLE or ACL_PCIE_JTAG_DEVICE_INDEX environment variables to disable the auto-detect feature and use values that you define.
Cable autodetect is useful when you have multiple devices connected to a single host and PR cannot be used to program the FPGA.
The memory-mapped device (MMD) uses in-system sources and probes to identify the cable connected to the target board. You must instantiate the cade_id register block and connect it to Bar 4 with the correct address map. You must also instantiate board_in_system_sources_probes_cade_id, which is an in-system sources and probe component, and connect it to cade_id register.
The MMD must be updated to take in the relevant changes. Add the scripts/find_jtag_cable.tcl script to your custom platform.
When the FPGA is being programmed via the Intel FPGA Download Cable, the MMD invokes quartus_stp to execute the find_jtag_cable.tcl script. The script identifies the cable and index number which is then used to program the FPGA through the quartus_pgm command.
Host Channel
The streaming interface makes use of DMA.
Host Channel IP Instantiation
IP Parameters | Description |
---|---|
HOST_CHANNEL_DEPTH - 2048 |
Depth of the internal buffer that DMA transfers data to and from. There are two buffers, and both of their depths are set by this parameter. |
HOST_CHANNEL_VALID_BUFFER_USE_LAB - 0 | Set to 1 to use LABs to instantiate the internal buffer, and 0 to use block RAMs. |
Host Channel Top Connection to PCIe DMA
These ports must be connected to the DMA descriptor controller in the host channel IP.
Base address offset must match the address mentioned in the board.qsys file, since these addresses are used by the internal descriptor controller of a10gx board variant.
- On acl_hostchannel_top IP, rd_dma and wr_dma ports are used to receive and send data to DMA. These ports must be connected to the corresponding ports on PCIe* IP with base address offset matching in the hw_host_channel.h header file.
- The cra port must be connected to the host_ctrl.
- The msi_interface port must be connected to Intel® Arria® 10 Hard IP for PCI Express* , and the msi_interface_out must be connected to the msi_interface port of the pcie_irq.
Host Channel Top Connection to OpenCL Kernel
For one FIFO, the in port should be connected to the stout port of the host channel top, while the out port is exported. For the second FIFO, out port should be connected to the stin port of the host channel top, while the in port is exported.
In the board_spec.xml, the host channel ports are IO ports.
DDR4 as Global Memory for OpenCL Applications
In the current version of the a10_ref Reference Platform, all Platform Designer components related to the DDR4 global memory are now part of the INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/acl_ddr4_a10.qsys Platform Designer subsystem within board.qsys. In addition, the location of the clock domain crossings has changed to increase the number of blocks operating in the slower PCIe* domain. With this modified structure, you can add multiple memories with different clock domains to the system.
If you have a Custom Platform that is ported from a previous version of the a10_ref Reference Platform, you have the option to modify your Custom Platform as described above. This modification is not mandatory.
Dependencies
DDR4 external memory interfaces
For more information on the DDR4 external memory interface IP, refer to the DDR2, DDR3, and DDR4 SDRAM Board Design Guidelines section in External Memory Interface Handbook Volume 2: Design Guidelines.
DDR4 IP Instantiation
IP Parameter | Configuration Setting |
---|---|
Timing Parameters | As per the computing card's data specifications. |
Avalon Width Power of 2 | Currently, OpenCL™ does not support non-power-of-2 bus widths. As a result, the a10_ref Reference Platform uses the option that forces the DDR4 controller to power of 2. Use the additional pins of this x72 core for error checking between the memory controller and the physical module. |
Byte Enable Support | Enabled Byte enable support is necessary in the core because the Intel® FPGA SDK for OpenCL™ requires byte-level granularity to all memories. |
Performance | Enabling the reordering of DDR4 memory accesses
and a deeper command queue look-ahead depth might provide increased
bandwidth for some OpenCL kernels. For a target application, adjust
these and other parameters as necessary. Note: Increasing the command queue
look-ahead depth allows the DDR4 memory controller to reorder
more memory accesses to increase efficiency, which improves
overall memory throughput.
|
Debug | Disabled for production. |
DDR4 Connection to PCIe Host
The DDR4 IP core has one bank where its width and address configurations match those of the DDR4 SDRAM. Intel® tunes the other parameters such as burst size, pending reads, and pipelining. These parameters are customizable for an end application or board design.
The Avalon® master interfaces from the OpenCL Memory Bank Divider component connect to their respective memory controllers. The Avalon® slave connects to the PCIe* and DMA IP core. Implementations of appropriate clock crossing and pipelining are based on the design floorplan and the clock domains specific to the computing card. The OpenCL Memory Bank Divider section in the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide specifies the connection details of the snoop and memorg ports.
The INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/board.qsys Platform Designer system uses a custom UniPHY Status to AVS IP component to aggregate different UniPHY status conduits into a single Avalon® slave port named s. This slave port connects to the pipe_stage_host_ctrl component so that the PCIe host can access it.
DDR4 Connection to the OpenCL Kernel
A clock crosser is necessary because the kernel interface for the compiler must be clocked in the kernel clock domain. In addition, the width, address width, and burst size characteristics of the kernel interface must match those specified in the OpenCL Memory Bank Divider connecting to the host. Appropriate pipelining also exists between the clock crosser and the memory controller.
Host Connection to OpenCL Kernels
The OpenCL Kernel Interface component exports an Avalon® master interface (kernel_cra) that connects to this slave port. The OpenCL Kernel Interface component also generates the kernel reset (kernel_reset) that resets all logic in the kernel clock domain.
The Intel® Arria® 10 FPGA Development Kit Reference Platform has one DDR4 memory bank. As a result, the Reference Platform instantiates the OpenCL Kernel Interface component and sets the Number of global memory systems parameter to 1.
Intel Arria 10 FPGA System Design
Examples of design complexities:
- Designing a robust reset sequence
- Establishing a design floorplan
- Managing global routing
- Pipelining
Optimizations of these design complexities occur in tandem with one another to meet timing and board hardware optimization requirements.
Clocks
These clock domains include:
- 250 MHz PCIe* clock
- 300 MHz DDR4 clock
- 50 MHz general clock (config_clk)
- 125 MHz kernel reference clock
- Kernel clock that can have any clock frequency
With the exception of the kernel clock, the a10_ref Reference Platform is responsible for the timing closure of these clocks. However, because the board design must clock cross all interfaces in the kernel clock domain, the board design also has logic in the kernel clock domain. It is crucial that this logic is minimal and achieves an Fmax higher than typical kernel performance.
Resets
These reset drivers include:
- The por_reset_counter in the INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/board.qsys Platform Designer system implements the power-on-reset. The power-on-reset resets all the hardware on the device by issuing a reset for a number of cycles after the FPGA completes configuration.
- The PCIe® bus issues a perst reset that resets all hardware on the device.
- The OpenCL™ Kernel Interface component issues the kernel_reset that resets all logic in the kernel clock domain.
The power-on-reset and the perst reset are combined into a single global_reset; therefore, there are only two reset sources in the system (that is, global_reset and kernel_reset). However, these resets are explicitly synchronized across the various clock domains, resulting in several reset interfaces.
Important Considerations Regarding Resets
- Synchronizing resets to different clock domains might cause several high
fan-out resets.
Platform Designer automatically synchronizes resets to the clock domain of each connected component. In doing so, the Platform Designer instantiates new reset controllers with derived names that might change when the design changes. This name change makes it difficult to make and maintain global clock assignments to some of the resets. As a result, for each clock domain, there are explicit reset controllers. For example, global_reset drives reset_controller_pcie and reset_controller_ddr4. However, they are synchronized to the PCIe and DDR4 clock domains, respectively.
- Resets and clocks must work together to propagate reset to all logic.
Resetting a circuit in a given clock domain involves asserting the reset over a number of clock cycles. However, your design may apply resets to the PLLs that generate the clocks for a given clock domain. This means a clock domain can hold in reset without receiving the clock edge that is necessary for synchronous resets. In addition, a clock holding in reset might prevent the propagation of a reset signal because it is synchronized to and from that clock domain. Avoid such situations by ensuring that your design satisfies the following criteria:
- Generate the global_reset signal off the free-running config_clk.
- The ddr4_calibrate IP resets the External Memory Interface controller separately.
- Apply resets to both reset interfaces of a clock-crossing bridge or FIFO
component.
FIFO content corruption might occur if only part of a clock-crossing bridge or a dual-clock FIFO component is reset. These components typically provide a reset input for each clock domain; therefore, reset both interfaces or none at all. For example, in the a10_ref Reference Platform, kernel_reset resets all the kernel clock-crossing bridges between DDR on both the m0_reset and s0_reset interfaces.
Floorplan
Dependencies
- Partial Reconfiguration
- Chip Planner
- Logic Lock Plus regions
Intel® performed the following tasks iteratively to derive the floorplan of the a10_ref Reference Platform:
- Compile a design without any region or floorplanning constraints.
Intel® recommends that you compile the design with several seeds.
- Examine the placement of the IP cores (for example, PCIe* , DDR4, Avalon® interconnect pipeline stages and adapters) for candidate locations, as determined by the Intel® Quartus® Prime Pro Edition software's Fitter. In particular, Intel® recommends examining the seeds that meet or almost meet the timing constraints.
For the a10_ref Reference Platform, the PCIe* I/O is located in the lower left corner of the Intel® Arria® 10 FPGA. The DDR4 I/O is located on the top part of the left I/O column of the device. Because the placements of the PCIe and DDR4 IP components tend to be close to the locations of their respective I/Os, you can apply Logic Lock Plus regions to constrain the IP components to those candidate regions.

As shown in this Chip Planner view of the floorplan, the two Logic Lock Plus regions are spread out between the PCIe I/O and the top region of the left I/O column (that is, the DDR4 I/O area).
- The largest Logic Lock Plus region (Region 1) covers the PCIe I/O and contains most of the static board interface logic.
- Regions 2 contains an Avalon interconnect pipeline stage that bridges the PCIe I/O and DDR4 I/O regions. The Avalon interconnect pipeline stages also help improve the timing closure rate of the static board interface part of the design.
You must create a dedicated Logic Lock Plus region for the OpenCL™ kernel system. Furthermore, do not place kernel logic in the board's Logic Lock Plus regions (that is, static region). The static region and the OpenCL kernel system region (that is, PR region) do not overlap each other. As shown in Figure 2, the logic for the boardtest.cl OpenCL kernel, that is, the scatter area, can be placed anywhere except within the seven Logic Lock Plus regions.
Intel® recommends the following strategies to maximize the available FPGA resources for the OpenCL kernel system to improve kernel routability:
- The OpenCL kernel system PR region should cover the entire device except the Logic Lock Plus regions of the board.
- The size of a Logic Lock Plus region should be just large enough to contain the board logic and to meet timing constraints of the board clocks. Oversized Logic Lock Plus regions consume FPGA resources unnecessarily.
- Avoid creating tightly-packed Logic Lock Plus regions that cause very high logic utilization
and high routing congestion.
High routing congestion within the Logic Lock Plus regions might decrease the Fitter's ability to route OpenCL kernel signals through the regions.
In the case where the board clocks are not meeting timing and the critical path is between the Logic Lock Plus regions (that is, across region-to-region gap), insert back-to-back pipeline stages on paths that cross the gap. For example, if the critical path is between Region 1 and Region 2, lock down the first pipeline stage (an Avalon-MM Pipeline Bridge component) to Region 1, lock down the second pipeline stage to Region 2, and connect the two pipeline stages directly. This technique ensures that pipeline registers are on both sides of the region-to-region gap, thereby minimizing the delay of paths crossing the gap.
Refer to the Pipelining section for more information.
Global Routing
There is no restriction on the placement location of the OpenCL™ kernel on the device. As a result, the kernel clocks and kernel reset must distribute high fan-out signals globally.
Pipelining
In the Platform Designer, you can implement pipelines via an Avalon® -MM Pipeline Bridge component by setting the following pipelining parameters within the Avalon® -MM Pipeline Bridge dialog box:
- Select Pipeline command signals
- Select Pipeline response signals
- Select both Pipeline command signals and Pipeline response signals
Examples of Pipeline Implementation
- Signals that traverse long distances because of the floorplan's shape or the
region-to-region gaps require additional pipelines.
The DMA at the bottom of the FPGA must connect to the DDR4 memory at the top of the FPGA. To achieve timing closure of the board interface logic at a DDR4 clock speed of 300 MHz, additional pipeline stages between the OpenCL™ Memory Bank Divider component and the DDR4 controller IP are necessary. In the Intel® Arria® 10 GX FPGA Development Kit Reference Platform's board.qsys Platform Designer system, the pipeline stages are named pipe_stage_ddr4a_dimm_*.
The middle pipeline stage, pipe_stage_ddr4a_dimm, combines both the direct kernel DDR4 accesses and the accesses through the OpenCL Memory Bank Divider. The multistage pipeline approach ensures that the kernel entry point to the pipeline is neither geared towards the OpenCL Memory Bank Divider, which is close to the PCIe* IP core, nor the DDR4 IP core, which is at the very top of the FPGA.
DDR4 Calibration
The driver within the a10_ref Reference Platform can detect a failed calibration via the Uniphy Status to AVS IP, and retrigger calibration through the ddr4_calibrate IP block.
Kernel Reprogramming via Partial Reconfiguration
Dependencies
Intel® Quartus® Prime Pro Edition software's Partial Reconfiguration featureTo ensure that the device functions properly during and after PR reprogramming, following these rules:
- Place a freeze wrapper around the PR region. The freeze wrapper
holds the critical control outputs from the PR region in a known, inactive state
during the reprogramming of the logic inside the PR region.
The INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/ip/freeze_wrapper.v file implements the freeze wrapper, where INTELFPGAOCLSDKROOT is the path to the SDK installation.
- Hold the kernel_reset_n signal, which is routed using Global Clock resources, in a logic 1 (deasserted) state during reprogramming of the PR region. When programming completes, assert the kernel_reset_n signal (that is, set it to the low state) before disabling the freeze wrapper. Asserting the kernel_reset_n signal resets all logic in the PR region to a known state. This assertion step is necessary because the state of all flipflops in the PR region is undefined after PR programming. The logic in the freeze_wrapper.v file implements the required behavior for the reset and freeze signals.
Dynamic PLL Reconfiguration
In both PR reprogramming and full-chip JTAG programming, the PLL is dynamically reconfigured by default after FPGA configuration completes. This default dynamic PLL reconfiguration step is unnecessary after full-chip programming because the correct PLL settings are already part of the .sof file programmed onto the FPGA over JTAG.
Guaranteed Timing Closure of the Intel Arria 10 GX FPGA Development Kit Reference Platform Design
Both the SDK and the Custom Platform contribute to the implementation of the SDK's guaranteed timing closure feature.
The SDK provides the IP to generate the kernel clock, and a post-flow script that ensures this clock is configured with a safe operating frequency confirmed by timing analysis. The Custom Platform developer imports a post-fit netlist that has already achieved timing closure on all non-kernel clocks.
Supply the Kernel Clock
The REF_CLK_RATE parameter specifies the frequency of the reference clock that connects to the kernel PLL (pll_refclk). For the a10_ref Reference Platform, the REF_CLK_RATE frequency is 125 MHz.
The KERNEL_TARGET_CLOCK_RATE parameter specifies the frequency that the Intel® Quartus® Prime Pro Edition software attempts to achieve during compilation. The board hardware contains some logic that the kernel clock clocks. At a minimum, the board hardware includes the clock crossing hardware. To prevent this logic from limiting the Fmax achievable by a kernel, the KERNEL_TARGET_CLOCK_RATE must be higher than the frequency that a simple kernel can achieve on your device. For the Intel® Arria® 10 GX FPGA Development Kit that the a10_ref Reference Platform targets, the KERNEL_TARGET_CLOCK_RATE is 400 MHz.
When developing a Custom Platform, setting a high target Fmax might cause difficulty in achieving timing closure.
When developing your Custom Platform and attempting to close timing, add an overriding SDC definition to relax the timing of the kernel. The following code example from the INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/top_post.sdc file applies a 5 ns (200 MHz) maximum delay constraint on the OpenCL kernel during base revision compilations:
if {! [string equal $::TimeQuestInfo(nameofexecutable) "quartus_map"]} { if { [get_current_revision] eq "base" } { post_message -type critical_warning "Compiling with slowed OpenCL Kernel clock. This is to help achieve timing closure for board bringup." if {! [string equal $::TimeQuestInfo(nameofexecutable) "quartus_sta"]} { set kernel_keepers [get_keepers system_inst\|kernel_system\|*] set_max_delay 5 -from $kernel_keepers -to $kernel_keepers } } }
Guarantee Kernel Clock Timing
In the import revision compilation, the compilation script import_compile.tcl invokes the INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/scripts/post_flow.tcl Tcl script in the a10_ref Reference Platform after every Intel® Quartus® Prime Pro Edition software compilation using quartus_cdb.
The post_flow.tcl script also determines the kernel clock and configures it to a functional frequency.
Provide a Timing-Closed Post-Fit Netlist
Dependencies
Intel® Quartus® Prime Pro Edition compiler
Intel® Quartus® Prime software provides several mechanisms for preserving the placement and routing of some previously compiled logic and importing this logic into a new compilation. For Intel® Arria® 10 devices, the previously compiled logic is imported into the compilation flow.
The Intel® Quartus® Prime Pro Edition compilation flow can preserve the placement and routing of the board interface partition via the exported Intel® Quartus® Prime Archive File. The base.qdb file contains all the database files for the base compilation of root_partition. The a10_ref Reference Platform is configured with the project revisions and partitioning that are necessary to implement the compilation flow. By default, the SDK invokes the Intel® Quartus® Prime Pro Edition software on the top revision. This revision is configured to import and restore the base.qdb file, which has been pre-compiled and exported from a base revision compilation.
When developing your Custom Platform from the a10_ref Reference Platform, it is essential to maintain the flat.qsf, base.qsf, top.qsf, and top_synth.qsf Intel® Quartus® Prime Settings Files.
The a10_ref Reference Platform includes two additional partitions: the Top partition and the kernel_system partition. The Top partition contains all logic, and the kernel_system partition contains the logic in the PR region. The PR region is specified by the following assignments:
set_instance_assignment -name PARTIAL_RECONFIGURATION_PARTITION ON -to freeze_wrapper_inst|kernel_system_inst
Intel Quartus Prime Compilation Flow and Scripts
Invoke the Intel® Quartus® Prime compilation flow by calling the following quartus_sh executables:
- The board developer runs the quartus_sh --flow compile top -c base command to execute the base revision compilation. This compilation closes timing, locks down the static region, and generates the base.qdb file.
- The user of the Intel® Arria® 10 FPGA Development Kit Reference Platform or a Custom Platform runs the quartus_sh -t import_compile.tcl command to execute the import revision compilation. This compilation generates programming files that are guaranteed to be timing closed and PR-compatible with each other.
Enabling the Intel Quartus Prime Forward-Compatibility Flow
Enabling the forward-compatibility flow allows you to use board vendor-generated precompiled post-fit netlists, in the form of the base.qdb file, in a future Intel® Quartus® Prime Pro Edition software version. The forward-compatibility flow eliminates the need to match the Intel® Quartus® Prime Pro Edition software version used to develop the Custom Platform and the version used to run the Custom Platform.
If you are migrating a previous version of the Intel® Arria® 10 GX FPGA Development Kit Reference Platform to the current version and you want to incorporate the forward-compatibility flow, perform the following tasks:
-
Add the following command in the
INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/scripts/post_flow_pr.tcl
script to generate a forward-compatible base.qdb file:
quartus_cdb top -c base --export_design --snapshot final --file base.qdb
For information on the function of the post_flow_pr.tcl script, refer to Quartus Prime Compilation Flow for Board Developers.
-
In the
INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/import_compile.tcl
script, add the quartus_fit and then the
quartus_asm commands after importing
the base.qdb file.
Running these commands verifies that the imported base.qdb file is usable in the Intel® Quartus® Prime Pro Edition software version that Custom Platform users work with.
For more information on the function of the import_compile.tcl script, refer to the Intel® Quartus® Prime Compilation Flow for Custom Platform Users.
Intel Quartus Prime Compilation Flow for Board Developers
The script performs the necessary tasks to ensure that the import revision compilations using the timing-closed and locked-down static region are PR-compatible with each other.
Running the quartus_sh --flow compile top -c base command executes the following tasks:
- Runs quartus_syn to execute the Analysis and Synthesis stage of the Intel® Quartus® Prime compilation flow.
- Runs quartus_fit to execute the Place and Route stage of the Intel® Quartus® Prime compilation flow.
- Runs quartus_sta to execute the Static Timing Analysis stage of the Intel® Quartus® Prime compilation flow.
- Runs the
INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/scripts/post_flow_pr.tcl
file.
The post_flow_pr.tcl script determines the maximum frequency at which the OpenCL™ kernel can run and generates the corresponding PLL settings. The script then reruns static timing analysis. The script also exports the compilation database of the base revision compilation results as a forward-compatible Partition Database File (.qdb). Refer to the QDB File Generation section for more information.
- Runs quartus_asm to generate the .sof file with updated embedded PLL settings. Updating the .sof file allows it to run safely on the board with the maximum kernel frequency.
- Generates the fpga.bin file, which contains the full-chip programming file. The full-chip programming file (base.sof) is in the .acl.sof section of the fpga.bin file.
The .aocx file that the base revision compilation flow generates only contains the .sof full-chip programming file. It does not contain a programming file that can be used with PR because this .aocx file is only intended to be written to Flash memory as the default FPGA image. The Intel® FPGA SDK for OpenCL™ program utility automatically uses JTAG programming when it programs with a .aocx file from the base revision compilation. Only the import revision compilation flow, executed by the SDK user, generates a .aocx file that can be used with PR.
Intel Quartus Prime Compilation Flow for Custom Platform Users
The import_compile.tcl script executes the following tasks:
- Runs the
INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/scripts/pre_flow_pr.tcl
file. The pre_flow_pr.tcl script generates the
board.qsys and the kernel_system.qsys
Platform Designer System Files.
Refer to the Platform Designer System Generation section for more information.
- Imports the base revision compilation results as a .qdb file.
Refer to the QDB File Generation section for more information.
- Runs quartus_fit and quartus_asm to verify that the .qdb file is forward compatible.
- Runs quartus_syn to execute the Analysis and Synthesis stage of the Intel® Quartus® Prime compilation flow for the kernel partition only.
- Runs quartus_fit to execute the Place and Route stage of the Intel® Quartus® Prime compilation flow for the entire design.
- Runs quartus_sta to execute the static timing analysis stage of the Intel® Quartus® Prime compilation flow.
- Runs the INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/scripts/post_flow_pr.tcl file. The post_flow_pr.tcl script determines the maximum frequency at which the OpenCL™ kernel can run and generates the corresponding PLL settings. The script then reruns the static timing analysis.
- Runs quartus_asm to generate the full-chip programming files for the base revision.
- Runs quartus_asm to generate the full-chip programming files for the import revision.
- Generates the fpga.bin file, which
contains the following files and IDs:
- The top.sof full-chip programming file.
- The top.rbf PR programming file.
- The pr_base.id unique ID for PR base revision.
Before quartus_asm generates the .sof file in an import revision compilation, the static region of the import revision compilation is compared to the static region of the base revision compilation to check for errors. To prevent a mismatch error in the I/O configuration shift register (IOCSR) bits, the PLL settings in the base.sof and top.sof files must be identical. When designing the Intel® Arria® 10 FPGA Development Kit Reference Platform, Intel® ensured in the import_compile.tcl Tcl script that the PLL settings in both the base.sof file and the top.sof file are identical, resulting in an additional quartus_asm execution step to regenerate the base.sof file.
Platform Designer System Generation
The INTELFPGAOCLSDKROOT environment variable points to the location of the Intel® FPGA SDK for OpenCL™ installation directory.
The board.qsys Platform Designer system represents the bulk of the static region. The kernel_system.qsys Platform Designer system is the top-level of the PR region. The pre_flow_pr.tcl script generates both Platform Designer systems on the fly before the beginning of the Intel® Quartus® Prime compilation flow in both the flat and import revision compilations.
QDB File Generation
The INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/scripts/post_flow_pr.tcl script creates the base.qdb file. The .tcl file invokes the export_design command to export the entire base revision compilation database to the base.qar file that also contains the base.sdc and pr_base.id files. For your Custom Platform, you do not need to add the base.sdc and pr_base.id files to the board directory (that is, INTELFPGAOCLSDKROOT/board/<custom_platform>/hardware/<board_name> ) separately.
Hash Checking
The unique ID is generated at the beginning of a base revision compilation using the MD5 message-digest algorithm. The MD5 algorithm generates a hash of a text file that contains the current working directory and a high-resolution timer value. The MD5 algorithm then truncates the hash to a 32-bit value. The INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx/scripts/pre_flow_pr.tcl script stores this 32-bit value in the pr_base_id register IP within the board.qsys Platform Designer system by overwriting the default value of 0xdeadbeef.
The unique ID for the base revision compilation is added to the pr_base.id file. The ID becomes part of the import revision compilation directory after the pr_base.id file is copied from the INTELFPGAOCLSDKROOT/board/a10_ref/hardware/a10gx directory. During the fpga.bin generation step of the import revision compilation, the unique ID is added as the .acl.hash section of the fpga.bin file.
When the Intel® FPGA SDK for OpenCL™ user invokes the aocl program utility to reconfigure the FPGA, the software first checks that the pr_base id value in the currently programmed static region matches the hash value in the fpga.bin section within the .aocx file. If the two 32-bit values match, it is safe to execute partial reconfiguration. If the 32-bit values do not match, the aocl program utility performs full-chip JTAG programming via Intel® FPGA Download Cable.
Addition of Timing Constraints
The order of the application of time constraints is based on the order of appearance of the top.sdc and top_post.sdc files in the top.qsf file.
One noteworthy constraint in the a10_ref Reference Platform is the multicycle constraint for the kernel reset in the top_post.sdc file. Using global routing saves routing resources and provides more balanced skew. However, the delay across the global route might cause recovery timing issues that limit kernel clock speed. Therefore, it is necessary to include a multicycle path on the global reset signal.
Connection of the Intel Arria 10 GX FPGA Development Kit Reference Platform to the Intel FPGA SDK for OpenCL
For each hardware design, your Custom Platform also requires a board_spec.xml file for each hardware design that describes the hardware.
The following sections describe the implementation of these files for the Intel® Arria® 10 GX FPGA Development Kit Reference Platform.
Describe the Intel Arria 10 GX FPGA Development Kit Reference Platform to the Intel FPGA SDK for OpenCL
Details of each field in the board_env.xml file are available in the Creating the board_env.xml File section of the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide.
In the a10_ref Reference Platform, Intel® uses the bin folder for Windows dynamic link libraries (DLLs), the lib directory for delivering libraries, and the libexec directory for delivering the SDK utility executables. This directory structure allows the PATH environment variable to point to the location of the DLLs (that is, bin) in isolation of the SDK utility executables.
Describe the Intel Arria 10 GX FPGA Development Kit Reference Platform Hardware to the Intel FPGA SDK for OpenCL
Device
The device section contains the name of the device model file available in the INTELFPGAOCLSDKROOT/share/models/dm directory of the SDK and in the board spec.xml file. The used_resources element accounts for all logic outside of the kernel partition. The value of used_resources for alms equals the difference between the total number of adaptive logic modules (ALMs) used in final placement and the total number of ALMs available to the kernel partition. You can derive this value from the Partition Statistic section of the Fitter report after a compilation. Consider the following ALM categories within an example Fitter report:
+----------------------------------------------------------------------------------+ ; Fitter Partition Statistics ; +----------------------+-----------------+-----------------------------------------+ ; Statistic ; l ; freeze_wrapper_inst|kernel_system_inst ; +----------------------+-----------------+-----------------------------------------+ ; ALMs needed [=A-B+C] ; 0 / 427200 (0%) ; 0 / 385220 (0%) ;
The value of used_resources equals the total number of ALMs in l minus the total number of ALMs in freeze wrapper inst|kernel_system_inst. In the example above, used_resources = 427200 - 385220 = 41980 ALMs.
You can derive used_resources for rams and dsps in the same way using M20Ks and DSP blocks, respectively. The used_resources value for ffs is four times the used_resources value for alms because there are two primary and two secondary logic registers per ALM.
Global Memory
In the board_spec.xml file, there is one global_mem section for DDR memory. Assign the string DDR to the name attribute of the global_mem element. The board instance in Platform Designer provides all of these interfaces. Therefore, the string board is specified in the name attribute of all the interface elements within global_mem.
- DDR
Because DDR memory serves as the default memory for the board that the a10_ref Reference Platform targets, its address attribute begins at zero. Its config_addr is 0x018 to match the memorg conduit used to connect to the corresponding OpenCL Memory Bank Divider for DDR.
Attention: The width and burst sizes must match the parameters in the OpenCL Memory Bank Divider for DDR (memory_bank_divider).
Interfaces
The interfaces section describes kernel clocks, reset, CRA, and snoop interfaces. The OpenCL Memory Bank Divider for the default memory (in this case, memory_bank_divider) exports the snoop interface described in the interfaces section. The width of the snoop interface should match the width of the corresponding streaming interface.
Intel Arria 10 FPGA Programming Flow
In the order from the longest to the shortest configuration time, the three FPGA programming methods are as follows:
- To replace both the FPGA periphery and the core while maintaining the programmed state after power cycling, use Flash programming.
- To replace both the FPGA periphery and the core, use the Intel® Quartus® Prime Programmer command-line executable (quartus_pgm) to program the device via cables such as the Intel® FPGA Download Cable (formerly USB-Blaster).
- To replace only the kernel portion of the device, use PR.
The default FPGA programming flow is to use PR over PCIe* . The Partial Reconfiguration Controller IP instantiates PR over PCIe* using the following IP parameter settings:
Parameter | Setting |
---|---|
Settings | |
Use as PR Internal Host | Enabled |
Enable Avalon-MM slave interface | Enabled |
Input data width | 32 bits |
Clock-to-Data ratio | 1 |
Divide error detection frequency by | 1 |
Advanced Settings | |
Auto-instantiate PR block | Enabled |
Auto-instantiate CRC block | Enabled |
The 50 MHz config_clk clocks the Partial Reconfiguration Controller IP. The Avalon® -MM interface connects to the host control bus on PCIe* BAR4. Using PCIe* Gen3x8 under these configuration settings, the duration of partial reconfiguration of the PR region is about 1.6 seconds.
You cannot use PR if there is a mismatch between the hash within the .aocx file and the hash in the static region of the current image on the FPGA. In this case, program the FPGA via Intel® FPGA Download Cable by invoking quartus_pgm instead. If the .aocx file is not PR compatible with the current image on the FPGA, the Intel® Quartus® Prime Programmer displays the following message:
aocl program acl0 boardtest.aocx aocl program: Running program from <path_to_a10_ref>/linux64/libexec Reprogramming device with handle 1 MMD INFO : [acla10_ref0] PR base and import compile IDs do not match MMD INFO : [acla10_ref0] PR base ID currently configured is 0x7d056bf2 MMD INFO : [acla10_ref0] PR import compile expects ID to be 0x30242eb9 mmd program_device: Board reprogram failed
Only use quartus_pgm via Intel® FPGA Download Cable if you use a cable to connect the board and the host computer. Cabling is a point of potential failure, and it does not scale well to large deployments. If possible, reserve the quartus_pgm programming approach for development and testing purposes only.
If PR fails, an attempt is automatically made to detect the Intel® FPGA Download Cable and do a full JTAG programming.
Define the Contents of the fpga.bin File for the Intel Arria 10 GX FPGA Development Kit Reference Platform
Field | Description |
---|---|
.acl.sof | The full programming bitstream for the compiled design. This section appears in the fpga.bin files generated from both the base revision and the import revision compilations. |
.acl.core.rbf | The PR programming bitstream for the kernel region. This section only appears in the fpga.bin file generated from import revision compilation. |
.acl.hash | The unique ID for the base revision compilation. This section only appears in the fpga.bin file generated from import revision compilation. |
Host-to-Device MMD Software Implementation
The source codes of an MMD library that demonstrates good performance are available in the INTELFPGAOCLSDKROOT/board/a10_ref/source/host/mmd directory. Refer to the Host-to-Device MMD Software Implementation section in the Stratix V Network Reference Platform Porting Guide for more information.
For more information on the MMD API functions, refer to the MMD API Descriptions section of the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide.
Implementation of Intel FPGA SDK for OpenCL Utilities
aocl install
Windows
The install.bat script is in the <your_custom_platform>\windows64\libexec directory, where <your_custom_platform> points to the top-level directory of your Custom Platform. This install.bat script triggers the install executable from Jungo Connectivity Ltd. to install the WinDriver on the host machine.
Linux
The install script is located in the <your_custom_platform>/linux64/libexec directory. This install script first compiles the kernel module in a temporary location and then performs the necessary setup to enable automatic driver loading after reboot.
aocl uninstall
Windows
The uninstall.bat script is located in the <your_custom_platform>\windows64\libexec directory, where <your_custom_platform> points to the top-level directory of your Custom Platform. This uninstall.bat script triggers the uninstall executable from Jungo Connectivity Ltd. to uninstall the WinDriver on the host machine.
Linux
The uninstall script is located in the <your_custom_platform>/linux64/libexec directory. This uninstall script removes the driver module from the Linux kernel.
aocl program
aocl flash
aocl diagnose
Without an argument, the utility returns the overall information of all the devices installed in a host machine. If a specific device name is provided as an argument (that is, aocl diagnose <device_name> ), the diagnose utility runs a memory transfer test and then reports the host-device transfer performance.
You can run the diagnose utility for multiple devices (that is, aocl diagnose <device_name1> <device_name2> <device_name3> ). If you want to run the diagnose utility for all devices, use the all option (that is aocl diagnose all).
Possible Errors After Running the diagnose Utility
Memory Module Not Plugged-in or a Loose Connection on the Board
aocl diagnose: Running diagnose from aocl diagnose: failed 32 times. First error below: Vendor: Intel Corporation MMD INFO: [acla10_ref0] uniphy(s) did not calibrate. Expected 0 but read 2 MMD INFO: If there are more failures than Uniphy controllers connected, MMD INFO: ensure the uniphy_status core is correctly parameterized.
Solution
Confirm that you have connected the memory board, power cable, and USB cable correctly as shown in Configuring and installing the Intel Arria 10 GX FPGA Development Kit board. If you have confirmed your connections and continue to get this error, the memory board might not be seated correctly in the HiLo connector.
Error While Loading Shared Libraries
When you execute diagnose all utility, you might see errors as shown in the following example:
$ aocl diagnose all hld/board/a10_ref/linux64/libexec/diagnose: error while loading shared libraries: libaltera_a10_ref_mmd.so: cannot open shared object file: No such file or directory
Solution
Ensure that you have set the PATH environment variable correctly.
export PATH=$PATH:$INTELFPGAOCLSDKROOT/bin:$QUARTUS_DIR/bin
aocl list-devices
The list-devices utility is similar to the diagnose utility. It first verifies the installation of the kernel driver and then lists all the devices.
Intel Arria 10 FPGA Development Kit Reference Platform Scripts
Script | Description |
---|---|
base_write_sdc.tcl | The post_flow_pr.tcl script runs this script during the base revision compilation. The base_write_sdc.tcl script then exports all the SDC constraints to the base.sdc file, which is part of the board directory. |
create_fpga_bin_pr.tcl | Creates the ELF binary file, fpga.bin, from the .sof file, the .rbf file, and the pr_base.id file. |
post_flow_pr.tcl | This script runs after every Intel® Quartus® Prime Pro Edition software compilation. It facilitates the guaranteed timing flow by setting the kernel clock PLL, generating a small report in the acl_quartus_report.txt file, and rerunning STA with the modified kernel clock settings. |
pre_flow_pr.tcl | This script generates the RTL of the top-level board.qsys Platform Designer system for the static region and the kernel_system.qsys Platform Designer system for the kernel PR region. |
qar_ip_files.tcl | Tcl script that packages up all IP files in the base revision compile to create base.qar. |
create_acds_ver_hex.tcl | Tcl script that creates quartus_version.id and acds_version_rom.mif files. PR can
be done only if the programmed AOCX was compiled from the same
Quartus version as AOCX being programmed. quartus_version.id is stored in AOCX so that at runtime, BSP can determine Quartus version of AOCX being programmed. acds_version_rom.mif is used during compilation, to update the contents of on-chip memory in the SOF file. |
regenerate_cache.tcl | Tcl script that regenerates the BAK cache file in your temporary directory. |
Considerations in Intel Arria 10 GX FPGA Development Kit Reference Platform Implementation
- The quartus_syn executable reads the SDC files. However, it does not support the Tcl command get_current_revision. Therefore, in the top_post.sdc file, a check is in place to determine whether quartus_syn has read the file before checking the current version.
In addition to these workarounds, take into account the following considerations:
- Intel® Quartus® Prime compilation is only ever performed after the Intel® FPGA SDK for OpenCL™ Offline Compiler embeds an OpenCL kernel inside the system.
- Perform Intel® Quartus® Prime compilation after you install the Intel® FPGA SDK for OpenCL™ and set the INTELFPGAOCLSDKROOT environment variable to point to the SDK installation.
- The name of the directory where the Intel® Quartus® Prime project resides must match the name field in the board_spec.xml file within the Custom Platform. The name is case sensitive.
- The PATH or LD_LIBRARY_PATH environment variable must point to the MMD library in the Custom Platform.
Intel FPGA SDK for OpenCL Intel Arria 10 GX FPGA Development Kit Reference Platform Porting Guide Archives
IP Core Version | User Guide |
---|---|
18.1 | Intel FPGA SDK for OpenCL Intel Arria 10 GX FPGA Development Kit Reference Platform Porting Guide |
17.1 | Intel FPGA SDK for OpenCL Intel Arria 10 GX FPGA Development Kit Reference Platform Porting Guide |
17.0 | Intel FPGA SDK for OpenCL Intel Arria 10 GX FPGA Development Kit Reference Platform Porting Guide |
16.1 | Intel FPGA SDK for OpenCL Intel Arria 10 GX FPGA Development Kit Reference Platform Porting Guide |
Document Revision History for the Intel Arria 10 GX FPGA Development Kit Reference Platform Porting Guide
Document Version | Intel® Quartus® Prime Version | Changes |
---|---|---|
2019.04.18 | 19.1 |
|
2019.02.11 | 18.1 |
|
Date | Version | Changes |
---|---|---|
November 2017 | 2017.11.03 |
|
May 2017 | 2017.05.08 | Replaced references to ACL_QSH_COMPILE_CMD with ACL_DEFAULT_FLOW. |
October 2016 | 2016.10.31 |
|
July 2016 | 2016.07.29 |
|
May 2016 | 2016.05.09 |
|
December 2015 | 2015.12.21 | Initial release. |