Analyzing CPU and FPGA (Intel® Arria® 10 GX) Interaction
This recipe shows how to configure your platform to analyze the interaction between your CPU and FPGA, using the Intel® Arria® 10 GX FPGA as an example.
Ingredients
This section lists the hardware and software tools used for the performance analysis scenario.
- Application: Matrix Multiplication OpenCL™ application. The Matrix Multiplication sample application is available for download from the Intel® FPGA SDK for OpenCL™ website.
- Tools: Intel® FPGA SDK for OpenCL™, Intel® VTune™ Amplifier 2019 or higher
  - Starting with the 2020 release, Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler.
  - Most recipes in the Intel® VTune™ Profiler Performance Analysis Cookbook are flexible. You can apply them to different versions of Intel® VTune™ Profiler. In some cases, minor adjustments may be required.
  - Get the latest version of Intel® VTune™ Profiler:
    - From the Intel® VTune™ Profiler product page.
    - Download the latest standalone package from the Intel® oneAPI standalone components page.
- Operating System: CentOS* 7, Red Hat* Enterprise Linux 7 or higher
- CPU: Intel® server platform code named Skylake
- FPGA: Intel® Arria® 10 GX
Configure the Intel® Arria® 10 GX FPGA and Intel® FPGA SDK for OpenCL™
- On your Intel Arria 10 GX FPGA, set up the DIP switches and connect the power and USB cables. See detailed instructions.
- Download Intel® FPGA SDK for OpenCL™ (includes CodeBuilder, Quartus Prime software, and devices) from http://fpgasoftware.intel.com/opencl/.
- Run the setup_pro.sh file to install the SDK.
- Run source init_opencl.sh to set the appropriate environment variables.
- Run aocl version to verify the installation. The output should look similar to the following:
    aocl 17.1.0.240 (Intel(R) FPGA SDK for OpenCL(TM), Version 17.1.0 Build 240, Copyright (C) 2017 Intel Corporation)
- Run aocl install to install the FPGA board.
- Run aocl diagnose to verify the hardware installation. The output should look similar to the following:
    Device Name: acl0
    Package Pat: /home/tce/intelFPGA_pro/17.1/hld/board/a10_ref
    Vendor: Intel(R) Corporation
    Phys Dev Name  Status  Information
    acla10_ref0    Passed  Arria 10 Reference Platform (acla10_ref0)
                           PCIe dev_id = 2494, bus:slot.func = 44:00.00, Gen3 x4
                           FPGA temperature = 44.3555 degrees C.
    DIAGNOSTIC_PASSED
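The hardware check in the last step can be scripted. The following is a minimal sketch, assuming that aocl is on the PATH after sourcing init_opencl.sh and that a successful diagnosis prints the DIAGNOSTIC_PASSED marker shown in the sample output. The function name diagnose_passed is hypothetical and is not part of the SDK.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the SDK): reads diagnostic output on stdin
# and succeeds only if it contains the DIAGNOSTIC_PASSED marker.
diagnose_passed() {
    grep -q 'DIAGNOSTIC_PASSED'
}

# Usage (requires the SDK environment to be set up):
#   aocl diagnose | diagnose_passed && echo "FPGA board is ready"
```

Checking for the marker rather than the exit status is a deliberate simplification here, since the recipe only documents the printed output.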
Build the Sample Application and Flash to the FPGA
- Run make with the default makefile to build the host executable. The executable output filename is host.
- Build the binary for the FPGA using the following command:
    aoc -v -board=a10gx device/matrix_mult.cl -o bin/matrix_mult.aocx
- Set up the USB driver to flash.
  - Run the following command:
      sudo vim /etc/udev/rules.d/51-usbblaster.rules
  - Add the following lines:
      # usb blaster
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6001", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6002", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6003", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6010", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6810", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
- Lower the JTAG clock speed to 6 MHz using the following command:
    jtagconfig --setparam 1 JtagClock 6M
- Flash the binary to the FPGA using the following command:
    aocl flash acl0 ./bin/matrix_mult.aocx
- Reboot the host system with the FPGA.
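The udev rules in the USB driver step differ only in the product ID, so the file can be generated instead of typed by hand. The following is a minimal sketch that prints the same comment line and five rules as the recipe. The function name usb_blaster_rules is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "# usb blaster" comment followed by one udev
# rule for each USB-Blaster product ID listed in the recipe.
usb_blaster_rules() {
    local id
    echo '# usb blaster'
    for id in 6001 6002 6003 6010 6810; do
        # Single quotes keep $env{...} literal; %%c prints as %c; %s is the ID.
        printf 'SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="%s", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %%c"\n' "$id"
    done
}

# Usage: write the rules file without opening an editor:
#   usb_blaster_rules | sudo tee /etc/udev/rules.d/51-usbblaster.rules
```

Each rule stays on a single line because udev treats every line of a rules file as one rule.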
Run CPU/FPGA Interaction Analysis
- Launch the VTune Amplifier. For example: /opt/intel/vtune_amplifier_2019/bin64/amplxe-gui
- Create a project for your analysis, for example: hello_world_opencl.
- Click Configure Analysis to start a new analysis.
- Set up the CPU/FPGA Interaction analysis.
  - In the WHERE pane, select Local Host.
  - In the WHAT pane, select Launch Application and browse to the host executable of the sample application. Typically, the executable can be found under <sample app>/bin/host.
  - In the HOW pane, select CPU/FPGA Interaction from the available analysis types.
- Click Start to begin the analysis.
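The same collection can likely be started from the command line instead of the GUI. The following is a sketch under stated assumptions, none of which come from this recipe: that the analysis type is named fpga-interaction, and that the command-line driver is amplxe-cl for VTune Amplifier 2019 or vtune for VTune Profiler 2020 and later. Confirm both against the command-line help of your installed version before relying on them. The helper fpga_collect_cmd and the result directory name r000fi are hypothetical. The helper only prints the command so that you can review it first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a command line for a CPU/FPGA
# Interaction collection.
# Assumptions, not taken from the recipe: the analysis type name
# "fpga-interaction" and the driver names amplxe-cl / vtune.
# Arguments: <driver> <result directory> <application path>
fpga_collect_cmd() {
    local driver="$1" result_dir="$2" app="$3"
    printf '%s -collect fpga-interaction -result-dir %s -- %s\n' \
        "$driver" "$result_dir" "$app"
}

# Usage: print the command, review it, then run it yourself:
#   fpga_collect_cmd amplxe-cl r000fi ./bin/host
```

The printed command is not quoted, so this sketch suits paths without spaces only.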
Interpret Results
After data collection completes, the results are finalized and shown in the CPU/FPGA Interaction viewpoint. Start with the Summary tab to view the top compute tasks for the FPGA as well as the top tasks and hotspots for the CPU.

Switch to the Bottom-up tab to review the work size of a compute task and data transfer throughput. Use the timeline pane to review the FPGA utilization for compute and transfer.

Use the Platform tab to check the computing queue for the FPGA and host application. You can also find the start time and duration of each transfer and synchronization.
