Intel® High Level Synthesis Accelerator Functional Unit Design Example User Guide

ID 683025
Date 7/19/2019


2.7. Loading AF Bitstream and Running the Host Application

To run the bitstream, ensure that your host system contains an Intel® FPGA PAC and that the Intel Acceleration Stack (including OPAE) is installed and configured. For details, see the Intel Acceleration Stack Quick Start Guide for Intel® PAC with Intel® Arria® 10 GX FPGA.
  1. Start a terminal session and navigate to the root of the project (the hls_afu directory).
  2. Configure your system to reserve 20 hugepages of 2 MB each:
    $ sudo sh -c "echo 20 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"
  3. Load the AF into the FPGA:
    $ sudo fpgaconf hls_afu.gbs
  4. Navigate to the hls_afu/sw directory.
  5. Build and run the host application. Do not specify USE_ASE=1, because this run targets the FPGA hardware rather than the simulator:
    $ make
    $ sudo ./hls_afu_host
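The kernel can reserve fewer hugepages than requested when free memory is fragmented, so it can be worth reading the counter back after step 2. The following is a minimal sketch, not part of the design example: hugepages_ok is a hypothetical helper name, the path is the one written in step 2, and the threshold of 20 simply mirrors that step.

```shell
#!/bin/sh
# Hypothetical helper (not part of the design example):
# succeed if the count stored in file $1 is at least $2.
hugepages_ok() {
    count_file=$1
    required=$2
    # Fail if the counter file does not exist or cannot be read.
    [ -r "$count_file" ] || return 1
    reserved=$(cat "$count_file")
    # A non-numeric value makes the comparison fail as well.
    [ "$reserved" -ge "$required" ] 2>/dev/null
}

# Same sysfs file that step 2 writes to.
HUGEPAGES_FILE=/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

if hugepages_ok "$HUGEPAGES_FILE" 20; then
    echo "hugepages: OK"
else
    echo "hugepages: fewer than 20 reserved (or counter unreadable)"
fi
```

The check only reads the counter and needs no root privileges. With the pages reserved, steps 3 to 5 load the AF and run the host application as listed above.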
The expected output of hls_afu_host is:
Using Avalon Slave at offset 0x40
No vector size specified. Default to size 64 floats! run ./hls_afu_host <vectorsize> to specify a vector size at runtime.
Using test vector of size 64.
Running Test
AFU DFH REG = 1000010000000000
AFU ID LO = 944028430b016f3d
AFU ID HI = 5fa7fd4b867c484c
AFU NEXT = 00000000
AFU RESERVED = 00000000
end of output memory before executing kernel:
    [62] - -6259853398707798016.000000 (0xdeadbeef)
    [63] - -6259853398707798016.000000 (0xdeadbeef)
    [64] - -6259853398707798016.000000 (0xdeadbeef)
    [65] - 0.000000 (0x0)
Interrupt enabled = 00000000
Interrupt enabled = 00000001
AFU Latency: 0.01600 milliseconds
Poll success. Return = 1
check output memory:
output memory OK!
sum: Expected 715.000000, calculated 715.000000.

The FPGA writes a full 512-bit word (64 bytes) to host memory, so if the size 
of your test vector (in bytes) is not a multiple of 64, the FPGA will 
overwrite some space at the end of output memory. fpgaPrepareBuffer() 
allocates your host memory in a buffer that is a multiple of 64 bytes, so the 
FPGA behavior will not affect your application. You should expect to see a 
single 0xdeadbeef at the end of the output memory if and only if the size of 
your test vector (determined by vector_size, and the datatype) is a multiple 
of 64 bytes (that is, if vector_size is a multiple of 16). 

end of output memory after executing kernel:
    [62] - 22.333334 (0x41b2aaab)
    [63] - 22.666666 (0x41b55555)
    [64] - -6259853398707798016.000000 (0xdeadbeef)
    [65] - 0.000000 (0x0)
Vector size is 64 (256 bytes), so expect memory output at [64] = 0xdeadbeef
Finished Running Test.
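
The note in the output about 512-bit writes comes down to rounding the payload up to whole 64-byte words. The sketch below reproduces that arithmetic for a few vector sizes. It assumes 4-byte float elements (the default in this example) and that the host pre-fills the output buffer with 0xdeadbeef, as the "before" dump suggests. The function names are hypothetical and do not come from the host application.

```shell
#!/bin/sh
# Assumption: each element is a 4-byte float, as in the default test vector.
ELEM_BYTES=4
# One 512-bit word, the unit the FPGA writes to host memory.
WORD_BYTES=64

# Payload size in bytes for a vector of $1 elements.
vector_bytes() {
    echo $(( $1 * ELEM_BYTES ))
}

# Bytes the FPGA actually writes: the payload rounded up to whole words.
written_bytes() {
    bytes=$(vector_bytes "$1")
    echo $(( (bytes + WORD_BYTES - 1) / WORD_BYTES * WORD_BYTES ))
}

# Succeeds when the 0xdeadbeef directly after the payload is left untouched,
# that is, when the payload already ends on a 64-byte boundary.
sentinel_survives() {
    [ "$(written_bytes "$1")" -eq "$(vector_bytes "$1")" ]
}

for n in 64 60 16 10; do
    if sentinel_survives "$n"; then state="intact"; else state="overwritten"; fi
    echo "vector_size=$n payload=$(vector_bytes "$n")B written=$(written_bytes "$n")B [$n]: $state"
done
```

For the default vector_size of 64 the payload is 256 bytes, exactly four words, so element [64] keeps its 0xdeadbeef value, which matches the log above. For a vector_size of 60 the payload is 240 bytes but 256 bytes are written, so element [60] would be overwritten.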