Intel® FPGA SDK for OpenCL™: Stratix® V Network Reference Platform Porting Guide

ID 683645
Date 11/06/2017

3.10.2. Flash

CvP programming reprograms the FPGA core quickly, but it cannot replace the periphery configuration. When a new design uses a different board variant within a Custom Platform, the periphery must change to reflect differences in hardware resources, such as the number of memory controllers. Periphery changes require full-device reprogramming using hardware external to the FPGA. One common technique is full-device programming through the Flash memory that stores the power-on FPGA configuration image. With this technique, the host system programs the Flash memory across PCIe® using custom bridge IP on the FPGA. The FPGA is then reprogrammed from Flash, and the PCIe link to the newly programmed FPGA is restored.

The information below describes the implementation of Flash programming for the Stratix® V Network Reference Platform. If your board offers alternative means of FPGA periphery reconfiguration, Flash programming is unnecessary.

Remember: Flash memory is one of many possible techniques to program the FPGA periphery. It is a board-specific choice. Flash programming depends on a board-specific communication link between the host and Flash memory. It also relies on the ability to command an FPGA reprogramming operation from Flash in a live system.

Alternative FPGA periphery programming methods, preferably accessible from the PCIe bus, can be built into a board. You can use external cables to program the FPGA periphery with an external device such as the Intel® FPGA Download Cable (either separate or integrated onto the board). However, cables are points of failure that do not scale well to large deployments.

Attention: Intel® does not recommend external cabling as a solution for periphery programming.

Periphery Hashing and Hash ROM

The Custom Platform must have the necessary infrastructure to select at runtime between FPGA reconfiguration via CvP programming (core replacement only) and via Flash programming (periphery and core replacement). Flash programming is the slower of the two programming methods. It is unsafe to first attempt CvP programming and then fall back to Flash programming if CvP programming fails, because Flash programming requires PCIe communication with the FPGA. A CvP programming failure caused by mismatched peripheries renders the PCIe link unusable, which eliminates the communication link necessary to program the Flash device. In that failure mode, a system power cycle is necessary to restore the FPGA and the PCIe link.

S5_net includes two infrastructure components to enable runtime decision making on the FPGA configuration method:

  1. ROM storage in the locked-down portion of the FPGA, which contains a hash of the currently programmed periphery configuration. The MMD software layer can read this ROM via PCIe.
  2. Hash of the FPGA periphery bitstream, which is created at the end of the Intel® Quartus® Prime compilation flow. This hash is embedded in two locations:
    1. The fpga.bin file, embedded in the .aocx Intel® FPGA SDK for OpenCL™ Offline Compiler executable file.
    2. The FPGA configuration bitstream, so that the correct hash populates the ROM.

By comparing the hash of the periphery currently programmed to the FPGA against that of the new design, a function in the acl_pcie_flash.cpp MMD file decides whether CvP programming is sufficient. If not, a fall-back method such as Flash programming is necessary to reprogram the device periphery.

Note: S5_net uses a SHA-1 hash function.
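The decision described above can be sketched as follows. This is a minimal illustration in Python, not the function in acl_pcie_flash.cpp: the names periphery_hash and choose_programming_method are hypothetical, and the sketch assumes that both hashes are already available to the host as raw 20-byte SHA-1 digests.

```python
import hashlib


def periphery_hash(periphery_bits: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of a periphery bitstream (hypothetical helper)."""
    return hashlib.sha1(periphery_bits).digest()


def choose_programming_method(rom_hash: bytes, new_design_hash: bytes) -> str:
    """Return "cvp" when the peripheries match, otherwise "flash" (hypothetical helper)."""
    # Matching hashes mean the periphery already on the device is the one the
    # new design expects, so replacing only the core through CvP is enough.
    if rom_hash == new_design_hash:
        return "cvp"
    # A mismatch must be detected before any CvP attempt: a failed CvP
    # operation would take down the PCIe link that Flash programming needs.
    return "flash"
```

The point of the design is that the choice is made up front from the two hashes, rather than by trying CvP programming first and reacting to a failure.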

Populating the periphery hash within the FPGA configuration bitstream changes the periphery hash value derived from the bitstream.

To avoid this update loop, the Intel® Quartus® Prime compilation uses a ROM initialization value of all zeros. The output from this compilation goes into the computation of the periphery hash. Then, when you run quartus_cdb --update_mif, the command replaces the ROM value with the output hash from the original compilation.
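The idea of hashing with a zeroed ROM and patching the result in afterwards can be illustrated with a toy model. The sketch below treats the bitstream as a flat byte string with the hash ROM at a known offset; that layout, the offset, and the function names are assumptions made for illustration only, and the real flow operates on Intel® Quartus® Prime compilation outputs rather than on raw bytes like this.

```python
import hashlib

HASH_LEN = 20  # SHA-1 digest size in bytes


def hash_with_zeroed_rom(image: bytes, rom_offset: int) -> bytes:
    """Hash the image as if the ROM region held all zeros (toy model)."""
    zeroed = bytearray(image)
    zeroed[rom_offset:rom_offset + HASH_LEN] = bytes(HASH_LEN)
    return hashlib.sha1(bytes(zeroed)).digest()


def embed_hash(image: bytes, rom_offset: int) -> bytes:
    """Return a copy of the image whose ROM region holds the zero-ROM hash."""
    digest = hash_with_zeroed_rom(image, rom_offset)
    patched = bytearray(image)
    patched[rom_offset:rom_offset + HASH_LEN] = digest
    return bytes(patched)
```

Because the hash is always computed over the zero-ROM form, writing the digest into the ROM does not change the value that the digest is supposed to represent.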

Communicating with Flash Memory over PCIe

In s5_net, the host communicates with Flash memory through a CPLD that connects to a set of FPGA pins. The CPLD is located between the FPGA and Flash, and it masters communication between the FPGA and off-chip peripherals. A CPLD_bridge IP block is instantiated in the locked-down interface portion of the FPGA design. It provides memory-mapped communication between the PCIe controller on the FPGA and the external CPLD communication bus.

The CPLD uses a custom packet-based communication protocol for communication with the FPGA. The MMD host code creates the necessary packets, and transmits them over PCIe to the CPLD_bridge on the FPGA. The bridge in turn communicates the packets to the CPLD for further processing and routing. Flash programming commands are embedded within these packets.
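To make the idea of host-built packets concrete, the following Python sketch frames a command as a small header, a payload, and a checksum. The actual s5_net packet format is not described here, so every detail below (the field layout, the opcode value, the XOR checksum, and the function names) is hypothetical and only illustrates the general approach.

```python
import struct

# Hypothetical header: 1-byte opcode, 4-byte Flash address, 2-byte payload length.
HEADER = struct.Struct("<BIH")


def xor_checksum(data: bytes) -> int:
    """XOR of all bytes; a stand-in for whatever integrity check the board uses."""
    checksum = 0
    for byte in data:
        checksum ^= byte
    return checksum


def build_packet(opcode: int, address: int, payload: bytes = b"") -> bytes:
    """Frame a command as header + payload + checksum byte."""
    body = HEADER.pack(opcode, address, len(payload)) + payload
    return body + bytes([xor_checksum(body)])


def parse_packet(packet: bytes):
    """Validate a packet and return (opcode, address, payload)."""
    body, checksum = packet[:-1], packet[-1]
    if xor_checksum(body) != checksum:
        raise ValueError("checksum mismatch")
    opcode, address, length = HEADER.unpack_from(body)
    payload = body[HEADER.size:]
    if len(payload) != length:
        raise ValueError("payload length mismatch")
    return opcode, address, payload
```

In a real MMD, the bytes returned by a routine like build_packet would be written to the memory-mapped CPLD_bridge over PCIe.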

Flash Memory Programming

A full programming bitstream is stored in the Flash memory. The operations necessary for programming are specific to the Flash chip. For more information on the configuration protocol, refer to the source code of the acl_pcie_flash.cpp MMD file and the Flash device datasheet.

The high-level Flash memory programming tasks are as follows:

  1. Erase the Flash lines that you want to program.
  2. Program the data lines.
  3. Read back the data to verify that the programming bitstream is correct.
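The three tasks above can be sketched against a simulated Flash device. The sketch assumes NOR-style behavior (erasing sets every bit to 1, and programming can only clear bits) and a hypothetical 256-byte erase sector; the real commands and geometry come from the Flash device datasheet, and the class and function names here are illustrative only.

```python
SECTOR_SIZE = 256  # hypothetical erase granularity in bytes


class SimulatedFlash:
    """In-memory stand-in for a NOR-style Flash chip."""

    def __init__(self, size: int):
        self.cells = bytearray(b"\xFF" * size)

    def erase_sector(self, address: int) -> None:
        start = address - address % SECTOR_SIZE
        self.cells[start:start + SECTOR_SIZE] = b"\xFF" * SECTOR_SIZE

    def program(self, address: int, data: bytes) -> None:
        # Programming can only change bits from 1 to 0, hence the AND.
        for i, byte in enumerate(data):
            self.cells[address + i] &= byte

    def read(self, address: int, length: int) -> bytes:
        return bytes(self.cells[address:address + length])


def program_bitstream(flash: SimulatedFlash, address: int, bitstream: bytes) -> bool:
    """Erase, program, and verify; return True when the readback matches."""
    end = address + len(bitstream)
    # 1. Erase every sector that the bitstream touches.
    sector = address - address % SECTOR_SIZE
    while sector < end:
        flash.erase_sector(sector)
        sector += SECTOR_SIZE
    # 2. Program the data.
    flash.program(address, bitstream)
    # 3. Read back and compare.
    return flash.read(address, len(bitstream)) == bitstream
```

The erase step matters because, under the NOR assumption, programming over stale data cannot set a cleared bit back to 1, so the readback in step 3 would not match.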

In s5_net, raw binary file (RBF) bitstreams are programmed to the Flash memory because the configuration hardware on the board expects the .rbf file format. Alternative file formats might be necessary on boards with different configuration methods. You can use Intel® Quartus® Prime software utilities, such as quartus_cpf, to perform file format conversion using the post-flow scripts (scripts/post_flow.tcl and subscripts).


For simplicity, the FPGA bitstreams are not compressed in s5_net.

S5_net verifies that the Flash memory was programmed successfully (that is, with no bit errors). In a production environment where programming speed is a concern, you can take several steps to reduce the Flash programming time. For example, you can use compressed bitstreams, reduce or eliminate the verification of Flash contents, and remove busy-wait loops from the Flash programming code. Because s5_net is intended as an instructional proof of concept, it is not optimized for programming time.

If the device is configured with a compressed bitstream, then CvP must also use a compressed bitstream.

Base and CvP Revisions for Flash Programming

The Intel® Quartus® Prime compilation of an OpenCL™ kernel can produce two different compilation revisions: base and CvP. For more information on these revisions, refer to the CvP section.

S5_net uses Flash programming for two purposes:

  1. Modification of the FPGA periphery configuration.
  2. Replacement of the power-on Flash configuration image (using the Intel® FPGA SDK for OpenCL™ flash utility).

Modifying the FPGA periphery requires a bitstream from a CvP revision compilation. Replacing the power-on image requires an RBF from a base revision compilation to guarantee CvP reliability. RBF bitstreams are large. To avoid storing both the base and CvP revision compilation RBF files for every design, only include the base revision RBF in the fpga.bin file. As a result, you can use the SDK flash utility to replace the power-on bitstream with the RBF in the fpga.bin file.

However, periphery replacement in a live system becomes more complicated. The base revision compilation contains the correct periphery for an SDK user's design, but it does not contain the design itself. The design is only available in a CvP revision compilation. The solution is to replace the periphery through Flash programming using the base revision compilation, and then to immediately CvP program the user's design on top of that periphery. The result is identical to programming a full RBF bitstream from the user's CvP revision compilation, but without storing that RBF.

FPGA Reprogramming from Flash

After you program the Flash memory with the new configuration bitstream from a base revision compilation, you must reconfigure the FPGA in the live system by performing the following tasks:

  1. Reprogram the FPGA from the bitstream in Flash memory.
  2. Wait for device programming to complete.
  3. Restore PCIe link and verify communication with the FPGA.
  4. Program the SDK user design core onto the FPGA via CvP. Refer to the CvP section for more information.
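The ordering of the four tasks above can be sketched as a short host-side routine. The board interface below (RecordingBoard and its method names) is a hypothetical stand-in that only records the order of operations; on real hardware, each step is board-specific.

```python
import time


class RecordingBoard:
    """Stand-in for real hardware; records the order of operations."""

    def __init__(self, polls_until_done: int = 2, link_healthy: bool = True):
        self.calls = []
        self._polls_left = polls_until_done
        self._link_healthy = link_healthy

    def trigger_reconfig_from_flash(self) -> None:
        self.calls.append("trigger_reconfig_from_flash")

    def configuration_done(self) -> bool:
        self._polls_left -= 1
        return self._polls_left <= 0

    def restore_pcie_link(self) -> bool:
        self.calls.append("restore_pcie_link")
        return self._link_healthy

    def cvp_program_core(self) -> None:
        self.calls.append("cvp_program_core")


def reconfigure_from_flash(board, timeout_s: float = 10.0,
                           poll_interval_s: float = 0.01) -> None:
    # 1. Reprogram the FPGA from the bitstream in Flash memory.
    board.trigger_reconfig_from_flash()
    # 2. Wait for device programming to complete, with a timeout.
    deadline = time.monotonic() + timeout_s
    while not board.configuration_done():
        if time.monotonic() >= deadline:
            raise TimeoutError("FPGA did not finish configuring from Flash")
        time.sleep(poll_interval_s)
    # 3. Restore the PCIe link and verify communication.
    if not board.restore_pcie_link():
        raise RuntimeError("PCIe link did not come back after reprogramming")
    # 4. Program the user design core via CvP, only once the link is verified.
    board.cvp_program_core()
```

The sketch stops before CvP programming if the link check fails, because CvP programming itself depends on a working PCIe link.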

S5_net performs FPGA reprogramming from Flash via a control command to the CPLD by the host, through the bridge on the FPGA. Details of the command are specific to the board hardware and are different across manufacturers.

Flash programming restores the PCIe link in the same way as programming via quartus_pgm. The PCIe configuration space is saved on the host before reprogramming. After reprogramming, the registers are restored by copying the original configuration data from the host to the device configuration space across PCIe. PCIe advanced error reporting (AER) is disabled during the programming operation because the FPGA effectively disappears from the PCIe bus during programming, which is typically a fatal PCIe event (the BIOS often halts the CPU). By restoring the PCIe configuration space registers after reprogramming, the device can communicate with the host using the same configuration as the original power-on PCIe enumeration.
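The save-and-restore step can be sketched as follows. This Python illustration works on any seekable binary file object and uses an in-memory buffer as a stand-in for the device; how the s5_net driver actually accesses the configuration space is not shown here, and the function names are hypothetical. On Linux, one assumption-laden way to obtain such a file object is to open the device's sysfs config file with sufficient privileges. The sketch does not model disabling AER.

```python
import io

CONFIG_SPACE_BYTES = 256  # size of the standard PCI configuration space


def save_config_space(config_file) -> bytes:
    """Read the configuration space from a seekable binary file object."""
    config_file.seek(0)
    data = config_file.read(CONFIG_SPACE_BYTES)
    if len(data) != CONFIG_SPACE_BYTES:
        raise IOError("short read of PCIe configuration space")
    return data


def restore_config_space(config_file, saved: bytes) -> None:
    """Write the saved registers back after reprogramming."""
    # On real hardware, read-only registers ignore these writes.
    config_file.seek(0)
    config_file.write(saved)


# Demonstration with an in-memory stand-in for the device's configuration space.
device = io.BytesIO(bytes(range(256)))
saved = save_config_space(device)
device.seek(0)
device.write(bytes(CONFIG_SPACE_BYTES))  # reprogramming resets the registers
restore_config_space(device, saved)
```

Keeping the saved copy on the host is what lets the device reappear with the same configuration that it had after power-on enumeration.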

From the host computer's perspective, the FPGA PCIe endpoint remains unchanged. After the host re-establishes PCIe communication with the FPGA, which now has the new and verified periphery configuration, CvP programming populates the FPGA core with the OpenCL kernel design. Refer to the CvP section for more details.


S5_net targets a board with Flash memory that stores the power-on FPGA configuration bitstream.

When changing the periphery through Flash programming at runtime, to avoid overwriting the power-on bitstream, you may use a different region of Flash memory as the intermediate storage location. However, this technique requires a means to specify the Flash memory address from which the FPGA will be reprogrammed. For boards without the ability to load from multiple Flash regions dynamically, you might need to overwrite the power-on programming bitstream.