Programming Guide

Types of DPC++ FPGA Compilation

The following table summarizes the types of FPGA compilation:
| Device Image Type | Time to Compile | Description |
| --- | --- | --- |
| FPGA Emulator | Seconds | The FPGA device code is compiled to the CPU. Use the Intel® FPGA Emulation Platform for OpenCL™ software to verify your SYCL code's functional correctness. |
| FPGA Simulator | Minutes | The FPGA device code is compiled to the CPU. Use the Questa*-Intel® FPGA Edition simulator to debug your code. |
| Optimization Report | Minutes | The FPGA device code is partially compiled for hardware. The compiler generates an optimization report that describes the structures generated on the FPGA, identifies performance bottlenecks, and estimates resource utilization. |
| FPGA Hardware Image | Hours | Generates the real FPGA bitstream to execute on the target FPGA platform. |
A typical FPGA DPC++ development workflow is to iterate in each of these stages, refining the code using the feedback provided by each stage. Intel® recommends relying on emulation and the FPGA optimization report whenever possible.
To compile for FPGA emulation or to generate the FPGA optimization report, you need only the Intel® oneAPI DPC++/C++ Compiler, which is part of the Intel® oneAPI Base Toolkit. However, an FPGA hardware compile requires the Intel® FPGA Add-on for oneAPI Base Toolkit. Refer to the Intel® oneAPI Toolkits Installation Guide for more information about installing this add-on.

FPGA Emulator

The FPGA emulator (Intel® FPGA Emulation Platform for OpenCL™ software) is the fastest method to verify the correctness of your code. It executes the DPC++ device code on the CPU. The emulator is similar to the SYCL host device, but unlike the host device, the FPGA emulator device supports FPGA extensions such as FPGA pipes and fpga_reg. For more information, refer to the Pipes Extension and Kernel Variables topics in the Intel® oneAPI DPC++ FPGA Optimization Guide.
The following are some important caveats to remember when using the FPGA emulator:
  • Performance is not representative
    Never draw inferences about FPGA performance from the FPGA emulator. The FPGA emulator's timing behavior is not correlated to that of the physical FPGA hardware. For example, an optimization that yields a 100x performance improvement on the FPGA may show no impact on the emulator performance. The emulator might show an unrelated increase or decrease.
  • Undefined behavior may differ
    If your code produces different results when compiled for the FPGA emulator versus FPGA hardware, your code most likely exercises undefined behavior. By definition, undefined behavior is not specified by the language specification, and might manifest differently on different targets.
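As an illustration of the undefined behavior caveat, consider a shift whose count can reach the bit width of the operand. This is undefined in C++, so the emulator (running on a CPU) and the FPGA hardware are free to produce different results. The sketch below is standard C++ and the function name ShiftLeftChecked is hypothetical; it shows one way to make the result well-defined on every target.

```cpp
#include <cstdint>

// Undefined behavior when count >= 32, so results may differ between the
// emulator and FPGA hardware:
//   std::uint32_t Shift(std::uint32_t v, std::uint32_t n) { return v << n; }

// Well-defined alternative: an out-of-range count yields 0 on every target,
// because all 32 bits would have been shifted out.
constexpr std::uint32_t ShiftLeftChecked(std::uint32_t value,
                                         std::uint32_t count) {
  return count < 32 ? value << count : 0u;
}
```

Other frequent sources of such mismatches include reading uninitialized variables and accessing arrays out of bounds. Removing them usually makes emulator and hardware results agree.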
When targeting the FPGA emulator device, use the -O2 compiler flag to turn on optimizations and speed up the emulation. To turn off optimizations (for example, to facilitate debugging), pass -O0.
For detailed information about emulation, refer to Emulate Your Design.

FPGA Simulator

The simulation flow allows you to use the Questa*-Intel® FPGA Edition simulator software to simulate the exact behavior of the synthesized kernel. As with emulation, you can run simulation on a system that does not have a target FPGA board installed. The simulator models a kernel much more accurately than the emulator, but it is much slower.
The simulation flow is cycle-accurate and bit-accurate. It exactly models the behavior of a kernel's datapath and the results of operations on floating-point data types. However, simulation cannot accurately model variable-latency memories or other external interfaces. Intel® recommends that you simulate your design with a small input dataset because simulation is much slower than running on FPGA hardware or the emulator.
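One way to follow the small-dataset recommendation is to scale the input size by target at compile time. The sketch below is standard C++ under stated assumptions: the FPGA_SIMULATOR macro, the constant names, and both sizes are hypothetical values chosen for illustration, not figures recommended by Intel.

```cpp
#include <cstddef>

// Hypothetical sizes: a large input for emulation and hardware runs, and a
// much smaller one for the slower simulation flow.
constexpr std::size_t kFullInputSize = 1048576;
constexpr std::size_t kSimulationInputSize = 64;

// Hypothetical convention: pass -DFPGA_SIMULATOR when building for simulation.
#if defined(FPGA_SIMULATOR)
constexpr bool kSimulating = true;
#else
constexpr bool kSimulating = false;
#endif

// Number of elements to process for the given target.
constexpr std::size_t InputSize(bool simulating) {
  return simulating ? kSimulationInputSize : kFullInputSize;
}
```

Host code would then allocate and process InputSize(kSimulating) elements, so the same source runs to completion in a reasonable time under simulation.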
You can use the simulation flow in conjunction with profiling to collect additional information about your design. For more information about profiling, refer to Intel® FPGA Dynamic Profiler for DPC++ in the Intel® oneAPI DPC++ FPGA Optimization Guide.
You cannot debug kernel code compiled for simulation using the GNU Project Debugger (GDB)*, Microsoft Visual Studio*, or any normal software debugger.
For more information about the simulation flow, refer to Evaluate Your Kernel Through Simulation (Beta).

FPGA Optimization Report

A full FPGA compilation occurs in the following two stages, and an optimization report is generated after each stage:
| Stage | Description | Optimization Report Information |
| --- | --- | --- |
| FPGA early image (compilation takes minutes to complete) | The SYCL device code is optimized and converted into an FPGA design specified in Verilog Register-Transfer Level (RTL), a low-level, native entry language for FPGAs. The intermediate compilation result is the FPGA early device image, which is not executable. The optimization report generated at this stage is sometimes referred to as the static report. | Contains significant information about how the compiler has transformed your SYCL device code into an FPGA design, including visualizations of structures generated on the FPGA, performance and expected performance bottlenecks, and estimated resource utilization. For information about the FPGA optimization report, refer to the Intel® oneAPI DPC++ FPGA Optimization Guide. |
| FPGA hardware image (compilation takes hours to complete) | The Verilog RTL specifying the design's circuit topology is mapped onto the FPGA's primitive hardware resources by the Intel® Quartus® Prime Software. The Intel® Quartus® Prime Software is included in the Intel® FPGA Add-On for oneAPI Base Toolkit, which is required for this compilation stage. The result is an FPGA hardware binary (also referred to as a bitstream). | Contains precise information about resource utilization and fmax numbers. For detailed information about how to analyze reports, refer to the Analyze your Design section in the Intel® oneAPI DPC++ FPGA Optimization Guide. |
For information about the FPGA hardware image, refer to the Intel® oneAPI DPC++ FPGA Optimization Guide.

FPGA Hardware

This is a full compile through to the FPGA hardware image. You can target the Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA, the Intel® FPGA PAC D5005 (previously known as Intel® PAC with Intel® Stratix® 10 SX FPGA), or a custom board.
For more information about using Intel® PAC or custom boards, refer to the FPGA BSPs and Boards section.
