Types of DPC++ FPGA Compilation
The following table summarizes the types of FPGA compilation:
Device Image Type | Time to Compile | Description |
---|---|---|
FPGA Emulator | Seconds | The FPGA device code is compiled to the CPU. Use the Intel® FPGA Emulation Platform for OpenCL™ software to verify your SYCL code's functional correctness. |
FPGA Simulator | Minutes | The FPGA device code is compiled to RTL and simulated on the CPU. Use the Questa*-Intel® FPGA Edition simulator to debug your code. |
Optimization Report | Minutes | The FPGA device code is partially compiled for hardware. The compiler generates an optimization report that describes the structures generated on the FPGA, identifies performance bottlenecks, and estimates resource utilization. |
FPGA Hardware Image | Hours | Generates the real FPGA bitstream to execute on the target FPGA platform. |
A typical DPC++ FPGA development workflow iterates through these stages, refining the code using the feedback that each stage provides. Intel® recommends relying on emulation and the FPGA optimization report whenever possible, because both complete in seconds or minutes rather than the hours a hardware compile takes.
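The four device image types in the table above map to four compiler invocations. The following is a minimal dry-run sketch that only prints the command for each type and does not run the compiler. It assumes the `dpcpp` driver (newer releases use `icpx -fsycl` instead), the `-fintelfpga`, `-Xssimulation`, `-Xshardware`, and `-fsycl-link=early` options, and the `FPGA_EMULATOR` macro convention from Intel's code samples, which only works if your source selects the emulator device when that macro is defined. The function name, source file name, and output names are hypothetical.

```shell
#!/bin/sh
# Dry-run helper: prints (does not execute) a compile command for a device image type.
# Assumption: the dpcpp driver; newer toolkits use "icpx -fsycl" in its place.
fpga_compile_cmd() {
    target="$1"   # emulator | simulator | report | hardware
    src="$2"      # hypothetical source file, for example kernel.cpp
    name="${src%.cpp}"
    base="dpcpp -fintelfpga"
    case "$target" in
        # FPGA_EMULATOR is a sample-code macro, not a compiler option.
        emulator)  printf '%s\n' "$base -DFPGA_EMULATOR $src -o $name.fpga_emu" ;;
        simulator) printf '%s\n' "$base -Xssimulation $src -o $name.fpga_sim" ;;
        # Stops after the early image, so it finishes in minutes.
        report)    printf '%s\n' "$base -Xshardware -fsycl-link=early $src -o ${name}_report.a" ;;
        # Full compile through to the bitstream; takes hours.
        hardware)  printf '%s\n' "$base -Xshardware $src -o $name.fpga" ;;
        *)         printf '%s\n' "unknown target: $target" >&2; return 1 ;;
    esac
}
```

To run a printed command, pass it to the shell, for example `fpga_compile_cmd emulator kernel.cpp | sh`. Printing first makes it easy to check which stage you are about to start before committing to a long compile.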
To compile for FPGA emulation or to generate the FPGA optimization report, you need only the Intel® oneAPI DPC++/C++ Compiler, which is part of the Intel® oneAPI Base Toolkit. However, an FPGA hardware compile requires the Intel® FPGA Add-on for oneAPI Base Toolkit. Refer to the Intel® oneAPI Toolkits Installation Guide for more information about installing this add-on.
FPGA Emulator
The FPGA emulator (Intel® FPGA Emulation Platform for OpenCL™ software) is the fastest method to verify the correctness of your code. It executes the DPC++ device code on the CPU. The emulator is similar to the SYCL host device, but unlike the host device, the FPGA emulator device supports FPGA extensions such as FPGA pipes and `fpga_reg`. For more information, refer to the Pipes Extension and Kernel Variables topics in the Intel® oneAPI DPC++ FPGA Optimization Guide.
The following are some important caveats to remember when using the FPGA emulator:
- Performance is not representative. Never draw inferences about FPGA performance from the FPGA emulator. The FPGA emulator's timing behavior is not correlated to that of the physical FPGA hardware. For example, an optimization that yields a 100x performance improvement on the FPGA may show no impact on the emulator performance. The emulator might show an unrelated increase or decrease.
- Undefined behavior may differ. If your code produces different results when compiled for the FPGA emulator versus FPGA hardware, your code most likely exercises undefined behavior. By definition, undefined behavior is not specified by the language specification, and might manifest differently on different targets.
When targeting the FPGA emulator device, use the `-O2` compiler flag to turn on optimizations and speed up the emulation. To turn off optimizations (for example, to facilitate debugging), pass `-O0`.
For detailed information about emulation, refer to Emulate Your Design.
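The choice between `-O2` and `-O0` for the emulator can be sketched as a small dry-run helper that prints the compile command for either mode. This is an illustration, not the documented procedure: the function name and the `release`/`debug` mode names are hypothetical, the `dpcpp` driver is assumed (newer releases use `icpx -fsycl`), and `-DFPGA_EMULATOR` is the macro convention from Intel's samples rather than a compiler option.

```shell
#!/bin/sh
# Prints (does not execute) an emulator compile command.
# mode "debug" turns optimizations off with -O0; any other mode uses -O2.
emu_compile_cmd() {
    src="$1"              # hypothetical source file, for example kernel.cpp
    mode="${2:-release}"  # release (default) | debug
    if [ "$mode" = "debug" ]; then
        opt="-O0"   # easier to step through, slower emulation
    else
        opt="-O2"   # faster emulation
    fi
    printf '%s\n' "dpcpp -fintelfpga -DFPGA_EMULATOR $opt $src -o ${src%.cpp}.fpga_emu"
}
```

Whichever mode you choose, the emulator's run time remains unrelated to hardware performance, so the optimization level only affects how long you wait for a functional check.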
FPGA Simulator
The simulation flow allows you to use the Questa*-Intel® FPGA Edition simulator software to simulate the exact behavior of the synthesized kernel. As with emulation, you can run simulation on a system that does not have a target FPGA board installed. The simulator models a kernel much more accurately than the emulator, but it is much slower than the emulator.
The simulation flow is cycle-accurate and bit-accurate. It exactly models the behavior of a kernel's datapath and the results of operations on floating-point data types. However, simulation cannot accurately model variable-latency memories or other external interfaces. Intel® recommends that you simulate your design with a small input dataset because simulation is much slower than running on FPGA hardware or the emulator.
You can use the simulation flow in conjunction with profiling to collect additional information about your design. For more information about profiling, refer to Intel® FPGA Dynamic Profiler for DPC++ in the Intel® oneAPI DPC++ FPGA Optimization Guide.
You cannot debug kernel code compiled for simulation using the GNU Project Debugger (GDB)*, Microsoft Visual Studio*, or any normal software debugger.
For more information about the simulation flow, refer to Evaluate Your Kernel Through Simulation (Beta).
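A simulation run has two steps: compile for simulation, then execute the result with the simulation device enabled. The following dry-run sketch prints both steps without executing them. It assumes the `-Xssimulation` option and the `CL_CONTEXT_MPSIM_DEVICE_INTELFPGA` environment variable used by the Beta simulation flow; confirm both against the simulation topic for your toolkit version. The function and file names are hypothetical, and the `dpcpp` driver is assumed (newer releases use `icpx -fsycl`).

```shell
#!/bin/sh
# Prints (does not execute) the compile step and the run step of a simulation.
sim_steps() {
    src="$1"                   # hypothetical source file, for example kernel.cpp
    exe="${src%.cpp}.fpga_sim"
    # Step 1: compile the device code for the simulator.
    printf '%s\n' "dpcpp -fintelfpga -Xssimulation $src -o $exe"
    # Step 2: run with the simulation device enabled (assumed Beta-flow variable).
    printf '%s\n' "CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./$exe"
}
```

Because simulation is much slower than emulation or hardware, run the second step with the smallest input dataset that still exercises the kernel.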
FPGA Optimization Report
A full FPGA compilation occurs in the following two stages, and an optimization report is generated after each stage:
Stages | Description | Optimization Report Information |
---|---|---|
FPGA early image (compilation takes minutes to complete) | The SYCL device code is optimized and converted into an FPGA design specified in Verilog Register-Transfer Level (RTL) (a low-level, native entry language for FPGAs). The intermediate compilation result is the FPGA early device image, which is not an executable. The optimization report generated at this stage is sometimes referred to as the static report. | Contains significant information about how the compiler has transformed your SYCL device code into an FPGA design. For information about the FPGA optimization report, refer to the Intel® oneAPI DPC++ FPGA Optimization Guide. |
FPGA hardware image (compilation takes hours to complete) | The Verilog RTL specifying the design's circuit topology is mapped onto the FPGA's primitive hardware resources by the Intel® Quartus® Prime Software. The Intel® Quartus® Prime Software is included in the Intel® FPGA Add-on for oneAPI Base Toolkit, which is required for this compilation stage. The result is an FPGA hardware binary (also referred to as a bitstream). | Contains precise information about resource utilization and fMAX numbers. For detailed information about how to analyze reports, refer to the Analyze Your Design section in the Intel® oneAPI DPC++ FPGA Optimization Guide. For information about the FPGA hardware image, refer to the Intel® oneAPI DPC++ FPGA Optimization Guide. |
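The early-image stage in the table above can be requested on its own so that you get the static report in minutes. The following dry-run sketch prints the compile command and the path where the report is expected to appear. It assumes the `-fsycl-link=early` option and the `<output name>.prj/reports/report.html` layout used by Intel's code samples. The function and file names are hypothetical, and the `dpcpp` driver is assumed (newer releases use `icpx -fsycl`).

```shell
#!/bin/sh
# Prints (does not execute) the early-image compile command, followed by the
# expected location of the static optimization report.
report_cmd_and_path() {
    src="$1"                      # hypothetical source file, for example kernel.cpp
    out="${src%.cpp}_report.a"    # early device image; not an executable
    printf '%s\n' "dpcpp -fintelfpga -Xshardware -fsycl-link=early $src -o $out"
    # Assumed layout: the project directory is named after the output file.
    printf '%s\n' "${out%.a}.prj/reports/report.html"
}
```

Open the printed HTML path in a browser after the compile finishes. The resource and fMAX figures in this static report are estimates; the precise numbers come only from the full hardware compile.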
FPGA Hardware
This is a full compile through to the FPGA hardware image. You can target the Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA, the Intel® FPGA PAC D5005 (previously known as Intel® PAC with Intel® Stratix® 10 SX FPGA), or a custom board.
For more information about using Intel® PAC or custom boards, refer to the FPGA BSPs and Boards section.
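Selecting a board for the hardware compile can be sketched as follows. This dry-run helper prints the command without running it. It assumes the `-Xsboard` option and the board support package (BSP) names `intel_a10gx_pac:pac_a10` and `intel_s10sx_pac:pac_s10` that shipped with the PAC BSPs; verify both against the FPGA BSPs and Boards section for your toolkit version. The short aliases `a10` and `s10`, the function name, and the file names are hypothetical, and the `dpcpp` driver is assumed (newer releases use `icpx -fsycl`).

```shell
#!/bin/sh
# Prints (does not execute) a full hardware compile command for a chosen board.
hw_compile_cmd() {
    board="$1"   # a10 | s10 | any other value is passed through as a custom BSP
    src="$2"     # hypothetical source file, for example kernel.cpp
    case "$board" in
        a10) bsp="intel_a10gx_pac:pac_a10" ;;   # PAC with Intel Arria 10 GX FPGA
        s10) bsp="intel_s10sx_pac:pac_s10" ;;   # Intel FPGA PAC D5005
        *)   bsp="$board" ;;                    # custom board: use the value as given
    esac
    printf '%s\n' "dpcpp -fintelfpga -Xshardware -Xsboard=$bsp $src -o ${src%.cpp}.fpga"
}
```

For example, `hw_compile_cmd s10 kernel.cpp` prints the command for the D5005 card. Expect the real compile to take hours, so validate the design in emulation and check the optimization report first.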