Visible to Intel only — GUID: GUID-193A9DEE-3CD1-4120-AC06-B6AAFB5F1674
FPGA Optimization Guide for Intel® oneAPI Toolkits
Generate Register Map Wrapper (-Xsregister-map-wrapper-type)
ATTENTION:
Only the Intel FPGA IP Authoring flow supports this compiler option.
The Intel® oneAPI DPC++/C++ Compiler generates a ring-like wrapper structure that connects the register map interfaces of all kernels inside an IP core. You can direct the compiler to generate a different type of wrapper by including the -Xsregister-map-wrapper-type=<default|high-fmax|low-latency> option in the icpx command, as shown in the following example:
Example
icpx -fsycl -fintelfpga -Xshardware -Xsregister-map-wrapper-type=<default|high-fmax|low-latency> source_file.cpp
Where:
- -Xsregister-map-wrapper-type=high-fmax: The ring wrapper contains pipeline stages that prevent it from becoming the fMAX bottleneck of the IP core. The number of pipeline stages varies with the number of kernels in the IP core.
- -Xsregister-map-wrapper-type=low-latency: The ring wrapper contains combinational logic only and does not introduce extra latency for the Avalon® Memory-Mapped signals between the IP core boundary and the kernel.
- -Xsregister-map-wrapper-type=default: When you set the option to default or omit it, the compiler automatically infers the ring wrapper type.
Regardless of the wrapper type you select, this compiler option does not change the signals on the register map interfaces in any manner.
CAUTION:
- If you attempt to use this option in the full-system flow, the compiler issues a warning and ignores the option. The compiler still generates the ring wrapper, but the wrapper type used in the full-system flow may differ from the default wrapper type used in the Intel FPGA IP Authoring flow.
- If any kernel in the IP core contains streaming invocation interfaces and register map kernel arguments, and you specify the high-fmax version of the register map wrapper, the compiler returns an error message indicating this combination is not supported.
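To compare the wrapper types on the same design, you might compile one IP core per setting and inspect each build's reports. The following is a minimal sketch, not an Intel-provided script: the source file name, the ip_<type> output names, and the SRC and DRY_RUN variables are hypothetical, and any other options your IP Authoring flow build already needs (such as the target device) must be added to the cmd array yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build one IP core per register map wrapper type.
# By default (DRY_RUN=1) it only prints the icpx commands it would run;
# set DRY_RUN=0 to actually invoke the compiler.
# Assumption: any extra IP Authoring flow options (for example, the
# target device) still need to be appended to the cmd array below.
set -euo pipefail

SRC="${SRC:-source_file.cpp}"   # hypothetical source file name
DRY_RUN="${DRY_RUN:-1}"

for wrapper in default high-fmax low-latency; do
  # One compile per wrapper type, each written to its own output name
  # so the builds do not overwrite each other.
  cmd=(icpx -fsycl -fintelfpga -Xshardware
       "-Xsregister-map-wrapper-type=${wrapper}"
       "${SRC}" -o "ip_${wrapper}")
  if [[ "${DRY_RUN}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
done
```

Running the script as-is prints the three commands for review; running it with DRY_RUN=0 performs the hardware compiles, which can take a long time. If your IP core combines streaming invocation interfaces with register map kernel arguments, remove high-fmax from the loop, because the compiler rejects that combination.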