Intel® High Level Synthesis Compiler Pro Edition: Getting Started Guide

ID 683680
Date 4/01/2024
Public

2. High Level Synthesis (HLS) Design Examples and Tutorials

The Intel® High Level Synthesis (HLS) Compiler Pro Edition includes design examples and tutorials. They provide example components and demonstrate ways to model or code your components to get the best results from the Intel® HLS Compiler for your design.

High Level Synthesis Design Examples

The high level synthesis (HLS) design examples give you a quick way to see how various algorithms can be effectively implemented to get the best results from the Intel® HLS Compiler.

You can find the HLS design examples in the following location:
<quartus_installdir>/hls/examples/<design_example_name>

Where <quartus_installdir> is the directory where you installed the Quartus® Prime Design Suite. For example, /home/<username>/intelFPGA_pro/24.1 or C:\intelFPGA_pro\24.1.

For instructions on running the examples, see the following sections:
Table 2.  HLS design examples
Focus area Name Description
Linear algebra QRD Uses the Modified Gram-Schmidt algorithm for QR factorization of a matrix.

This design is referenced by the Advanced QRD Optimization with Intel® HLS Compiler Intel FPGA white paper. The white paper is available for download at the following URL: https://cdrdv2.intel.com/v1/dl/getContent/655469.

Signal processing interp_decim_filter Implements a simple and efficient interpolation/decimation filter.
Simple design counter Implements a simple and efficient 32-bit counter component.
Video processing YUV2RGB Implements a basic YUV422 to RGB888 color space conversion.
Video processing image_downsample Implements an image downsampling algorithm to scale an image to a smaller size using bilinear interpolation.
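As a rough illustration of the kind of arithmetic the YUV2RGB example covers, the following plain C++ sketch converts YUV422 data to RGB888. It is not the code shipped with the example: the names Rgb888, yuv_to_rgb, and yuv422_to_rgb888 are hypothetical, the Y0-U-Y1-V byte order is an assumption, and the integer coefficients are BT.601 full-range values scaled by 256, which may differ from the constants the example uses. HLS component attributes and headers are omitted so that the sketch compiles with any C++ compiler.

```cpp
#include <cassert>
#include <cstdint>

// One RGB888 pixel (hypothetical helper type, not from the Intel example).
struct Rgb888 {
  std::uint8_t r, g, b;
};

// Clamp an intermediate result into the 8-bit range of an RGB888 channel.
static std::uint8_t clamp_u8(int v) {
  return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Convert one pixel. The coefficients are BT.601 full-range values scaled
// by 256 (an assumption; the shipped example may use other constants).
static Rgb888 yuv_to_rgb(int y, int u, int v) {
  const int d = u - 128;  // blue-difference chroma, centered on zero
  const int e = v - 128;  // red-difference chroma, centered on zero
  return Rgb888{clamp_u8(y + (359 * e) / 256),
                clamp_u8(y - (88 * d + 183 * e) / 256),
                clamp_u8(y + (454 * d) / 256)};
}

// YUV422 stores two luma samples per chroma pair, so each group of four
// bytes (assumed order: Y0, U, Y1, V) yields two RGB pixels.
void yuv422_to_rgb888(const std::uint8_t* yuyv, int num_pairs, Rgb888* out) {
  assert(num_pairs >= 0);
  for (int i = 0; i < num_pairs; ++i) {
    const int y0 = yuyv[4 * i + 0];
    const int u = yuyv[4 * i + 1];
    const int y1 = yuyv[4 * i + 2];
    const int v = yuyv[4 * i + 3];
    out[2 * i] = yuv_to_rgb(y0, u, v);
    out[2 * i + 1] = yuv_to_rgb(y1, u, v);
  }
}
```

Using integer multiplies and a fixed divide by 256 avoids floating-point operations, which is a common choice for this conversion in hardware. The design example itself remains the reference for the coding style that the compiler handles best.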

HLS Design Tutorials

The HLS design tutorials show you important HLS-specific programming concepts and demonstrate good coding practices.

Each tutorial has a README file that gives you details about what the tutorial covers and instructions on how to run the tutorial.

Table 3.  Arbitrary precision datatypes design tutorials
Name Description
You can find these tutorials in the following location on your Quartus® Prime system:
<quartus_installdir>/hls/examples/tutorials/ac_datatypes
ac_fixed_constructor Demonstrates the use of the ac_fixed constructor where you can get a better QoR by using minor variations in coding style.
ac_fixed_math_library Demonstrates the use of the Intel® HLS Compiler ac_fixed_math fixed point math library functions.
ac_int_basic_ops Demonstrates the operators available for the ac_int class.
ac_int_overflow Demonstrates the usage of the DEBUG_AC_INT_WARNING and DEBUG_AC_INT_ERROR keywords to help detect overflow during emulation runtime.
You can find these tutorials in the following location on your Quartus® Prime system:
<quartus_installdir>/hls/examples/tutorials/hls_float
1_reduced_double Demonstrates how your application can benefit from hls_float by changing the underlying type from double to hls_float<11, 44> (reduced double).
2_explicit_arithmetic Demonstrates how to use the explicit versions of hls_float binary operators to perform floating-point arithmetic operations based on your needs.
3_conversions Demonstrates when conversions appear in designs with hls_float types and how to use different conversion modes to generate compile-time constants using various hls_float types.
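To get a feel for how much precision a reduced double gives up, the following plain C++ sketch emulates a narrower mantissa on an ordinary IEEE 754 double. It does not use the hls_float class. The function name truncate_mantissa is hypothetical, the reading of hls_float<11, 44> as 11 exponent bits and 44 mantissa bits is an assumption, and the sketch truncates the dropped bits, which may not match the rounding that hls_float performs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Emulate storing a double with only `mantissa_bits` explicit mantissa bits
// by zeroing the low-order bits of the 52-bit IEEE 754 mantissa (truncation).
// This models precision loss only; it is not the hls_float implementation.
double truncate_mantissa(double x, int mantissa_bits) {
  assert(mantissa_bits >= 0 && mantissa_bits <= 52);
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bits
  const int dropped = 52 - mantissa_bits;
  const std::uint64_t mask = ~((std::uint64_t{1} << dropped) - 1);
  bits &= mask;  // clear the mantissa bits that the narrower type lacks
  std::memcpy(&x, &bits, sizeof x);
  return x;
}
```

Calling truncate_mantissa(0.1, 44) returns a value slightly below 0.1, with a relative error smaller than 2^-44. Values that need few mantissa bits, such as 1.0, pass through unchanged.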
Table 4.  Component memories design tutorials
Name Description
You can find these tutorials in the following location on your Quartus® Prime system:
<quartus_installdir>/hls/examples/tutorials/component_memories
attributes_on_mm_agent_arg Demonstrates how to apply memory attributes to Avalon® Memory Mapped (MM) agent arguments.
exceptions Demonstrates how to use memory attributes on constants and struct members.
memory_bank_configuration Demonstrates how to control the number of load/store ports of each memory bank and optimize your component area usage, throughput, or both by using one or more of the following memory attributes:
  • hls_max_replicates
  • hls_singlepump
  • hls_doublepump
  • hls_simple_dual_port_memory
memory_geometry Demonstrates how to split your memory into banks and control the number of load/store ports of each memory bank by using one or more of the following memory attributes:
  • hls_bankwidth
  • hls_numbanks
  • hls_bankbits
memory_implementation Demonstrates how to implement variables or arrays in registers, MLABs, or RAMs by using the following memory attributes:
  • hls_register
  • hls_memory
  • hls_memory_impl
memory_merging Demonstrates how to improve resource utilization by implementing two logical memories as a single physical memory by merging them depth-wise or width-wise with the hls_merge memory attribute.
non_power_of_two_memory Demonstrates how to use the force_pow2_depth memory attribute to control the padding of memories that are non-power-of-two deep, and how that impacts the FPGA memory resource usage.
non_trivial_initialization Demonstrates how to use the C++ keyword constexpr to achieve efficient initialization of read-only variables.
static_var_init Demonstrates how to control the initialization behavior of statics in a component using the hls_init_on_reset or hls_init_on_powerup memory attribute.
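The constexpr approach that the non_trivial_initialization tutorial describes can be sketched in standard C++14 as follows. This is a minimal illustration and not the tutorial code: SquareTable, kSquares, kTableSize, and lookup_square are hypothetical names, and the table contents are arbitrary. The idea is that the table values are computed by the compiler, so no run-time initialization loop is needed for the read-only data.

```cpp
#include <cassert>

constexpr int kTableSize = 16;  // hypothetical table depth

// A read-only lookup table whose contents are computed at compile time.
struct SquareTable {
  int values[kTableSize];
  constexpr SquareTable() : values{} {
    for (int i = 0; i < kTableSize; ++i) {
      values[i] = i * i;
    }
  }
};

// The constructor runs during compilation, so the table is a constant.
constexpr SquareTable kSquares{};
static_assert(kSquares.values[5] == 25, "table is evaluated at compile time");

// Read one entry from the precomputed table.
int lookup_square(int index) {
  assert(index >= 0 && index < kTableSize);
  return kSquares.values[index];
}
```

The static_assert confirms that the values are available at compile time. How the Intel® HLS Compiler maps such a table to FPGA memory resources is covered by the tutorial itself.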
Table 5.  Interface design tutorials
Name Description
You can find these tutorials in the following location on your Quartus® Prime system:
<quartus_installdir>/hls/examples/tutorials/interfaces
overview Demonstrates the effects on quality-of-results (QoR) of choosing different component interfaces even when the component algorithm remains the same.
explicit_streams_buffer Demonstrates how to use explicit stream_in and stream_out interfaces in the component and testbench.

explicit_streams_packets_empty Demonstrates how to use the usesPackets, usesEmpty, and firstSymbolInHighOrderBits stream template parameters.
explicit_streams_packets_ready_valid Demonstrates how to use the usesPackets, usesValid, and usesReady stream template parameters.
mm_host_testbench_operators Demonstrates how to invoke a component at different indices of an Avalon Memory Mapped (MM) Host (mm_host class) interface.
mm_agents Demonstrates how to create Avalon-MM Agent interfaces (agent registers and agent memories).
mm_agents_double_buffering Demonstrates the effect of using the hls_readwrite_mode macro to control how memory hosts access the agent memories.
mm_agents_csr_volatile Demonstrates the effect of using the volatile keyword to allow concurrent agent memory accesses while your component is running.
multiple_stream_call_sites Demonstrates the benefits of using multiple stream call sites.
pointer_mm_host Demonstrates how to create Avalon-MM Host interfaces and control their parameters.
stable_arguments Demonstrates how to use the stable attribute for unchanging arguments to improve resource utilization.
Table 6.  Best practices design tutorials
Name Description
You can find these tutorials in the following location on your Quartus® Prime system:
<quartus_installdir>/hls/examples/tutorials/best_practices
ac_datatypes Demonstrates the effect of using ac_int datatype instead of int datatype.
control_of_dsp_usage Demonstrates the effects of controlling whether some supported data types and math functions are implemented by DSPs or soft logic with the --dsp-mode option of the i++ command and the ihc::math_dsp_control function.
const_global Demonstrates the performance and resource utilization improvements of using const qualified global variables.
divergent_loops Demonstrates a source-level optimization for designs with divergent loops.
floating_point_contract Demonstrates how to use the -ffp-contract option to improve the performance of your design for double-precision floating-point operations.
floating_point_ops Demonstrates the impact of -ffp-contract=fast and -ffp-reassociate flags in i++ on floating point operations using a 32-tap finite impulse response (FIR) filter design that is optimized for throughput.
fpga_reg Demonstrates how to use the fpga_reg macro to precisely tune pipelining in your design.
hyper_optimized_handshaking Demonstrates how to use the --hyper-optimized-handshaking option of the Intel HLS Compiler i++ command.
loop_coalesce Demonstrates the performance and resource utilization improvements of using loop_coalesce pragma on nested loops.

While #pragma loop_coalesce is provided with both the Standard and Pro editions, the design tutorial is provided only with the Pro edition.

loop_fusion Demonstrates the latency and resource utilization improvements of loop fusion.
loop_memory_dependency Demonstrates breaking loop carried dependencies using the ivdep pragma.
lsu_control Demonstrates the effects of controlling the types of LSUs instantiated for variable-latency Avalon® MM Host interfaces.
parallelize_array_operation Demonstrates how to improve fMAX by correcting a bottleneck that arises when performing operations on an array in a loop.
optimize_ii_using_hls_register Demonstrates how to use the hls_register attribute to reduce loop II and how to use hls_max_concurrency to improve component throughput.
parameter_aliasing Demonstrates the use of the __restrict keyword on component arguments.

random_number_generator Demonstrates how to use the random number generator library.
reduce_exit_fifo_width Demonstrates how to improve fMAX by reducing the width of the FIFO belonging to the exit node of a stall-free cluster.
relax_reduction_dependency Demonstrates a method to reduce the II of a loop that includes a floating point accumulator, or other reduction operation that cannot be computed at high speed in a single clock cycle.

remove_loop_carried_dependency Demonstrates how you can improve loop performance by removing accesses to the same variable across nested loops.
resource_sharing_filter Demonstrates an optimized-for-area variant of a 32-tap finite impulse response (FIR) filter design.
set_component_target_fmax_1 Demonstrates how to set the target fMAX in various ways by leveraging the Loop Analysis report in the High-Level Design Reports.
set_component_target_fmax_2 Demonstrates how the compiler handles the tradeoff between fMAX and II based on the presence or absence of the hls_scheduler_target_fmax_mhz component attribute and the ii loop pragma.
shift_register Demonstrates the recommended coding style for implementing shift registers.
sincos_func Demonstrates the effects of using sinpi or cospi functions in your component instead of sin or cos functions.
single_vs_double_precision_math Demonstrates the effect of using single precision literals and functions instead of double precision literals and functions.
stall_enable Demonstrates how to replace stall-free clusters with stall-enabled clusters to improve latency in some small designs.
struct_interface Demonstrates how to use ac_int to implement interfaces with no padding bits.
submnormal_and_rounding Demonstrates the effects of using the --daz and --rounding i++ command options.
swap_vs_copy Demonstrates the impact of using deep copying with registers on the performance and resource utilization of a component design.
triangular_loop Demonstrates a method for describing triangular loop patterns with dependencies.
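One common way to relax a reduction dependency of the kind the relax_reduction_dependency tutorial addresses is to spread the accumulation over several partial sums, so that each partial sum is read again only several iterations after it was written. The plain C++ sketch below illustrates that idea. Whether it matches the tutorial's code is an assumption, and the names sum_with_partial_accumulators and kPartialSums, as well as the choice of 8 partial sums, are hypothetical. HLS headers, attributes, and pragmas are omitted so that it compiles with any C++ compiler.

```cpp
#include <cassert>

constexpr int kPartialSums = 8;  // hypothetical depth; tune to the adder latency

// Sum `n` floats using a small shift register of partial sums. The partial
// sum read in an iteration was written kPartialSums iterations earlier, so a
// multi-cycle floating-point add has time to complete before its result is
// needed again.
float sum_with_partial_accumulators(const float* data, int n) {
  assert(n >= 0);
  float partial[kPartialSums] = {};  // all partial sums start at zero

  for (int i = 0; i < n; ++i) {
    // Add the new sample to the oldest partial sum.
    const float updated = partial[kPartialSums - 1] + data[i];
    // Shift the register by one position; the fixed bounds let a compiler
    // unroll this inner loop completely.
    for (int j = kPartialSums - 1; j > 0; --j) {
      partial[j] = partial[j - 1];
    }
    partial[0] = updated;
  }

  // Combine the partial sums once, outside the main loop.
  float total = 0.0f;
  for (int j = 0; j < kPartialSums; ++j) {
    total += partial[j];
  }
  return total;
}
```

Because floating-point addition is not associative, the result can differ in the last bits from a strictly sequential sum. For inputs whose sums are exactly representable, such as the integers 1 through 10, the result is the same.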
Table 7.  Usability design tutorials
Name Description
You can find these tutorials in the following location on your Quartus® Prime system:
<quartus_installdir>/hls/examples/tutorials/usability
full-design Demonstrates a simple sort component in a minimal system as described in the HLS Walkthrough video series that is available through the Intel FPGA YouTube channel.

compiler_interoperability Demonstrates how to build your design using testbench code compiled with the Intel® HLS Compiler, GCC, or Microsoft* Visual Studio* and component code compiled separately with the Intel® HLS Compiler.
enqueue_call Demonstrates how to run components asynchronously and exercise their pipeline performance in the test bench using enqueue functionality.
platform_designer_2xclock Demonstrates the recommended clock and reset generation for a component with a clock2x input.
platform_designer_stitching Demonstrates how to combine multiple components to function as a single cohesive design.
Table 8.  System of tasks design tutorials
Name Description
You can find these tutorials in the following location on your Quartus® Prime system:
<quartus_installdir>/hls/examples/tutorials/system_of_tasks
balancing_loop_delay Demonstrates how to improve the throughput of a component that uses a system of tasks by buffering streams.
balancing_pipeline_latency Demonstrates how to improve the throughput of a component that uses a system of tasks by buffering streams.
interfaces_sot Demonstrates how to transfer information between, into, and out of tasks using Avalon® streaming and Avalon® memory-mapped host interfaces.
internal_stream Demonstrates how to use "internal streams" in HLS tasks with the ihc::stream object.
launch_and_collect_capacity Demonstrates how to use the capacity template parameter of the ihc::launch and ihc::collect functions to improve throughput in components that have systems of tasks.
parallel_loop Demonstrates how you can run sequential loops in a pipelined manner by using a system of HLS tasks in your component.
resource_sharing Demonstrates how you can share expensive compute blocks in your component to save area usage.
task_reuse Demonstrates how to invoke multiple copies of the same task function.
Table 9.  HLS Libraries design tutorials
Name Description
You can find these tutorials in the following location on your Quartus® Prime system:
<quartus_installdir>/hls/examples/tutorials/libraries
basic_rtl_library_flow Demonstrates the process of developing an RTL library and using it in an HLS component.

rtl_struct_mapping Demonstrates how to obtain a mapping from C++ struct fields to bit-slices of RTL module interface signals.
Table 10.  HLS Loop Control tutorials
Name Description
You can find these tutorials in the following location on your Quartus® Prime system:
<quartus_installdir>/hls/examples/tutorials/loop_controls
max_interleaving Demonstrates a method to reduce the area utilization of a loop that meets the following conditions:
  • The loop has an II > 1
  • The loop is contained in a pipelined loop
  • The loop execution is serialized across the invocations of the pipelined loop
small_speculated_iterations Demonstrates how decreasing the number of speculated iterations improves latency when a loop body has low latency and is expected to be frequently invoked.
speculated_iterations Demonstrates how increasing the number of speculated iterations improves II when the exit condition calculation is the bottleneck preventing a lower II.