DSP Builder for Intel® FPGAs (Advanced Blockset): Handbook

ID 683337
Date 7/15/2024
Document Table of Contents

15.3.1. SYCL

The SYCL block is an experimental block in the Beta Blocks library. Use the block to import an IP you create via the Intel oneAPI FPGA Development SYCL* HLS Flow for SYCL* code.

The SYCL block supports only a limited subset of valid IPs and kernels. The restrictions and requirements are:

  • The kernel inputs and output must be plain data types (i.e. not structs, pointers or references).
  • The imported kernel can only have a single pipe.
  • The DSP Builder simulation is cycle by-cycle. DSP Builder uses the latency value, which the compiler reports, to latency correct the outputs from the simulation function. This reported latency is not always correct and you may need to adjust it based on hardware simulation observations.
You must:
  • Write a dummy main function that uses the kernel. If you don't write a main function, the compiler does not generate any hardware for the kernel.
  • Write an exported simulation function for the Simulink simulation. This function must take an array of int64_ts as input via a pointer. The values in the input array are the kernel's data inputs in alphabetical order (the same order that they appear on the block). Although all inputs are stored in int64_ts, the actual data is in the same format as the kernel inputs so you should reinterpret it accordingly. You decide how to implement the simulation function. Two typical implementations are:
    • Create a sycl::queue and then invoke the kernel via pipe read and writes.
    • Call an underlying computation function, which is faster but less flexible. The return type of the function must also be an int64_t. The simulation flow reinterprets it according to the kernel's output type.
  • Provide integer-type inputs to the block in Simulink regardless of the underlying data format (you can use the ReinterpretCast block).
  • Be aware any non-data ports on the block are nonfunctional in the Simulink simulation.
  • To achieve matches between Simulink simulation and hardware simulation, add a minimum delay to the subsystem before the SYCL block to account for reset delay in the IP.
  • Tag the kernel struct with the [[dspba_block_kernel]] attribute and the exported simulation function with the [[dspba_block_function]] attribute.
  • Declare the function and kernel struct on a single line.

Example IP

This example kernel creates a 2D gradient over an NxN image:

#include <sycl.hpp>
#include <sycl/ext/intel/experimental/fpga_kernel_properties.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>
#include "TestConfigSelector.h"
#include <iostream>

class _my_pipe;
using namespace sycl::ext::intel::experimental;
using namespace sycl::ext::oneapi::experimental;
using my_pipe = sycl::ext::intel::experimental::pipe<
  class _my_pipe, float, 1,

struct [[dspba_block_kernel]] MyIP
  int x, y, n;
  MyIP(int _x, int _y, int _n)
    : x(_x)
    , y(_y)
    , n(_n)
  float compute() const
    float nminus1 = (float)(n - 1);
    float rcp_nminus1 = 1.0f / nminus1;
    float nx = (float)x * rcp_nminus1;
    float ny = (float)y * rcp_nminus1;	
    return (nx + ny) * 0.5f;
  void operator()() const
  auto get(properties_tag)
    return properties{streaming_interface_remove_downstream_stall, pipelined<1>};

extern "C" __declspec(dllexport) int64_t [[dspba_block_function]] MyIPWrapper(int64_t* data)
  int result;
  float fresult = MyIP{(int)data[1], (int)data[2], (int)data[0]}.compute();
  std::memcpy(&result, &fresult, sizeof(float));
  return result;

int main()
    auto selector = sycl::ext::intel::fpga_emulator_selector_v;
    sycl::queue q(selector);
    std::cout << my_pipe::read(q);
    return 0;
  } catch (std::exception& ex){
    std::cout << ex.what();