Developer Guide

Intel oneAPI FPGA Handbook

ID 785441
Date 2/07/2024
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

The device_global Extension (Beta)

The device_global class introduces device-scoped memory allocations into SYCL that can be accessed within a kernel using C++ global variable syntax. These memory allocations have unique instances per SYCL device. Like global variables, the device_global class is declared at a namespace scope and visible to all kernels within that scope.

NOTICE:

In the current oneAPI release, the device_global class supports only a few features on FPGAs that are described in this topic. Subsequent releases might include more features of this class. For detailed information about the device_global API, refer to the SYCL device_global Language Specification.

The device_global class is a class template that is parameterized by the underlying allocation type and a list of optional properties (see Table 1). The type of the allocation also encodes the size of the allocation for potentially multidimensional array types.

Declaring a device_global Object

The following is an example of the device_global object declaration that consists of an int array annotated with device_image_scope and host_access_none properties:

#include <sycl/sycl.hpp> 

namespace exp = sycl::ext::oneapi::experimental; 

using FPGAProperties = decltype(exp::properties( 
  exp::device_image_scope, exp::host_access_none));  
exp::device_global<int[10], FPGAProperties> val; 

In the above example, the device_image_scope property limits the lifetime of the val variable. The host_access_none property asserts that the host code does not copy to or from the device_global. These properties are further described in Table 1.

NOTE:

The current implementation does not support the device_global declaration without device_image_scope and host_access_none properties.

Accessing a device_global Object

The following is an example of accessing a device_global object:

int main () {
  sycl::queue q;
  q.single_task([=] {
    val[0] = 42;
    ...
   }).wait();

  int val_0;
  q.copy(val, &val_0, 1 /*count*/, 0 /*startIndex*/).wait();
} 

In most cases, you can directly access the underlying data through overloads of common operators. Host code needs to enqueue copy operations onto the queue to access the variable.

Properties of the device_global Class

The following table describes several compile-time-constant properties that the device_global class supports:

Properties of the device_global Class
Property Description
device_image_scope

Mandatory property for FPGAs.

If you set this property, the device_global memory lifetime begins when you program the FPGA binary image on an FPGA and ends when you program a different FPGA binary image on that FPGA.

host_access

Dictates how the host code accesses the device_global class and enables compiler optimizations. The following values are supported:

  • none: Asserts that the host code never copies to or from the variable. For an FPGA device, no external ports are exposed.
  • read: The user asserts that the host code may copy from (read) the variable, but it will never copy to (write) it. For an FPGA device, only a read port is exposed on the device image.
  • write: The user asserts that the host code may copy to (write) the variable, but it never copies from (read) it. For an FPGA device, only a write port is exposed on the device image.
  • read_write: The user provides no assertions, and the host code may either copy to (write) or copy from (read) the variable. This is the default if the property is omitted. For an FPGA device, a read/write port is exposed on the device image.

FPGA acceleration:
Only none is supported for multiarchitecture binaries

SYCL HLS:
All values are supported.

NOTE:

For additional information, refer to the FPGA tutorial sample "Device Global" listed in the Intel® oneAPI Samples Browser on Linux* or Windows*, or access the code sample on GitHub.

Hardware Implementation

Global variables are not captured implicitly by the lambda, so the compiler need not pass device_global variables as kernel arguments to the generated IP core.

A device_global variable with a device_image_scope property is implemented in on-chip memory, which carries all the on-chip memory advantages, as described in Perform Kernel Computations Using Local or Private Memory.

If you use a compiler-generated IP core in your hardware system, the values stored in a device_global variable can be reset to their initial values (reinitialized) only by reprogramming the device image on to the FPGA.

Host Access

Each device_global variable that allows for host (external) access exposes a dedicated Avalon® memory-mapped interface at the RTL module boundary. Accesses from this interface are not stallable and are also given priority over any arbitrated with any stallable kernel (internal) accesses. Addressing for this variable starts at 0x0 and goes up to the size of the variable.

FPGA acceleration:
If a BSP does not support dedicated Avalon® memory-mapped interfaces for accessing device_global variables then host access on that BSP is not supported.