Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 12/16/2022
Public


Annotating Unified Shared Memory Pointers

When you use Unified Shared Memory (USM) with a host allocation or a device allocation, Intel® recommends annotating the raw USM pointer inside the kernel with a host_ptr or device_ptr object and performing all accesses through that object. The host_ptr and device_ptr types are specializations of the SYCL multi_ptr class template, which provides constructors for address-space-qualified pointers.

Using host_ptr and device_ptr objects allows the compiler to perform better alias analysis, which typically leads to higher throughput and smaller silicon area for your design. Annotated pointers also allow the compiler to infer simpler RTL, because the load-store units (LSUs) that access an annotated pointer need to be connected only to the host memory or only to the device memory, respectively. Without the annotations, the compiler must connect the LSUs to both memories because the memory that the pointer refers to is unknown at compile time.
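The effect of the annotation can be pictured with a small host-only C++ model. This is a sketch under stated assumptions, not the SYCL implementation: AddressSpace, TaggedPtr, DevicePtrModel, HostPtrModel, memories_reachable, and memories_reachable_by are hypothetical names introduced here for illustration only. The model shows that an annotated pointer carries its address space in its type, so the information is available at compile time, whereas a raw pointer carries none.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical address-space tag for this model; not the SYCL enumeration.
enum class AddressSpace { Generic, Device, Host };

// Model of an annotated pointer: the address space is a template parameter,
// so it is part of the type and therefore known at compile time.
template <typename T, AddressSpace Space>
class TaggedPtr {
public:
  explicit TaggedPtr(T* raw) : raw_(raw) { assert(raw != nullptr); }

  T& operator[](std::size_t index) const { return raw_[index]; }
  T* get() const { return raw_; }
  static constexpr AddressSpace address_space() { return Space; }

private:
  T* raw_;
};

// Hypothetical counterparts of device_ptr and host_ptr in this model.
template <typename T>
using DevicePtrModel = TaggedPtr<T, AddressSpace::Device>;
template <typename T>
using HostPtrModel = TaggedPtr<T, AddressSpace::Host>;

// In this model, an access with an unknown location must reach both
// memories, while an annotated access reaches exactly one.
constexpr int memories_reachable(AddressSpace space) {
  return space == AddressSpace::Generic ? 2 : 1;
}

// A raw pointer carries no address-space information in its type.
template <typename T>
constexpr int memories_reachable_by(T*) {
  return memories_reachable(AddressSpace::Generic);
}

// An annotated pointer exposes its address space through its type.
template <typename T, AddressSpace Space>
constexpr int memories_reachable_by(const TaggedPtr<T, Space>&) {
  return memories_reachable(Space);
}
```

In this model, passing a raw int* to memories_reachable_by selects the generic overload, while passing a DevicePtrModel<int> or HostPtrModel<int> built from the same pointer selects the annotated overload. The choice is made by overload resolution at compile time, which mirrors the reason the annotation helps the compiler.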

For example, when you allocate a pointer Ptr with the malloc_device function, construct a device_ptr object from Ptr inside the kernel and access memory through the device_ptr object instead of through Ptr:

T* Ptr = malloc_device<T>(1024, Queue);
...
cgh.single_task<class DeviceAnnotation>([=]() {
  Ptr[0] = 42;  // raw pointer: load-store unit connected to both device and host memories
  device_ptr<T> DevicePtr(Ptr);  // annotate the raw pointer as a device allocation
  DevicePtr[1] = 43;  // load-store unit connected only to the device memory
});

Similarly, when you allocate a pointer Ptr with the malloc_host function, construct a host_ptr object from Ptr inside the kernel and access memory through the host_ptr object instead of through Ptr:

T* Ptr = malloc_host<T>(1024, Queue);
...
cgh.single_task<class HostAnnotation>([=]() {
  Ptr[0] = 42;  // raw pointer: load-store unit connected to both device and host memories
  host_ptr<T> HostPtr(Ptr);  // annotate the raw pointer as a host allocation
  HostPtr[1] = 43;  // load-store unit connected only to the host memory
});

CAUTION:

Use annotations consistently and match each annotation to the type of runtime allocation used. Mismatched or inconsistent address space annotations result in undefined behavior and may lead to incorrect results or hardware hangs.
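The matching rule can be written down as a small host-only C++ check. This is a sketch, not part of any SYCL or Intel API: AllocKind, Annotation, annotation_matches, and require_matching_annotation are hypothetical names introduced here. The sketch assumes that the allocation kind is known on the host before the kernel is launched (SYCL 2020 provides the sycl::get_pointer_type query for this purpose), and it treats shared and unknown allocations as non-matching because the text above covers only host and device allocations.

```cpp
#include <cassert>

// Hypothetical mirror of the allocation kinds a USM runtime can report.
enum class AllocKind { Host, Device, Shared, Unknown };

// Hypothetical names for the two annotations discussed above.
enum class Annotation { HostPtr, DevicePtr };

// True only when the annotation names the memory the allocation lives in.
// Assumption: Shared and Unknown never match, since only host and device
// allocations are covered by the guidance above.
constexpr bool annotation_matches(Annotation annotation, AllocKind kind) {
  return (annotation == Annotation::HostPtr && kind == AllocKind::Host) ||
         (annotation == Annotation::DevicePtr && kind == AllocKind::Device);
}

// Debug-build guard intended to run on the host before launching a kernel
// that uses the given annotation on a pointer of the given allocation kind.
inline void require_matching_annotation(Annotation annotation, AllocKind kind) {
  assert(annotation_matches(annotation, kind) &&
         "address space annotation does not match the USM allocation kind");
}

// The rule can also be verified at compile time for fixed combinations.
static_assert(annotation_matches(Annotation::DevicePtr, AllocKind::Device),
              "device_ptr pairs with a device allocation");
static_assert(!annotation_matches(Annotation::DevicePtr, AllocKind::Host),
              "device_ptr must not be used with a host allocation");
```

A check of this kind only catches mismatches in debug builds on the host. It does not make a mismatched annotation safe inside the kernel.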