During IWOCL 2025 in Heidelberg, Germany, Duncan Brawley of Codeplay Software Ltd reported on the progress made over the last year in defining and implementing bindless image support in SYCL. This implementation goes beyond the SYCL 2020 definition and simplifies interoperability with graphics APIs and render engines such as DirectX, Vulkan, or Blender*.
Duncan does not just provide a basic progress report on the implementation details and new findings since the first proposed bindless image support in 2024. He focuses on how SYCL* interoperability simplifies the use of advanced rendering with heterogeneous accelerated compute.
⇒ For a recap of the first introduction of this experimental feature, check out the slides and recording of the
“SYCL Bindless Images” presentation at the 12th IWOCL International Workshop on OpenCL* and SYCL in 2024.
Bindless images expand on image support in the SYCL 2020 standard by adding control over how images are stored on a device and which memory access model (e.g., unified shared memory (USM), buffer accessors, device-optimized memory layout, imported memory) is used. This goes far beyond accessor-based memory access requests and supports runtime image rendering, mipmaps, cube maps, image arrays, and more.
Being able to copy and reinterpret image data through flexible auxiliary copy functions rounds out the picture.
In this summary article, we highlight what these new features mean in practice for Vulkan and DirectX interoperability across different backends (e.g., Level Zero, CUDA*, or HIP*).
Note: Initial bindless image SYCL backend support code changes have already been pushed to Blender. However, the interoperability features discussed in this article are not yet used.
Vulkan & DirectX 12 Memory Use with SYCL
The main motivation for adding Vulkan and DirectX 12 interoperability to bindless image support is to minimize the amount of data that has to be transferred back and forth between CPU host processes and accelerator devices.
In short, we are trying to change the execution flow shown below,
Figure 1: Data Transfer between Host and Device without Interoperability
to something simpler and considerably more resource-efficient:
Figure 2: Data Transfer between Host and Device with Interoperability
We eliminate two extra copy steps by reusing the same memory allocation in both the SYCL context and the Vulkan or DirectX 12 context.
We achieve that by introducing an API for importing external memory into the SYCL image processing framework.
Figure 3: Basic Process for Importing Memory
Let us have a look at the exact execution flow for DirectX 12 and Vulkan memory in the source code snippets below:
DirectX 12
// Allocate memory in DX12
ComPtr<ID3D12Device> dx12Device = /* … */;
ComPtr<ID3D12Resource> dx12Texture = /* … */;
// Export memory from DX12
HANDLE dx12SharedMemHandle = INVALID_HANDLE_VALUE;
dx12Device->CreateSharedHandle(dx12Texture.Get(), nullptr,
                               GENERIC_ALL, nullptr,
                               &dx12SharedMemHandle);
// Describe memory being imported
syclexp::external_mem_descriptor<syclexp::resource_win32_handle> extMemDesc{
    dx12SharedMemHandle,
    syclexp::external_mem_handle_type::win32_nt_dx12_resource,
    dx12TexAllocInfo.SizeInBytes};
// Import memory from DX12 into SYCL
syclexp::external_mem externMem =
    syclexp::import_external_memory(extMemDesc, syclQueue);
// Map imported memory into SYCL image memory
syclexp::image_descriptor desc{imgSize, NChannels, channelType};
syclexp::image_mem_handle imgMemHandle =
    syclexp::map_external_image_memory(externMem, desc, syclQueue);
// Create SYCL image and use as usual
syclexp::unsampled_image_handle imgHandle =
    syclexp::create_image(imgMemHandle, desc, syclQueue);
/* … */
// Destroying external memory objects after use
void release_external_memory(external_mem externalMem,
                             const sycl::device &syclDevice,
                             const sycl::context &syclContext);
void release_external_memory(external_mem externalMem,
                             const sycl::queue &syclQueue);
Vulkan
// Describe memory being imported
#ifdef _WIN32
syclexp::external_mem_descriptor<syclexp::resource_win32_handle> extMemDesc{
    vulkanMemHandle, syclexp::external_mem_handle_type::win32_nt_handle, imgSize};
#else
syclexp::external_mem_descriptor<syclexp::resource_fd> extMemDesc{
    vulkanMemHandle, syclexp::external_mem_handle_type::opaque_fd, imgSize};
#endif
// Import memory from Vulkan into SYCL
syclexp::external_mem externMem =
    syclexp::import_external_memory(extMemDesc, syclQueue);
// Map imported memory into SYCL image memory
syclexp::image_descriptor desc{imgSize, NChannels, channelType};
syclexp::image_mem_handle imgMemHandle =
    syclexp::map_external_image_memory(externMem, desc, syclQueue);
// Create SYCL image and use as usual
syclexp::unsampled_image_handle imgHandle =
    syclexp::create_image(imgMemHandle, desc, syclQueue);
/* … */
// Destroying external memory objects after use
void release_external_memory(external_mem externalMem,
                             const sycl::device &syclDevice,
                             const sycl::context &syclContext);
void release_external_memory(external_mem externalMem,
                             const sycl::queue &syclQueue);
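Both snippets end with the external memory being released after use. One way to make sure that release also happens on early returns is a small scope guard. The sketch below is plain C++ with no SYCL dependency: `ScopeExit` and `use_imported_resource` are hypothetical names introduced for illustration, and the counter stands in for a real call such as `syclexp::release_external_memory(externMem, syclQueue)`.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper, not part of SYCL or the bindless images extension:
// runs the stored callable exactly once when the guard goes out of scope.
class ScopeExit {
public:
  explicit ScopeExit(std::function<void()> fn) : fn_(std::move(fn)) {}
  ~ScopeExit() {
    if (active_ && fn_) fn_();
  }
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;

  // Cancels the cleanup, e.g. when ownership is handed to another component.
  void dismiss() { active_ = false; }

private:
  std::function<void()> fn_;
  bool active_ = true;
};

// Stand-in for code that works with an imported resource.
// In real code the guard's callable would be something like
//   [&] { syclexp::release_external_memory(externMem, syclQueue); }
// (assumption: this is where the release call from the snippets would go).
inline int use_imported_resource(bool failEarly, int &releaseCount) {
  ScopeExit releaseGuard([&releaseCount] { ++releaseCount; });
  if (failEarly) {
    return -1;  // early exit: the guard still runs the release step
  }
  /* … work with the image … */
  return 0;     // normal exit: the guard runs the release step here as well
}
```

With this pattern, the release step is tied to scope exit rather than to each individual return path.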
Figure 4: Full Presentation Recording
⇒ Dive into the details of these new interoperability features for bindless images. Check out the slides and recording of the
“SYCL Interoperability with DirectX and Vulkan via Bindless Images” presentation at the 13th IWOCL International Workshop on OpenCL* and SYCL this past April.
Synchronization and Efficiency with Semaphores
Thus far, we have been discussing accessing and importing memory from DirectX 12 and Vulkan. However, we also need to ensure that there are no inefficiencies in the execution flow due to code waiting for data to be ready to be accessed.
This is where semaphores come in. First of all, the SYCL queue must have the accesses in question lined up in order, so the SYCL kernel and semaphore execution are guaranteed to follow the correct sequence.
Once that is ensured, interoperable memory access timing using semaphores or synchronization primitives becomes quite straightforward.
At [10min 55sec] in his presentation, Duncan details the two types of semaphores supported. Binary semaphores are either triggered or not. Timeline semaphores are reusable and can be assigned a specific 64-bit value that can be used as a wait-for signal.
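To make the difference between the two kinds concrete, here is a minimal host-only model in plain C++. It is not the SYCL API: `BinarySemaphoreModel` and `TimelineSemaphoreModel` are hypothetical classes, and the sketch assumes Vulkan-style semantics, where a successful wait on a binary semaphore consumes its signal and a wait on a timeline semaphore is satisfied once the counter has reached the requested value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of a binary semaphore: signaled or not.
class BinarySemaphoreModel {
public:
  void signal() { signaled_ = true; }

  // Assumption (Vulkan-style): a successful wait resets the semaphore,
  // so it has to be signaled again before the next wait can succeed.
  bool try_wait() {
    if (!signaled_) return false;
    signaled_ = false;
    return true;
  }

private:
  bool signaled_ = false;
};

// Hypothetical host-side model of a timeline semaphore: a 64-bit counter
// that only moves forward.
class TimelineSemaphoreModel {
public:
  // Moves the counter to `value`; smaller or equal values are ignored here
  // (a real API would typically treat them as invalid usage).
  void signal(std::uint64_t value) {
    if (value > value_) value_ = value;
  }

  // A wait for `value` is satisfied once the counter is at or past it.
  bool reached(std::uint64_t value) const { return value_ >= value; }

  std::uint64_t value() const { return value_; }

private:
  std::uint64_t value_ = 0;
};
```

In this model a single timeline semaphore can order many steps, for example "frame 1 rendered" at value 1 and "frame 2 rendered" at value 2, whereas a binary semaphore needs a fresh signal for every wait.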
Below is a list of these semaphores:
Binary Semaphores
- opaque_fd
- win32_nt_handle
Timeline Semaphores
- win32_nt_dx12_fence
- timeline_fd
- timeline_win32_nt_handle
// Types of external semaphore handles
enum class external_semaphore_handle_type
{
  opaque_fd = 0,
  win32_nt_handle = 1,
  win32_nt_dx12_fence = 2,
  timeline_fd = 3,
  timeline_win32_nt_handle = 4,
};
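As a small illustration of how application code might branch on these handle types, the sketch below repeats the enum locally so that it compiles without SYCL headers. The helpers `is_timeline` and `uses_file_descriptor` are hypothetical and not part of the extension; they simply encode the grouping from the list above and the handle kind implied by each enumerator's name.

```cpp
#include <cassert>

// Local copy of the enum shown above, so the sketch is self-contained.
enum class external_semaphore_handle_type {
  opaque_fd = 0,
  win32_nt_handle = 1,
  win32_nt_dx12_fence = 2,
  timeline_fd = 3,
  timeline_win32_nt_handle = 4,
};

// Hypothetical helper: true for the timeline semaphore handle types,
// false for the binary ones.
constexpr bool is_timeline(external_semaphore_handle_type type) {
  return type == external_semaphore_handle_type::win32_nt_dx12_fence ||
         type == external_semaphore_handle_type::timeline_fd ||
         type == external_semaphore_handle_type::timeline_win32_nt_handle;
}

// Hypothetical helper: true if the handle is a POSIX-style file descriptor
// rather than a Windows NT handle (assumption based on the enumerator names).
constexpr bool uses_file_descriptor(external_semaphore_handle_type type) {
  return type == external_semaphore_handle_type::opaque_fd ||
         type == external_semaphore_handle_type::timeline_fd;
}
```

A helper like this could, for instance, decide whether a 64-bit wait or signal value has to be supplied alongside the semaphore.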
Importing existing Vulkan or DirectX 12 semaphores into SYCL looks as follows:
Figure 5: Basic process of importing and using semaphores.
Similar to the memory import, beginning at timestamp [13min 10sec], Duncan goes over the detailed workflow of semaphore import using Vulkan and DirectX exportable semaphores.
Note: Semaphores currently cannot be created from scratch in SYCL or other heterogeneous offload programming frameworks like CUDA or Level Zero, only imported.
Simply put, you would create exportable semaphores in Vulkan or DirectX 12 and then import them into SYCL using the aforementioned handle types. Of course, you would release the imported external semaphores again once they are no longer needed.
The Future of SYCL, DirectX 12, and Vulkan Interoperability
We are planning to have memory and semaphore SYCL interoperability as a separate extension from the SYCL Bindless Images extension, possibly even multiple separate extensions, i.e., separate ones for importing and exporting memory, depending on where the evolution of this feature extension takes us. Stay tuned!
There is much more to be explored, like additional image formats, additional synchronization primitives, and memory export in addition to import.
Let’s Get Started. Take it for a Ride
Check out the implementation details in Intel’s LLVM repository on GitHub, where you can browse the implementation source for both the Level Zero and CUDA backends.
Take it for a test ride with the latest Intel® oneAPI DPC++/C++ Compiler and Intel® DPC++ Compatibility Tool 2025.1 or newer releases.