SYCL* originated as a high-level programming model for the OpenCL™ framework. Initially, the SYCL community was a subgroup within the Khronos OpenCL working group, extending heterogeneous computing to a fully standard C++ based framework, and the development paths of the OpenCL and SYCL specifications were tightly coupled. During the work on SYCL 2020, however, an independent SYCL working group was formed under Khronos to enable broader backend support beyond OpenCL, including backends for NVIDIA* and AMD* GPUs. Despite this separation, OpenCL remains one of the preferred SYCL backends. The two working groups continue to collaborate closely, with the OpenCL community committed to maintaining strong support for SYCL.
At the 13th International Workshop on OpenCL and SYCL (IWOCL 2025), Ben Ashbaugh from Intel and the OpenCL Working Group delivered a comprehensive overview focused on what it takes to support SYCL, particularly from the perspective of vendors who already have an OpenCL driver. The conference also hosted an interactive panel discussion, where subject matter experts and dynamic panelists fostered a collaborative and open dialogue on the challenges, trends, and potential improvements in the OpenCL and SYCL frameworks in the context of heterogeneous computing. This article will give you key insights from both sessions.
→ Check out the complete recording and presentation slides of the OpenCL Working Group Update session.
→ Full recording of the OpenCL and SYCL Panel Discussion is available here.
Enabling SYCL for OpenCL Workloads
Enabling SYCL support frequently requires minimal changes for vendors with an existing OpenCL implementation. A basic OpenCL 3.0 implementation is often sufficient, as SYCL 2020 maintains compatibility with a wide range of hardware. However, practical deployment may require support for optional OpenCL 3.0 features, particularly on the device side, due to the demands of C++-based SYCL kernels.
Compilation Flow and SPIR-V
From [00:05:30] in the recording, Ben goes over the SYCL code compilation flow, followed by details about the two leading compilers with integrated SYCL support: the Intel® oneAPI DPC++/C++ Compiler and the AdaptiveCpp compiler.
SYCL implementations typically compile device code to SPIR-V, the Khronos-defined intermediate language for parallel compute and graphics that provides a common low-level abstraction layer. Supporting SPIR-V, especially the kernel (compute) variant, is the most straightforward path for OpenCL vendors. Tools like the SPIR-V LLVM translator can simplify integration for LLVM-based toolchains.
Alternatively, vendors can support ahead-of-time compilation of device-specific binaries, though this may require additional integration steps.
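To make the two device-code paths concrete, the plain C++ sketch below (illustrative names, no OpenCL dependency) shows how a runtime might classify an incoming device binary: SPIR-V modules start with the magic number 0x07230203, so a module whose first word matches it would take a JIT-compilation path, while anything else is treated as a prebuilt native binary.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// SPIR-V modules begin with the magic number 0x07230203 (SPIR-V module header).
constexpr uint32_t kSpirvMagic = 0x07230203u;

enum class BinaryKind { SpirV, Native };

// Illustrative helper: classify a device binary so the runtime can pick
// a JIT path (SPIR-V) or a direct-load path (vendor-native code).
BinaryKind classify_device_binary(const std::vector<uint8_t>& blob) {
    if (blob.size() >= sizeof(uint32_t)) {
        uint32_t word = 0;
        std::memcpy(&word, blob.data(), sizeof(word));
        if (word == kSpirvMagic) return BinaryKind::SpirV;
        // A byte-swapped magic word indicates a SPIR-V module
        // produced with the opposite endianness.
        if (word == 0x03022307u) return BinaryKind::SpirV;
    }
    return BinaryKind::Native;
}
```

A real driver would of course do far more validation, but this is the decision point where the SPIR-V and ahead-of-time paths diverge.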
Practical Requirements for SYCL Compilation
To support SYCL effectively, OpenCL implementations should consider the following:
- Generic Address Space: Required to support unqualified pointers in C++.
- Subgroups: Widely used in SYCL kernels; trivial implementations are possible but not recommended.
- Optional Features: Work-group scans, reductions, and program-scope variables enhance compatibility but are not strictly required. These features are broadly useful beyond SYCL and align with general OpenCL development goals.
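Of the optional features above, work-group scans are easy to pin down with a sequential reference model. The plain C++ sketch below (illustrative, not an OpenCL kernel) models the semantics of an inclusive-add scan such as OpenCL's built-in work_group_scan_inclusive_add: each work-item receives the sum of its own value and all lower-indexed values.

```cpp
#include <cstddef>
#include <vector>

// Sequential reference model for an inclusive-add work-group scan:
// out[i] = in[0] + in[1] + ... + in[i].
std::vector<int> inclusive_scan_add(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        out[i] = running;  // value seen by work-item i
    }
    return out;
}
```

An actual device implementation would compute this cooperatively across work-items, but it must produce the same results as this sequential model.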
The host-side requirements for enabling SYCL on OpenCL implementations are minimal. A conformant OpenCL 3.0 implementation typically suffices, though some SYCL features may require additional support.
More details in the session recording below:
OpenCL Extensions for Broader SYCL Support
Two major extensions are under development to enhance SYCL compatibility:
- Unified Shared Memory (USM)
USM enables pointer-based memory management in SYCL. While the Shared Virtual Memory (SVM) feature introduced with OpenCL 2.0 was an early attempt, Intel’s OpenCL USM extension has proven more practical. A new Khronos extension aims to standardize USM support, ensuring compatibility with SYCL 2020 and existing Intel implementations. Its key features include:
- Platform- and device-level capability queries
- Flexible allocation APIs
- Introspection and extensibility for future enhancements
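To show the shape of the capability-query-then-allocate pattern, here is a hedged plain-C++ sketch. The names are hypothetical, not from the Khronos extension, but they mirror the host/device/shared allocation split used by SYCL 2020 and the Intel USM extension: the application queries which USM kinds a device supports before allocating.

```cpp
#include <cstdlib>
#include <stdexcept>

// Hypothetical USM allocation kinds, mirroring the host/device/shared
// split used by SYCL 2020 USM.
enum class UsmKind { Host, Device, Shared };

// Hypothetical per-device capability record, as a capability query
// might return it.
struct UsmCaps {
    bool host = false;
    bool device = false;
    bool shared = false;
    bool supports(UsmKind k) const {
        switch (k) {
            case UsmKind::Host:   return host;
            case UsmKind::Device: return device;
            case UsmKind::Shared: return shared;
        }
        return false;
    }
};

// Pattern sketch: check the capability before allocating, and fail loudly
// rather than hand back memory the device cannot actually use.
void* usm_alloc(const UsmCaps& caps, UsmKind kind, std::size_t bytes) {
    if (!caps.supports(kind))
        throw std::runtime_error("requested USM kind not supported on this device");
    return std::malloc(bytes);  // stand-in for the real driver allocation
}
```

The query-before-allocate step is the point of the extension's capability APIs: portable code can fall back from shared to device allocations when a device supports only the latter.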
- Graphs and Command Buffers
Command buffers address performance bottlenecks caused by excessive API calls in SYCL applications. By grouping kernel submissions into a single command buffer, execution becomes more efficient.
Two extensions are being developed:
- Static Command Buffers: Immutable graphs that can be submitted multiple times.
- Mutable Dispatch: Allows modification of kernel arguments and execution parameters without rebuilding the entire buffer.
Additional features under consideration include:
- Mutable memory commands
- Host access to command buffers
- Nested command buffer submissions
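The record-once, replay-many idea behind static command buffers can be sketched in plain C++ (illustrative names, no OpenCL dependency): commands are recorded into a buffer, the buffer is finalized into an immutable form, and the whole batch is then replayed with a single submission call instead of one API call per command.

```cpp
#include <functional>
#include <stdexcept>
#include <vector>

// Minimal sketch of the static command-buffer pattern: record, finalize, replay.
class CommandBuffer {
public:
    void record(std::function<void()> cmd) {
        if (finalized_)
            throw std::logic_error("cannot record into a finalized buffer");
        commands_.push_back(std::move(cmd));
    }
    void finalize() { finalized_ = true; }  // after this, the graph is immutable
    void submit() const {                   // one submission replays the whole batch
        if (!finalized_)
            throw std::logic_error("finalize the buffer before submitting");
        for (const auto& cmd : commands_) cmd();
    }
private:
    std::vector<std::function<void()>> commands_;
    bool finalized_ = false;
};
```

This is what amortizes the per-command API overhead the extension targets; mutable dispatch would relax the immutability just enough to let kernel arguments change between submissions without re-recording.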
These extensions are publicly available in provisional form, and feedback from the developer community is encouraged.
→ Check out the session recording from [00:14:00] and [00:20:10] to learn more about Unified Shared Memory and command buffers.
OpenCL and SYCL: Stronger Together
Modern embedded devices are now capable of running large language models (LLMs) locally. This shift is driven by the maturity of AI models and the need for cost-effective, low-latency inference on devices such as IoT sensors and automotive platforms. During the OpenCL and SYCL panel discussion, the panelists discuss the evolving role of OpenCL and SYCL in embedded and high-performance computing. In the second part of this article, we highlight the key topics discussed, covering the challenges faced by the OpenCL and SYCL ecosystems and the active, highly collaborative efforts by both communities to tackle them.
Watch the panel discussion recording:
Hardware-Software Co-Design and Data Format Standardization
While embedded GPUs share architectural similarities with HPC GPUs, they often operate in system-on-chip (SoC) environments where CPU and GPU share memory. This necessitates efficient resource utilization and memory access coordination for concurrent execution. The panel discussion touches on the importance of standardizing AI-specific data formats (e.g., MXFP4, BF16) to ensure cross-vendor compatibility. However, the long hardware development cycles mean software must often lead innovation, with hardware catching up once formats prove their value.
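As a concrete example of why format standardization matters, BF16 keeps FP32's sign bit and 8-bit exponent but shortens the mantissa to 7 bits, so a float can be converted simply by keeping its upper 16 bits. The sketch below shows the truncating conversion and its inverse (hardware typically rounds to nearest-even rather than truncating, but the bit layout is the same):

```cpp
#include <cstdint>
#include <cstring>

// Truncating float32 -> bfloat16 conversion: keep the top 16 bits
// (1 sign bit, 8 exponent bits, 7 mantissa bits).
uint16_t float_to_bf16(float f) {
    uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<uint16_t>(bits >> 16);
}

// The inverse: widen back to float32 by zero-filling the low mantissa bits.
float bf16_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f = 0.0f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Because the exponent range matches FP32, BF16 round-trips small powers of two exactly while trading mantissa precision for bandwidth, which is why cross-vendor agreement on such layouts is valuable.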
Precision, Testing, and Conformance
The panelists believe the HPC community must move beyond defaulting to the FP64 data type and adopt more precision-aware programming practices. They also discuss the need for robust conformance testing for new data types. Traditional bitwise reproducibility is increasingly impractical for AI workloads, where convergence and statistical correctness are more relevant. Panelists suggest rethinking conformance strategies, especially those for matrix engines and AI accelerators, where deterministic behavior is less critical than overall model performance.
Tooling and Debugging Challenges
As identified in the session from [00:22:25], a major pain point is the lack of mature debugging and profiling tools, particularly for newer vendors. While several key industry players offer robust tools (for example, the Intel® VTune™ Profiler powered by oneAPI), others are still catching up. The panel calls for standardized tooling APIs and better integration of open-source tools to improve the developer experience. The variability in feature support across implementations was also highlighted as a barrier to portability and reliability.
Deployment and Distribution of SYCL Applications
Deploying SYCL applications presents unique challenges compared to OpenCL. While SYCL allows ahead-of-time compilation to intermediate representations like SPIR-V, it lacks a standardized ICD (Installable Client Driver) model. This complicates runtime library management and increases the risk of dependency failures. Panelists discuss the need for better packaging strategies, and some suggest that static linking may offer a more reliable path forward.
Vendor Support and Ecosystem Maturity
From [00:40:40] in the video recording, the discussion turns to the role of hardware vendors in supporting SYCL. Qualcomm*, for example, prioritizes OpenCL due to its low-level optimization capabilities, especially in mobile and embedded contexts. While SYCL is being evaluated for less performance-critical applications, full support would require significant engineering investment. The panelists debate whether silicon vendors must provide their own SYCL compilers or simply ensure compatibility with existing implementations. The consensus is that, regardless, vendor commitment and active contribution to the lower-level runtime interfaces are crucial for long-term support and user confidence.
Community vs. Commercial Support
Afterwards, the speakers get into a nuanced discussion about the distinction between community-driven and commercial support. While community projects like LLVM are robust due to broad industry backing, users often prefer vendor-supported solutions for reliability and accountability. The panel encourages users to advocate for SYCL support through formal channels, emphasizing that user demand can influence vendor priorities.
OpenCL as a SYCL Backend
OpenCL, with its applicability for many different accelerator hardware architectures, remains a viable and important backend for SYCL, particularly with initiatives like Rusticl (a Rust-based OpenCL runtime developed as part of the Mesa 3D Graphics Library) aiming to bring OpenCL support to all Linux* systems. However, real-world issues with ICD loaders and driver conflicts persist. Panelists acknowledge these challenges but express optimism that ongoing improvements in the OpenCL ecosystem will further enhance its role as a SYCL backend.
What’s Next?
Check out the full recordings of the OpenCL Working Group Update session and OpenCL and SYCL Panel Discussion to dive deeper into expert perspectives on the OpenCL and SYCL ecosystems, SYCL support on OpenCL implementations, and the frameworks’ contributions to high-performance computing.
We encourage you to explore Intel® tools for OpenCL software and our oneAPI implementation of SYCL. Get started with our SYCL-powered and/or SYCL-supported oneAPI tools, libraries, and framework optimizations for accelerated AI, HPC, rendering, and more. You can get them as part of our developer toolkits or download their stand-alone versions.
Useful Resources