Real-World SYCL* Applications Using Intel® Hardware and oneAPI

6/17/2025

Nikita Shiledarbaxi,

Robert Mueller-Albrecht

Software Technical Marketing Engineer, Intel

SYCL*, an open standard programming model developed by the Khronos Group, enables single-source heterogeneous programming in modern C++. It allows developers to target a variety of hardware architectures, including CPUs, GPUs, FPGAs, and other accelerators, from a unified programming framework. Intel is a major contributor to, and beneficiary of, the SYCL ecosystem. We support SYCL through the oneAPI initiative, which promotes the UXL Foundation’s open, vendor-independent, multiarchitecture approach to accelerated parallel programming. Our latest accelerated hardware and oneAPI software have proven instrumental in enabling scalable, portable, high-performance SYCL applications. These efforts underscore our leadership in fostering open standards and empowering developers to build the next generation of scientific computing solutions. During the 13th International Workshop on OpenCL and SYCL (IWOCL 2025), experts discussed several real-world SYCL and OpenCL applications backed by Intel technologies. This article gives an overview of four of these projects, along with key insights shared during the IWOCL discussions.

FluidX3D: Large-scale Computational Fluid Dynamics Using OpenCL™

FluidX3D is a high-performance, memory-efficient computational fluid dynamics (CFD) software package built on OpenCL™. Originally developed for academic research, FluidX3D now supports simulations exceeding 100 billion grid cells on a single server, largely thanks to innovations in memory optimization and hardware compatibility. Through custom FP16 number formats and esoteric memory layouts, FluidX3D reduces memory usage to 55 bytes per grid cell, enabling high-resolution simulations on modest hardware. The software supports domain decomposition across heterogeneous GPUs, including Intel, AMD*, and NVIDIA* GPUs, using system RAM for inter-device communication. A custom OpenCL-based rendering engine allows visualization without relying on traditional graphics APIs.

Listen to the recording of the IWOCL 2025 keynote by Dr. Moritz Lehmann from Intel:

Here’s how FluidX3D benefits from Intel® technologies:

FluidX3D runs seamlessly on Intel® GPUs (e.g., Intel® Arc™ A770, Intel Arc B580) and CPUs, including integrated graphics and high-end data center solutions. The software benefits from OpenCL’s cross-platform support, enabling deployment on Intel hardware without code modifications.

A major milestone was achieved using a dual-socket Intel® Xeon® 6980P server with 6 TB of RAM, enabling a simulation of over 100 billion grid cells. This demonstrated that supercomputer-scale simulations can be performed on a single Intel-based server.

The talk highlights the use of Intel Xeon 6 CPUs and MR-DIMMs (256 GB memory modules), which provide massive memory bandwidth (up to 1.7 TB/s) and capacity (up to 12 TB per server). These features are critical for large-scale CFD workloads.

The keynote speaker collaborated internally to fix a critical issue in the Intel® OpenCL CPU runtime related to AVX-512 vectorization for 64-bit indexing. This fix enabled the successful execution of ultra-large simulations.

FluidX3D also runs on Intel® Core™ Ultra (Series 1) processors, whose powerful integrated GPUs can use large memory configurations (up to 128 GB), making them viable for high-resolution simulations.

FluidX3D leverages OpenCL’s zero-copy buffer feature on Intel CPUs to reduce memory duplication and improve efficiency, which is especially important for large-scale simulations.

The project exemplifies how open standards and Intel’s cutting-edge hardware can democratize access to extreme-scale CFD simulations. It demonstrates that with the right software and hardware—particularly Intel’s scalable CPU and GPU platforms—researchers can achieve supercomputing performance on accessible, cost-effective systems.

→ Check out the IWOCL session’s presentation slides: Scaling up FluidX3D CFD beyond 100 Billion cells on a single computer.

→ Learn more about Intel® tools for OpenCL software.

Shamrock: Hydrodynamics Simulations Powered by oneAPI and SYCL

Shamrock is a high-performance, exascale-ready computational framework designed for astrophysical hydrodynamics simulations. Developed in C++17 with SYCL and MPI, it supports multi-GPU execution and is optimized for large-scale, heterogeneous computing environments. Astrophysical simulations demand extreme precision and scalability because they span vast ranges of scale and density. Shamrock addresses this by supporting multiple numerical schemes (finite element, finite volume, and meshless methods such as Smoothed Particle Hydrodynamics, or SPH) within a unified, modular framework. The framework abstracts neighbor interactions and update schemes, enabling flexible implementation of diverse solvers.

Key features of the Shamrock code include:

Dynamic Domain Decomposition: Shamrock uses patch-based decomposition with adaptive load balancing to ensure even distribution of computational work across GPUs.

Tree-Based Neighbor Search: A parallelized radix tree algorithm, based on Morton codes, enables efficient neighbor discovery. This is critical for SPH simulations involving billions of particles.

Sparse MPI Communication: The framework implements non-blocking, sparse MPI communication patterns, optimized for large-scale simulations with minimal overhead.

Python Interoperability: Users interact with the simulation through a Python interface, simplifying setup and execution.

Scalability and Performance: Shamrock matches or exceeds the performance of legacy astrophysics codes like Phantom, while offering superior scalability and GPU support.

At an IWOCL tech talk (recording below), Timothée David Cléris from the University of Grenoble discussed how the Shamrock project benefits from SYCL. Learn more:

Here’s how the Shamrock framework leverages SYCL and Intel hardware:

Building on SYCL gives Shamrock cross-platform compatibility and simplifies kernel scheduling through SYCL’s DAG-based execution model.

The framework is transitioning from SYCL buffers to Unified Shared Memory (USM) for finer memory control and better interoperability with Intel® MPI, BLAS, and FFT libraries. This shift enhances performance and reduces latency, particularly on Intel hardware.

Leveraging the Intel® oneAPI Level Zero backend and SYCL’s pointer semantics, Shamrock supports direct GPU memory access for MPI communication. This allows efficient data exchange between GPUs without CPU intervention, significantly improving latency and scalability.

As demonstrated by the speaker from [00:16:50] in the recording, Shamrock achieves substantial speedups on Intel GPUs using the LLVM CUDA backend. For example, simulations on Intel-based systems demonstrated up to 6x performance gains over CPU-only runs.

Benchmarks on Intel hardware, including Intel Arc B580 GPUs and Intel Xeon CPUs, demonstrated strong performance and up to 92% parallel efficiency on large-scale simulations, as shown by the presenter from [00:19:10] in the video recording.

Shamrock exemplifies how modern, open standards like SYCL, supported by Intel’s hardware and software ecosystem, can enable scalable, high-fidelity astrophysical simulations. Its architecture is designed for flexibility, performance, and future-proofing in the exascale era.

→ The presentation slides of the session Shamrock: Exascale Hydrodynamics for Astrophysics Using SYCL are available here.

GROMACS: Molecular Dynamics Simulation Enhanced Using SYCL

GROningen MAchine for Chemical Simulations (GROMACS) is a high-performance, open-source molecular dynamics software package widely used for simulating the behavior of biomolecules such as proteins, lipids, and nucleic acids. Originally developed at the University of Groningen*, it is now maintained by an international collaboration and supports a wide range of force fields and simulation techniques. It is used in academic and industrial research for studying molecular interactions, drug design, and materials science.

At an IWOCL session, Andrey Alekseenko from the KTH Royal Institute of Technology and Ewan Crawford from Codeplay Software shed some light on the benefits that Intel’s SYCL Graph extensions bring to GROMACS. Check out the full recording:

GROMACS uses the SYCL Graph extension to reduce kernel launch overhead and improve performance, especially for small, repetitive workloads. From [00:16:15] in the session recording, the presenter shows a 15% performance improvement on Intel Arc B580 GPUs, demonstrating effective GPU utilization. SYCL Graphs help shift the performance bottleneck from CPU-side kernel submission to GPU execution, enabling better overlap and throughput.

GROMACS integrates with the Intel® oneAPI Math Kernel Library (oneMKL), allowing FFT operations to be captured within SYCL Graphs without complex workarounds, which improves portability and maintainability.

The SYCL Graph implementation leverages the Intel oneAPI Level Zero backend for low-level GPU control, enabling efficient command submission and native interoperability.

Intel is developing a graph update API that allows kernel parameters to change dynamically without rebuilding the entire graph. This is especially useful in GROMACS, where neighbor lists and kernel configurations change periodically.

→ For more information, refer to the presentation slides of the SYCL-Graph in GROMACS session at IWOCL.

→ Read the blog GROMACS: Simulate and Analyze Proteins, Lipids, and Acids Powered by SYCL.

→ Watch the tech video Intel Tools: Empowering GROMACS Cross-Architecture Development.

Blender: High-Performance, Cross-Vendor 3D Rendering with SYCL

Blender is a powerful, open-source 3D content creation suite widely used in animation, visual effects, game development, and scientific visualization. It supports the entire 3D pipeline—including modeling, rigging, animation, simulation, rendering, compositing, and video editing—and is known for its flexibility, extensibility, and active community. Blender includes two rendering engines: Eevee for real-time rendering and Cycles for photorealistic path tracing. It runs on Microsoft Windows*, macOS, and Linux*, and is increasingly used as a benchmark tool for GPU performance. Blender’s open architecture and support for cross-vendor technologies like SYCL make it a leading platform for both creative professionals and technical developers.

The integration of SYCL and Intel oneAPI into Blender brings several key benefits that enhance performance, portability, and maintainability across diverse hardware platforms:

SYCL enables Blender’s Cycles rendering engine to run on Intel, AMD, and NVIDIA GPUs using a unified codebase, reducing the need for vendor-specific backends like CUDA or HIP*.

Blender uses the open-source version of the Intel® oneAPI DPC++/C++ Compiler to comply with Blender’s GNU General Public License (GPL) and to ensure transparent, reproducible builds.

The Intel® Embree library, integrated through SYCL, enables hardware-accelerated ray tracing on Intel GPUs, significantly improving rendering performance for intersection-heavy workloads.

Blender uses Ahead-of-Time (AOT) compilation with the Intel GPU compiler to precompile SYCL kernels for multiple GPU targets, reducing runtime overhead and ensuring compatibility across systems.

Blender leverages advanced SYCL features, including bindless textures for efficient texture sampling using GPU hardware, device globals for optimized memory access and reduced indirection, and the Level Zero backend for stable, low-level device interaction and runtime control.

Blender bundles the SYCL runtime and Level Zero loader, ensuring consistent behavior across user systems without requiring external dependencies.

The SYCL backend is stable and actively maintained, with Blender shipping it in official builds and reporting minimal user issues despite the complexity of the rendering pipeline.

For more details, watch the recording of the IWOCL session by Stefan Werner from Intel and Xavier Hallade from ph0b.com:

→ Refer to the presentation slides: 3D Rendering With SYCL Cross-Vendor Support and Performance Using Blender Cycles.

→ Check out the Blender project on GitHub.

Build High-Performance Applications Using SYCL

Are you new to the heterogeneous computing approach? Get started with the essentials of SYCL. Our oneAPI programming guide has comprehensive resources for developers to learn about SYCL extensions, how to optimize SYCL applications, and more.

Learn about our oneAPI tools supporting or powered by SYCL, including the Intel oneAPI DPC++/C++ Compiler, the Intel DPC++ Compatibility Tool (and its open-source implementation called SYCLomatic) for automated code migration from CUDA to C++ with SYCL, and others. You can get these tools as part of our developer toolkits or download their standalone versions.

Check out the complete CUDA to SYCL catalog and explore how SYCL benefits various applications in domains such as fluid dynamics, healthcare, math, science, and more.