Brilliant Real-time Rendering with Faster Speed-up on Intel® GPUs

Take advantage of ray-traced hardware acceleration enabled by Intel® Embree 4.2 on Intel® Arc™ & Intel® Data Center GPUs

Image: 4004 Moore’s Lane model rendered using an Intel® Embree 4.0-based path tracer on Intel® Arc™ GPU A770.

August 1, 2023

As demand grows for high-powered visual compute, interactive real-time preview, and faster rendering, we are happy to announce the release of Intel® Embree 4.2, which adds ray-traced hardware acceleration on Intel® Arc™ graphics and Intel® Data Center GPUs (Flex and Max Series) and delivers significantly faster performance.1

The Intel® Embree 4.2 release advances Intel’s rendering platform by taking advantage of powerful GPU ray-traced hardware acceleration paired with Intel CPUs and using oneAPI’s open, standards-based multiarchitecture programming model. While other GPU-accelerated ray tracing libraries exist, Intel® Embree 4.2 introduces a next-gen innovative solution that delivers scalable performance and productivity—from large-scale rendering on CPUs to immersive interactivity on GPUs through oneAPI and SYCL. At Intel, we’re just getting started ramping our Xe architecture-based GPUs, providing more choice for the industry to build new, flexible, cost-efficient designs for AI, HPC, design and entertainment for visual compute and graphics—like those used in content creation, 3D/visual effects, and scientific visualization.

Figure 1: Intel Embree delivers multiarchitecture performance and productivity on Intel CPUs and GPUs.

About Intel Embree - Background

Intel Embree is an open source library of ray tracing kernels that rendering experts and developers use to optimize photo-realistic rendering applications and to speed up visual compute and production rendering. The library has long been a leading solution for CPU rendering. In 2021, the Academy of Motion Picture Arts and Sciences awarded Intel® Embree a Scientific & Technical Achievement Award. Previous releases are integrated into some of the industry’s most popular renderers, including Chaos V-Ray*, DreamWorks MoonRay*, Mercenaries Engineering Guerilla*, Maxon Cinema 4D*, and more.

Intel Embree 4.2: New Features Deliver oneAPI Multiarchitecture Performance & Productivity

In the 4.2 release, Intel Embree adds support for Intel’s discrete GPUs using oneAPI’s SYCL* implementation, and that GPU support moves from beta to production. The library’s GPU support builds on the open, standards-based SYCL cross-platform abstraction layer for heterogeneous and offload processors. In addition to its existing CPU support, Intel Embree now lets developers write C++ code that targets CPUs or GPUs independently while keeping both paths in the same application codebase. An updated Embree-optimized application can therefore choose between large-scale rendering on CPUs and interactive performance on GPUs. Using a single codebase saves significant time and reduces code maintenance. Rendering experts can speed up real-time rendering by enabling ray-traced hardware acceleration on supported GPUs through Intel Embree.
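As a rough illustration of the single-codebase idea described above, the following sketch in plain standard C++ keeps one intersection kernel and selects an execution path at run time. It does not call Embree or SYCL. The names Target, Ray, hit_unit_sphere, and render are hypothetical, and the GPU branch is only a placeholder marking where a SYCL build might submit the same kernel to a device queue.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical execution targets for this sketch (not Embree or SYCL types).
enum class Target { Cpu, Gpu };

// A ray with origin `org` and direction `dir`; `dir` is assumed normalized.
struct Ray {
    float org[3];
    float dir[3];
};

// Shared kernel: distance along the ray to the nearest hit on a unit sphere
// centered at the origin, or -1 on a miss. For brevity, rays that start
// inside the sphere are also reported as a miss.
inline float hit_unit_sphere(const Ray& r) {
    float b = 0.0f;   // dot(org, dir)
    float c = -1.0f;  // dot(org, org) - radius^2, with radius = 1
    for (int k = 0; k < 3; ++k) {
        b += r.org[k] * r.dir[k];
        c += r.org[k] * r.org[k];
    }
    const float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;
    const float t = -b - std::sqrt(disc);
    return t >= 0.0f ? t : -1.0f;
}

// Runs the shared kernel over every ray and writes one distance per ray.
// This dispatcher is the only place that knows about the target.
void render(Target target, const std::vector<Ray>& rays, std::vector<float>& out) {
    assert(out.size() == rays.size());
    switch (target) {
    case Target::Cpu:
        for (std::size_t i = 0; i < rays.size(); ++i) out[i] = hit_unit_sphere(rays[i]);
        break;
    case Target::Gpu:
        // Placeholder only: a GPU build would launch hit_unit_sphere on the
        // device here. This sketch runs the same loop on the host so that it
        // stays plain C++.
        for (std::size_t i = 0; i < rays.size(); ++i) out[i] = hit_unit_sphere(rays[i]);
        break;
    }
}
```

Because render is the only function that knows about the target, switching targets is a one-argument change at the call site, and the kernel itself is written and maintained once.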

To demonstrate this single-codebase approach, Embree includes an example path tracer, a single-source renderer that executes efficiently on both the CPU and the GPU. For existing CPU customers of Intel Embree, the benchmark in Figure 2 shows an Intel® Arc™ A750 Graphics GPU rendering a path tracing workload significantly faster than an Intel® Core™ i9-12900K CPU.1

Figure 2: Publicly available models (the Austrian Imperial Crown modeled by Martin Lubich, and other models from the Stanford 3D Scanning Repository), optimized by Intel Embree 4.2, render significantly faster on an Intel® Arc™ A750 GPU than on an Intel® i9-12900K CPU for a path tracing workload.1

So for client-side interactive scenarios, content creators, architects, filmmakers, and anyone who likes fast rendering may gain much faster performance on an Arc GPU with ray-traced hardware acceleration optimized by Intel Embree 4.2. This is the expected performance boost for interactive rendering or fast preview of scenes with medium complexity, and customers can gain it by switching from CPU rendering to an Intel GPU. When extremely high scene complexity is the goal rather than performance, such as for final film frames, it is easy to switch the same renderer to Intel® Xeon® CPUs or the Intel Data Center GPU Max Series, where large amounts of memory are available.

Intel Embree 4.2 optimized rendering performance on Intel Arc A750 GPUs (225 W thermal design power (TDP)) also compares well against the same public models running on an NVIDIA GeForce RTX 3060* GPU (a GPU at a comparable price point and 170 W TDP), optimized by NVIDIA OptiX* and Vulkan*.2 This is demonstrated by the benchmark in Figure 3, which uses the publicly available ChameleonRT path tracer with its Embree, OptiX, and Vulkan backends. Embree and OptiX are comparable APIs with similar feature sets that build on a modern C++ compute language, and both are commonly used for 3D rendering. Embree outperforms OptiX in this benchmark, while the comparison against Vulkan (commonly used for game development) shows similar performance. We used a larger set of models for this second benchmark to ensure we were comparing performance across a variety of workloads.

Figure 3: All models are available from the Stanford 3D Scanning Repository, and renderings were done using ChameleonRT v0.0.10. The public models were ported to SYCL with the same code optimized by Intel® Embree 4.2 on an Intel® Arc™ A750 GPU vs. optimized by NVIDIA OptiX* and Vulkan* running on an NVIDIA GeForce RTX* 3060 GPU.2

Beyond the migration of the public models to SYCL, Intel Embree 4 is backward compatible with Embree 3: it preserves most of the API and existing features, which allows an easy transition. Intel Embree 4.2 provides a rich feature set for optimized production rendering and scientific visualization, including support for curves and point primitives, multi-segment motion blur, and multi-level instancing.
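Of the features listed above, multi-segment motion blur lends itself to a short illustration. The sketch below assumes a vertex is stored at equally spaced time steps over a normalized time range [0, 1] and is linearly interpolated between adjacent steps. Vec3 and vertex_at_time are hypothetical names in plain C++, not part of the Embree API, and the code is not a description of how Embree implements the feature internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Position of one vertex at normalized time `time`, given its position at
// N >= 2 equally spaced time steps (that is, N - 1 linear motion segments).
// Times outside [0, 1] are clamped to the first or last step.
Vec3 vertex_at_time(const std::vector<Vec3>& steps, float time) {
    assert(steps.size() >= 2);
    const float t = std::max(0.0f, std::min(time, 1.0f));
    const std::size_t segments = steps.size() - 1;

    // Map t onto the segment index i and the local fraction f within it.
    const float scaled = t * static_cast<float>(segments);
    const std::size_t i = std::min(static_cast<std::size_t>(scaled), segments - 1);
    const float f = scaled - static_cast<float>(i);

    // Linear interpolation between the two steps that bound the segment.
    const Vec3& a = steps[i];
    const Vec3& b = steps[i + 1];
    return Vec3{a.x + f * (b.x - a.x),
                a.y + f * (b.y - a.y),
                a.z + f * (b.z - a.z)};
}
```

With three time steps, a vertex that moves along x and then along y follows two straight segments, so curved motion is approximated more closely as more steps are added.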

Real-world Example: Blender 3.6 LTS Supports Intel GPU Ray-Traced Hardware Acceleration

We are always excited to work with the industry’s foremost innovators. The latest version of Blender, 3.6 LTS, supports oneAPI, multiple Intel GPUs, and driver improvements, and integrates two Intel rendering libraries (Intel Embree and Intel® Open Image Denoise) to advance its capabilities. The demo below shows shorter rendering times using these capabilities.


Learn more in Unleash GPU Hardware-accelerated Ray Tracing in Blender 3.6 LTS.

Figure 4: Content creators using Blender 3.6 LTS can take advantage of hardware-accelerated ray tracing on Intel Xe-architecture based GPUs: in the Cycles System Preferences, open the oneAPI tab and select ‘Embree’ as a rendering option.

Download the Library Now

You can find the Intel Embree 4.2 release on GitHub. This new version will be added to the Intel® oneAPI Rendering Toolkit and the oneAPI specification in their future releases. The library ships with various tutorials, including the path tracer used for benchmarking if you want to try it out hands-on. For more details, see the release notes.

Intel Embree 4.2 together with Intel GPUs is another step forward in our commitments to graphics acceleration and promoting an open software stack and ecosystem. We make our technologies broadly available and scalable to bring advanced ray tracing across laptops, workstations, data center/render farm, cloud—and to the world’s largest supercomputers.

We hope you take advantage of these capabilities and look forward to seeing what you can accomplish with ray-traced hardware acceleration. To get started with oneAPI and SYCL, see the next section.

Sven & Laura

Get Started with SYCL: The Productive Path for Single Source Code on CPUs & GPUs

To get full access to ray-traced hardware acceleration on Intel Arc or data center GPUs, CPU code and CUDA* code need to be migrated to oneAPI’s SYCL implementation (Data Parallel C++ / DPC++), creating single-source multiarchitecture code that takes advantage of the performance and productivity benefits. If the code is originally CPU-only, developers may have to make some design choices to get the best performance (this will depend on the design of the renderer). If you run your code only on CPUs with no offload to GPU and plan to continue doing so, migration to SYCL is not necessary.

CUDA code migration to SYCL3: You may choose to manually port your code to SYCL to start the code transition. There are also two code migration tools available to assist with porting CUDA code to SYCL, which automate 90% to 95% of this process. The tools are the open source SYCLomatic project and the Intel® DPC++ Compatibility Tool. For the parts of the code that do not migrate automatically, the tools provide inline comments to help you finish writing and tuning the code, a step that is always necessary for each architecture. Minimal API changes are needed to enable a consistent and performant CPU/GPU API. Learn more with these resources:

About the Authors

Sven Woop is a Principal Engineer at Intel and the lead of the award-winning Intel® Embree ray tracing project, which implements the highest-performance ray traversal algorithms to speed up photorealistic rendering. Sven holds a master's degree in Computer Science and received his PhD on a hardware architecture for real-time ray tracing, both from Saarland University in Germany. His interests and expertise include computer graphics, hardware design, and parallel programming.

Laura Reznikov is a senior engineering manager responsible for Intel’s award-winning render kernel libraries, including Intel Embree, Intel Open Image Denoise, Intel® Open Volume Kernel Library, and Intel® Open Path Guiding Library. Laura holds a BFA degree in Film and Animation and received her MS in Computer Science, focused on Physically-Based Rendering, both from Rochester Institute of Technology.