Explore SYCL with Samples from Intel

ID 772037
Date 2/08/2023
Public

SYCL applications are C++ programs written for parallelism. SYCL is designed for data-parallel programming and heterogeneous computing, and it provides a consistent programming language (C++) and consistent APIs across CPUs, GPUs, FPGAs, and AI accelerators. Each architecture can be programmed and used either in isolation or together with the others, so developers can learn the model once and then program distinct accelerators. Each class of accelerator still requires an appropriate formulation and tuning of the algorithms for best performance, but the language and programming model remain consistent regardless of the target device. For more details about SYCL, refer to the SYCL Specification.

This guide aims to help developers understand how to program using the oneAPI programming model, and how to target and optimize for the appropriate architecture to achieve optimal application performance.

For samples specific to FPGA, visit the Explore SYCL Through Intel® FPGA Code Samples page.

Build and Run a Sample Project

To build and run a sample from the command line or an IDE, see the Get Started with the Intel® oneAPI Base Toolkit content.

Sample 1: Simple Device Offload Structure

Sample 1 uses Vector Add as the equivalent of a Hello, World! sample for data parallel programs. It provides the basic structure of a SYCL application by showing you how to target an offload device. Sample 1 provides two source files that show two ways to manage memory: buffers or Unified Shared Memory (USM).

Vector Add provides both GPU and FPGA device selectors.

In this sample, you will learn how to use the basic elements (features) of SYCL to offload a simple computation using 1D arrays to accelerators. The basic features are:

  • A one-dimensional array of data.
  • A device selector, queue, buffer, accessor, and kernel.
  • Memory management using buffers and accessors or USM.
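The computation that Vector Add offloads can be sketched on the host in standard C++. This is only an illustration under stated assumptions, not the sample's source and not the SYCL API: `for_each_index` and `vector_add_reference` are hypothetical names, and the helper runs serially where a SYCL runtime would launch one work-item per index on the selected device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data-parallel launch: calls kernel(i) once for
// every index in [0, n). A SYCL runtime would run these calls in parallel on
// the device; here they run one after another on the host.
template <typename Kernel>
void for_each_index(std::size_t n, Kernel kernel) {
  for (std::size_t i = 0; i < n; ++i) {
    kernel(i);
  }
}

// Element-wise sum of two equally sized one-dimensional arrays.
std::vector<int> vector_add_reference(const std::vector<int>& a,
                                      const std::vector<int>& b) {
  assert(a.size() == b.size());
  std::vector<int> sum(a.size());
  // The lambda body plays the role of the kernel: each index is independent,
  // which is what makes the computation easy to offload.
  for_each_index(a.size(), [&](std::size_t i) { sum[i] = a[i] + b[i]; });
  return sum;
}
```

A host reference like this is also a convenient way to check the result that comes back from the device.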

Visit Code Sample: Vector Add for a detailed code walkthrough.

Sample 2: Basic SYCL Features Defined

Using a two-dimensional stencil to simulate a wave propagating in a 2D isotropic medium, this sample walks you through the base tenets of SYCL step by step, with:

  • SYCL queues (including device selectors and exception handlers).
  • SYCL buffers and accessors.
  • The ability to call a function inside a kernel definition and pass accessor arguments as pointers. A function called inside the kernel performs a computation (it updates a grid point specified by the global ID variable) for a single time step.
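The idea of a kernel that calls a helper function with raw pointers to update one grid point per time step can be sketched in standard C++ as follows. This is a host-only illustration under assumptions: it uses a simplified 5-point, second-order stencil rather than the stencil in the ISO2DFD sample, the names `update_point` and `time_step` are hypothetical, and `coeff` stands in for the combined velocity, time-step, and grid-spacing factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Updates a single interior grid point for one time step:
//   next = 2*cur - prev + coeff * laplacian(cur)
// The grids are passed as raw pointers, the way a kernel could pass the
// pointers behind its accessors to a helper function.
void update_point(float* next, const float* cur, const float* prev,
                  std::size_t row, std::size_t col, std::size_t n_cols,
                  float coeff) {
  const std::size_t i = row * n_cols + col;
  const float laplacian = cur[i - 1] + cur[i + 1] + cur[i - n_cols] +
                          cur[i + n_cols] - 4.0f * cur[i];
  next[i] = 2.0f * cur[i] - prev[i] + coeff * laplacian;
}

// Applies update_point to every interior point of a row-major grid. On a
// device, each (row, col) pair would be one work-item identified by its
// global ID; border points are left untouched here.
void time_step(std::vector<float>& next, const std::vector<float>& cur,
               const std::vector<float>& prev, std::size_t n_rows,
               std::size_t n_cols, float coeff) {
  assert(next.size() == n_rows * n_cols);
  assert(cur.size() == n_rows * n_cols);
  assert(prev.size() == n_rows * n_cols);
  for (std::size_t row = 1; row + 1 < n_rows; ++row) {
    for (std::size_t col = 1; col + 1 < n_cols; ++col) {
      update_point(next.data(), cur.data(), prev.data(), row, col, n_cols,
                   coeff);
    }
  }
}
```

To advance the simulation, call `time_step` repeatedly and rotate the three grids so that `cur` becomes `prev` and `next` becomes `cur`.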

Visit Code Sample: Two-Dimensional Finite-Difference Wave Propagation in Isotropic Media (ISO2DFD) for a detailed code walkthrough. Visit Explore Data Parallel C++ with Samples from Intel: ISO2DFD for a detailed video walkthrough.

Sample 3: Optimizing for More Complex Applications

This sample extends the SYCL concepts reviewed in the previous sample and explains how to use the concepts to solve complex stencil computations in 3D. Moving from 2D to 3D grid sizes can uncover common general-purpose GPU (device) programming issues related to inefficient data access patterns, low flops-to-byte ratios, and low occupancy. The sample shows you how to use some SYCL features to correct those underlying issues and optimize performance. The entire sample consists of five versions of the same code, with each iteration demonstrating performance improvements.

The sample includes step-by-step instructions that guide you through taking CPU-based code, implementing GPU offload with SYCL, and stepping through several iterations with Intel® Advisor to improve performance. The code demonstrates how to use several important SYCL features:

  • Local buffers and accessors (declare local memory buffers and accessors that are accessed and managed by each SYCL work-group).
  • Shared local memory (SLM) optimizations.
  • Kernels (including the parallel_for function and nd_range<3> objects).
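The shared local memory pattern behind these features can be illustrated with a host-only sketch in standard C++. The assumptions are significant: the example is a one-dimensional 3-point stencil rather than the sample's 3D stencil, `stencil_tiled` is a hypothetical name, each loop iteration over `start` stands in for one work-group, and the `local` vector stands in for that work-group's local memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes out[i] = in[i-1] + in[i] + in[i+1] for interior points, one tile
// at a time. Border points of the output are left at zero.
std::vector<float> stencil_tiled(const std::vector<float>& in,
                                 std::size_t tile_size) {
  assert(tile_size > 0);
  const std::size_t n = in.size();
  std::vector<float> out(n, 0.0f);
  if (n < 3) {
    return out;  // no interior points
  }
  // Tile plus one halo cell on each side.
  std::vector<float> local(tile_size + 2);
  for (std::size_t start = 1; start < n - 1; start += tile_size) {
    // Number of interior points covered by this tile.
    const std::size_t count = std::min(tile_size, n - 1 - start);
    // Stage 1: copy the tile and its halo from "global" to "local" memory.
    // On a device, the work-items of a work-group share this load and then
    // wait at a barrier before reading.
    for (std::size_t k = 0; k < count + 2; ++k) {
      local[k] = in[start - 1 + k];
    }
    // Stage 2: every point reads its neighbors from local memory, so each
    // input value is fetched from global memory once per tile instead of
    // once per neighboring point.
    for (std::size_t k = 0; k < count; ++k) {
      out[start + k] = local[k] + local[k + 1] + local[k + 2];
    }
  }
  return out;
}
```

The result does not depend on `tile_size`; on real hardware the tile size affects only how well the data reuse and occupancy work out, which is why it is a tuning parameter.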

Sample 4: Introducing Synchronization

This sample adds complexity in the form of a large number of moving particles that interact with a fixed grid of cells. It illustrates additional SYCL features, such as synchronization with atomic operations.

This code sample shows you how to offload a computation to an accelerator using the following SYCL features:

  • SYCL queues (including device selectors and exception handlers).
  • SYCL buffers and accessors (communicate data between the host and the device).
  • SYCL kernels (including parallel_for function and range<1> objects).
  • SYCL atomic operations for synchronization.
  • API-based programming: Use oneMKL to generate random numbers.
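Why atomic operations are needed when many particles update a shared grid can be sketched in standard C++. This is a host-only illustration and not the Particle Diffusion source: `count_particles_per_cell` is a hypothetical name, `std::atomic` stands in for SYCL atomic operations, the particle loop runs serially where a device would run one work-item per particle, and particle positions are taken as input rather than generated with oneMKL.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Counts how many particles fall into each cell of a square grid of
// unit-sized cells. Returns a row-major vector of grid_size * grid_size
// counts; particles outside the grid are ignored.
std::vector<unsigned> count_particles_per_cell(const std::vector<float>& xs,
                                               const std::vector<float>& ys,
                                               std::size_t grid_size) {
  assert(xs.size() == ys.size());
  std::vector<std::atomic<unsigned>> counters(grid_size * grid_size);
  for (auto& counter : counters) {
    counter.store(0u);
  }
  const float limit = static_cast<float>(grid_size);
  for (std::size_t p = 0; p < xs.size(); ++p) {
    const float fx = std::floor(xs[p]);
    const float fy = std::floor(ys[p]);
    // Skip particles outside the grid (this form also rejects NaN).
    if (!(fx >= 0.0f && fy >= 0.0f && fx < limit && fy < limit)) {
      continue;
    }
    const std::size_t cell = static_cast<std::size_t>(fy) * grid_size +
                             static_cast<std::size_t>(fx);
    // Several particles can land in the same cell. If particles were
    // processed concurrently, a plain ++ could lose updates; an atomic
    // increment makes each update indivisible.
    counters[cell].fetch_add(1u, std::memory_order_relaxed);
  }
  std::vector<unsigned> counts(counters.size());
  for (std::size_t i = 0; i < counters.size(); ++i) {
    counts[i] = counters[i].load();
  }
  return counts;
}
```

Relaxed ordering is used because only the final totals matter here, not the order in which the increments become visible.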

Visit Code Sample: Particle Diffusion for a detailed code walkthrough.

Next Steps

Code Walkthroughs

Next, try a detailed code walkthrough on the following topics:

Determine Which Code to Offload

Intel® Advisor can help you determine which parts of your code would benefit from offloading to an accelerator. In addition to its standard profiling capabilities, the Offload Advisor feature collects performance-prediction data and identifies code that can be offloaded to a target device to accelerate your CPU-based application. The Get Started with Intel® Advisor guide helps you:

  • Optimize CPU or GPU code for memory and compute with Roofline Analysis.
  • Enable more vector parallelism and improve its efficiency.
  • Model, tune, and test multiple threading designs.
  • Create and analyze data-flow and dependency computations for heterogeneous algorithms.

Transform CUDA Code into SYCL Code

You can transform CUDA code into standards-based SYCL code with a migration engine called the Intel® DPC++ Compatibility Tool. The Get Started Guide and User Guide help you migrate your existing CUDA applications and cover the general workflow of the migration process. The tool can transform programs that are composed of multiple source and header files. It also includes:

  • One-time-only migration ports for both kernels and API calls.
  • Inline comments that guide you in completing the migrated code so that it can be compiled with the Intel® oneAPI DPC++/C++ Compiler.
  • Command-line tools and IDE plug-ins that streamline operations.

Additional Resources

Access a wide range of tutorials, videos, and webinar replays to learn more about SYCL and the supporting tools on the Intel® oneAPI Toolkits site.

Related documents and their descriptions:

  • Intel® oneAPI Programming Guide: Learn about oneAPI and SYCL, programming models, programming interfaces, SYCL runtimes, APIs, and software development processes.
  • Documentation Library: Look through our content to search for specific documents.
  • Explore SYCL Through Intel® FPGA Code Samples: Look through the FPGA code samples for more in-depth information.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.