Migrating the ConcurrentKernels Application from CUDA* to SYCL*

ID 766163
Updated 10/23/2023
Version Latest




The Concurrent Kernels sample demonstrates how to use SYCL queues to execute several kernels concurrently on GPU devices. The original CUDA source code is migrated to SYCL for portability across GPUs from multiple vendors, and the sample also shows how to optimize the migrated code to improve processing time.



What you will learn

How to migrate CUDA to SYCL

Time to complete

15 minutes


Concepts and Functionality

Key Implementation Details

This sample demonstrates the migration of the following prominent CUDA features:

  • Stream and event management
  • Reduction

ConcurrentKernels uses a kernel that does no real work but runs for at least a specified number of iterations.
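As a rough illustration of such a "do nothing, but take time" kernel, here is a minimal host-side C++ sketch. It is not the sample's device code: the function name `spin_for_iterations` and its signature are hypothetical, and it runs on the CPU only to show the shape of the loop.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for the sample's busy kernel (assumption:
// the real kernel runs on the device and has a different signature).
// It performs no useful work; it only loops until at least `min_iterations`
// iterations have completed, then reports how many iterations it ran.
std::uint64_t spin_for_iterations(std::uint64_t min_iterations) {
    // volatile keeps the compiler from optimizing the empty loop away.
    volatile std::uint64_t counter = 0;
    while (counter < min_iterations) {
        counter = counter + 1;
    }
    return counter;
}
```

Calling `spin_for_iterations(1000)` returns a count of at least 1000. The returned count plays the role of the per-kernel value that a later reduction step could combine.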

The sample demonstrates the use of multiple streams to enable simultaneous execution of kernels, where each stream represents an independent context for executing a kernel. Kernels assigned to different streams can run concurrently, which makes effective use of the GPU resources. To achieve the desired execution time, the sample determines the clock frequency of the device and calculates the number of clock cycles required for each kernel. Finally, the kernels are queued, and a reduction operation is performed on the last stream.
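The clock-cycle calculation described above can be sketched in plain C++. This is an illustration under stated assumptions, not the sample's code: the helper name `cycles_for_duration` is hypothetical, and it assumes the device clock frequency is reported in kHz, so that 1 kHz corresponds to one cycle per millisecond.

```cpp
#include <cstdint>

// Hypothetical helper: converts a target kernel duration into a clock-cycle
// budget. Assumes `clock_rate_khz` is the device clock frequency in kHz,
// which equals the number of cycles per millisecond.
std::uint64_t cycles_for_duration(double kernel_time_ms,
                                  std::uint64_t clock_rate_khz) {
    // cycles = milliseconds * (cycles per millisecond)
    return static_cast<std::uint64_t>(
        kernel_time_ms * static_cast<double>(clock_rate_khz));
}
```

For example, a 10 ms target on a device clocked at 1,500,000 kHz (1.5 GHz) gives a budget of 15,000,000 cycles per kernel.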

Original CUDA source files: ConcurrentKernels.

Migrated SYCL source files, including step-by-step instructions: guided_ConcurrentKernel_SYCLmigration.