Workflow for a CUDA* to SYCL* Migration
Overview
Use this basic workflow to migrate the entire code base of a CUDA* application to SYCL* and then optimize the migrated kernels for Intel® GPUs.
Target Audience
Software developers with strong CUDA development skills.
Prerequisites
You need access to Intel GPUs and oneAPI software tools. The Intel® Developer Cloud is a free virtual sandbox that provides both, including the Intel® oneAPI Base Toolkit (Base Kit).
To use your local development system instead, you must have the following:
- Access to an Intel GPU. See Optimize Your GPU Application with the Base Kit.
- Access to the Base Kit. This toolkit provides core tools and libraries to develop high-performance applications across diverse architectures. The Base Kit includes the Intel® DPC++ Compatibility Tool, which is useful in assisted migration of your CUDA sources.
For a complete set of migration resources, see Migrate with SYCL.
Before you migrate to SYCL, make sure that you have a working CUDA application. You can then migrate your CUDA sources in one of two ways:
- Auto-generating most of the SYCL code using the Intel DPC++ Compatibility Tool, which provides a side-by-side comparison of CUDA to SYCL code.
- Manually analyzing CUDA sources and replacing all CUDA-specific calls with the equivalent SYCL calls.
The Intel DPC++ Compatibility Tool usually migrates 90%-95% of the code and generates warnings for code regions that need manual intervention to complete the migration. [1]
Code generated by this tool uses helper functions defined in the <dpct/dpct.hpp> header file, which wrap some SYCL calls in an extra layer. In contrast, manually migrated SYCL code uses SYCL calls and syntax that map directly to CUDA calls.
Download and try a migration using the simple Vector Add sample.
[1] Intel estimates are as of September 2021 and based on measurements on a set of 70 HPC benchmarks and samples, with examples such as Rodinia, Scalable Heterogeneous Computing (SHOC), and Pennant. Results may vary.
In this step, migrate your source code to SYCL using a manual or assisted method. After finishing the migration, continue your development work on the SYCL source code.
Assisted Migration
Migrate existing CUDA code to SYCL using the Intel DPC++ Compatibility Tool. The tool ports CUDA language kernels and library API calls, and migrates most of the CUDA code to architecture- and vendor-portable SYCL code.
Learn with a Code Sample
To help you decide how to migrate your CUDA sources, use the following resources:
- Guide to migrating a Jacobi sample: CUDA to SYCL Migration–Jacobi Iterative Method.
Get a detailed analysis of the migration with explicit explanations of the migration process and CUDA to SYCL mappings.
- Original CUDA source code: JacobiCUDAGraphs.zip.
- Migrated SYCL source code from Intel: Jacobi Iterative Solver.
The samples include separate sources that reflect the distinct workflow stages.
Note: To learn more about SYCL and understand how the Intel DPC++ Compatibility Tool changed the source code, see CUDA to SYCL Migration–Jacobi Iterative Method. This guide explains the technical details of the CUDA to SYCL mappings using the Jacobi sample.
Manual Migration
Manually migrated SYCL code uses SYCL calls and syntax that map directly to CUDA calls. This method produces cleaner migrated code that is easier to follow. The functionality of the two versions is nearly identical.
For technical details of the CUDA to SYCL mappings using the Jacobi sample, see the instructions in the CUDA to SYCL Migration–Jacobi Iterative Method. This guide explains the underlying concepts of CUDA and SYCL, and the essential terms for migrating the code.
Offloading involves common setup steps, such as creating asynchronous streams, allocating memory, and copying data, but the actual work happens in the offloaded computation. CUDA and SYCL share some basic concepts for creating offload kernels that run on a GPU. To learn the SYCL syntax efficiently, map these concepts to each other by identifying their similarities and differences:
- CUDA thread block and SYCL work group
- Shared local memory (SLM) access
- CUDA thread block and SYCL barrier synchronization
- CUDA cooperative group and SYCL subgroup
- CUDA warp primitives and SYCL group algorithms
- CUDA and SYCL atomics
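The first three mappings in the list above can be illustrated without a GPU. The following host-only C++ sketch is neither CUDA nor SYCL code: it emulates a 1D kernel launch sequentially to show how the CUDA and SYCL index and barrier concepts line up. The names WorkItem1D, global_id, vector_add_emulated, and group_sum_emulated are hypothetical and exist only for this illustration; the CUDA and SYCL identifiers appear in comments for comparison.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of CUDA or SYCL): the coordinates of
// one work-item in a 1D launch.
struct WorkItem1D {
    std::size_t group_id;     // CUDA: blockIdx.x  | SYCL: item.get_group(0)
    std::size_t local_id;     // CUDA: threadIdx.x | SYCL: item.get_local_id(0)
    std::size_t local_range;  // CUDA: blockDim.x  | SYCL: item.get_local_range(0)
};

// CUDA: blockIdx.x * blockDim.x + threadIdx.x | SYCL: item.get_global_id(0)
inline std::size_t global_id(const WorkItem1D& it) {
    return it.group_id * it.local_range + it.local_id;
}

// Emulates a vector-add kernel by visiting every work-item of every
// work-group (CUDA: thread block) in sequence.
// Assumes a.size() == b.size() and local_range > 0.
std::vector<float> vector_add_emulated(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       std::size_t local_range) {
    const std::size_t n = a.size();
    std::vector<float> c(n, 0.0f);
    // Round up so that a partially filled last group is still launched.
    const std::size_t num_groups = (n + local_range - 1) / local_range;
    for (std::size_t g = 0; g < num_groups; ++g) {
        for (std::size_t l = 0; l < local_range; ++l) {
            const std::size_t i = global_id(WorkItem1D{g, l, local_range});
            if (i < n) {  // same bounds guard a CUDA or SYCL kernel needs
                c[i] = a[i] + b[i];
            }
        }
    }
    return c;
}

// Emulates a tree reduction inside one work-group. The by-value copy plays
// the role of shared local memory (CUDA: __shared__ array | SYCL:
// sycl::local_accessor). Assumes slm.size() is zero or a power of two.
float group_sum_emulated(std::vector<float> slm) {
    for (std::size_t stride = slm.size() / 2; stride > 0; stride /= 2) {
        for (std::size_t local_id = 0; local_id < stride; ++local_id) {
            slm[local_id] += slm[local_id + stride];
        }
        // On a GPU, a barrier is required here before the next pass.
        // CUDA: __syncthreads() | SYCL: sycl::group_barrier(item.get_group())
    }
    return slm.empty() ? 0.0f : slm[0];
}
```

For example, calling vector_add_emulated with two five-element vectors and a local_range of 2 launches three emulated groups, and the bounds guard skips the unused sixth work-item. In real SYCL code, the reduction loop is typically a candidate for a group algorithm such as sycl::reduce_over_group.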
Resources
At this stage, you have working code that compiles and runs. Optimize the migrated code for Intel GPUs using Intel® tools such as Intel® VTune™ Profiler and Intel® Advisor. These tools help you identify areas of code to improve so that you can optimize application performance. Both tools include graphical user interfaces to help visualize your optimization strategy.
Performance Analysis with Intel® VTune™ Profiler
Use this profiler to create a snapshot of your application performance baseline and identify focus areas for further analysis.
Follow these steps:
Roofline Analysis with Intel® Advisor
Use this tool to measure the actual performance of offloaded code using the GPU Roofline Insights analysis. You can evaluate GPU code to see how close the performance is to hardware maximums.
Follow these steps:
- Set up your environment to analyze GPU kernels.
- Run Roofline Analysis.
- Review results to evaluate throughput based on hardware models.
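To make "how close the performance is to hardware maximums" concrete, the following C++ sketch shows the basic roofline arithmetic. It is a simplified single-roof model, not the method Intel Advisor uses internally, and the tool may plot several memory-level roofs rather than one. The function names and all peak values are hypothetical placeholders; take real numbers from your own analysis results.

```cpp
#include <algorithm>

// Arithmetic intensity in FLOP/byte: floating-point operations performed
// per byte of memory traffic. Assumes bytes_moved > 0.
inline double arithmetic_intensity(double flops, double bytes_moved) {
    return flops / bytes_moved;
}

// Attainable performance in GFLOP/s under a single-roof model: a kernel is
// limited either by peak compute or by memory bandwidth multiplied by its
// arithmetic intensity, whichever is lower.
inline double roofline_gflops(double peak_gflops,
                              double peak_bandwidth_gbs,
                              double intensity) {
    return std::min(peak_gflops, peak_bandwidth_gbs * intensity);
}

// Fraction of the roof that a measured kernel reaches (1.0 = at the roof).
// Assumes roof_gflops > 0.
inline double roof_utilization(double measured_gflops, double roof_gflops) {
    return measured_gflops / roof_gflops;
}
```

With hypothetical peaks of 1000 GFLOP/s and 100 GB/s, a kernel with an arithmetic intensity of 0.25 FLOP/byte is memory bound: roofline_gflops(1000.0, 100.0, 0.25) evaluates to 25 GFLOP/s. A kernel measured at 12.5 GFLOP/s would then reach half of that roof.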
Note: For more information on the Jacobi sample, see the "Tools for Performance Analysis" section in CUDA to SYCL Migration–Jacobi Iterative Method. In the Jacobi sample on GitHub*, the output of this optimization step is sycl_migrated_optimized.
Resources
- Optimize Your GPU Application with the Base Kit
- oneAPI GPU Optimization Guide
- Essentials of SYCL for Intel Developer Cloud (Training)