Migrate CUDA* Applications to oneAPI cross-architecture programming model based on SYCL*

ID 773504
Updated 12/30/2022
Version Latest
Public

Migrate your CUDA applications to C++ code based on the open SYCL industry standard using the Intel® DPC++ Compatibility Tool, which is included with the Intel® oneAPI Base Toolkit. oneAPI is a core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.

SYCL and CUDA serve similar purposes: both let developers offload compute-intensive tasks to a GPU. However, moving to SYCL lets developers write code once and run it on many different heterogeneous processors, freeing them from hardware vendor lock-in.

oneAPI and SYCL, on which oneAPI relies heavily, are designed from the ground up for multi-architecture, multi-vendor software development.

The syntax of CUDA and SYCL is similar, so migrating one, two, or twenty lines of code by hand is not very difficult. For a larger codebase, however, developers benefit greatly from tools that automate a large part of the migration process.

The Intel® DPC++ Compatibility Tool migrates software programs implemented with current and previous versions of CUDA.

Migrate CUDA Applications with the Intel® DPC++ Compatibility Tool

To get started with the Intel® DPC++ Compatibility Tool, download the Intel® oneAPI Base Toolkit and follow its installation instructions.

Once the Intel® DPC++ Compatibility Tool is installed, you can migrate your project using its many command-line options.

To set up the Intel® DPC++ Compatibility Tool environment, run the following:

  • On Linux (installed as root): source /opt/intel/oneapi/setvars.sh
  • On Linux (installed as a non-root user): source ~/intel/oneapi/setvars.sh
  • On Windows: "Drive:\Program Files (x86)\Intel\oneAPI\setvars.bat"

Additional information on environment and tool options is available in the Intel® DPC++ Compatibility Tool documentation.

For small projects with only a handful of source files, you can migrate each file individually. Pass the Intel® DPC++ Compatibility Tool the paths of the CUDA source files to migrate, and use --out-root to name the directory where the migrated files should be placed:

dpct file1.cu file2.cu file3.cu --out-root=output_dir

This places the migrated source files, now with .dp.cpp extensions, in the directory output_dir.

From here, you can edit the output source by searching the files for comments beginning with “DPCT” and looking up each diagnostic in the Diagnostics Reference.

Real-World Migration Example - CUDA* cuBLAS* syr Level-2 Sample

The cuBLAS syr routine performs a symmetric rank-1 update of a symmetric n-by-n matrix, A := alpha * x * x^T + A. The Intel® oneAPI Math Kernel Library (oneMKL) provides a SYCL counterpart of this routine through the oneapi/mkl.hpp header.

This level-2 sample from CUDA uses CMake as its build system. To migrate it to SYCL code, a compilation database must first be generated from the CMake configuration. The intercept-build utility included with the Intel® DPC++ Compatibility Tool can be run in the project's build directory to generate this compilation database.

Export your Makefile Build-Settings Database

1. Create the build directory and generate a Makefile as is standard for CMake projects.

mkdir build && cd build
cmake ..

2. Once the Makefile is generated, use the intercept-build tool to create a compilation database:

intercept-build make

This will generate a compile_commands.json file, which can be provided to the Intel® DPC++ Compatibility Tool for migrating the project.

Run the Intel® DPC++ Compatibility Tool

Running the Intel® DPC++ Compatibility Tool with the compilation database migrates the project. To simplify the manual edits later in the process, it can be useful to suppress code formatting with the --format-range option:

dpct -p compile_commands.json --format-range=none

This will generate a dpct_output directory, which contains the migrated source code.

Examples of code that the Intel® DPC++ Compatibility Tool can migrate on its own, without developer intervention, include the following.

CUDA-defined enumeration values are migrated to oneMKL-defined constants:

// CUDA
cublasFillMode_t uplo = CUBLAS_FILL_MODE_UPPER;
// Migrated SYCL (oneMKL)
oneapi::mkl::uplo uplo = oneapi::mkl::uplo::upper;

Device memory allocations and host-to-device copies are automatically migrated to their SYCL API counterparts:

// CUDA
CUDA_CHECK(cudaMalloc(reinterpret_cast<void **>(&d_A),
           sizeof(data_type) * A.size()));
CUDA_CHECK(cudaMalloc(reinterpret_cast<void **>(&d_x),
           sizeof(data_type) * x.size()));
CUDA_CHECK(cudaMemcpyAsync(d_A, A.data(), sizeof(data_type) *
           A.size(), cudaMemcpyHostToDevice, stream));
CUDA_CHECK(cudaMemcpyAsync(d_x, x.data(), sizeof(data_type) *
           x.size(), cudaMemcpyHostToDevice, stream));

// Migrated SYCL (q_ct1 is a sycl::queue; stream now points to a sycl::queue)
d_A = (data_type *)sycl::malloc_device( sizeof(data_type) *
                                        A.size(), q_ct1);
d_x = (data_type *)sycl::malloc_device( sizeof(data_type) *
                                        x.size(), q_ct1);
stream->memcpy(d_A, A.data(), sizeof(data_type) * A.size());
stream->memcpy(d_x, x.data(), sizeof(data_type) * x.size());

Migrate CUDA Specific API Calls

Additionally, the Intel® DPC++ Compatibility Tool ports CUDA-specific library routines to their oneMKL equivalents. It adjusts the routine's arguments to match what oneMKL expects, for example passing alpha by value rather than by pointer as in the CUDA source:

// CUDA (cuBLAS)
CUBLAS_CHECK(cublasDsyr(cublasH, uplo, n, &alpha, d_x, incx,
                        d_A, lda));

// Migrated SYCL (oneMKL); cublasH now points to a sycl::queue
oneapi::mkl::blas::column_major::syr(*cublasH, uplo, n, alpha,
                                      d_x, incx, d_A, lda);

Most of the warnings generated by the Intel® DPC++ Compatibility Tool are likely to be DPCT1003. This warning indicates that the original statement returned an error code, whereas SYCL reports errors through exceptions.

Warning IDs like DPCT1003 are well documented and can be easily looked up in the tool’s user guide.

All of the lines that triggered this warning were migrated, but they remain wrapped in the error-checking macros CUDA_CHECK() and CUBLAS_CHECK(). You can therefore use a text editor's find-and-replace feature, or a utility such as sed*, to remove these macros in one pass. For example, the following regular expressions remove the macro wrapper around each migrated line:

CUBLAS_CHECK\(\((.+), 0\)\)    -> $1
CUDA_CHECK\(\((.+), 0\)\)      -> $1

The other warning the Intel® DPC++ Compatibility Tool generates in our example is DPCT1025. It indicates that the flag and priority parameters used when creating the stream have no SYCL counterpart and are ignored.

Reviewing the code confirms that this warning has no effect on the sample, so it can be safely ignored.

Address Warnings and Finalize SYCL Migration

Some Intel® DPC++ Compatibility Tool warnings were emitted from the included cublas_utils.h file. Reviewing this file shows that it only provides utility functions for CUDA and cuBLAS error checking, which the SYCL implementation does not need. There is therefore no need to modify that file, and its #include can be removed from the migrated source code.

NOTE: CUDA error handling is based on returned error codes, which this sample checks through macros, whereas SYCL error handling is exception based.

SYCL-based programs use exceptions, which separate error handling from the normal flow of the program and make it more explicit. Additionally, because C++ exceptions propagate up the call stack, a developer can handle a thrown exception in a block outside the one where the throwing function was called. This avoids repeatedly checking a function's return code and re-returning it through every layer.

Because SYCL-based programs use exceptions instead of error codes, the Intel® DPC++ Compatibility Tool wraps the body of our main function in a try-catch block. The original CUDA source instead checked each error code with one of the macros defined in cublas_utils.h.

However, these CUDA macros simply report the error, if one occurred, and exit the program. The single catch block at the end of the migrated main function therefore replicates the same behavior. If different handling for various errors is needed, you can add logic to the catch block, or add further catch blocks, to distinguish between exception types and handle each accordingly.

Once all Intel® DPC++ Compatibility Tool warnings are resolved, the migrated source can be compiled with the Intel® oneAPI DPC++/C++ Compiler using the -fsycl option, or with the dpcpp driver, which enables SYCL compilation by default. Because the migrated SYCL source in our example uses oneMKL, the oneMKL headers must be included and the program must be linked against the oneMKL libraries. To build an appropriate command line, use the Link Line Advisor for oneMKL.

The Link Line Advisor for oneMKL provides the following options to compile and link the migrated SYCL code. Replace <migrated_source>.dp.cpp with the path of your migrated source file:

dpcpp <migrated_source>.dp.cpp -DMKL_ILP64 -I"${MKLROOT}/include" \
      -L${MKLROOT}/lib/intel64 -lmkl_sycl -lmkl_intel_ilp64 \
      -lmkl_sequential -lmkl_core -lsycl -lOpenCL -lpthread -lm -ldl

Migrate your Code to SYCL Today

The Intel® DPC++ Compatibility Tool accelerates the migration of CUDA-based applications with GPU offload to SYCL, making them compliant with the open SYCL standard and the open oneAPI industry initiative. You can then run your high-performance offload code on a wide range of heterogeneous, multi-architecture platforms. This future-proofs your software investment and lets you mix CPU and GPU vendors to meet your specific needs.

Take the Intel® DPC++ Compatibility Tool for a test-drive today.

Get the Software

Test it for yourself today by downloading the Intel® DPC++ Compatibility Tool and installing it alongside the Intel® oneAPI Base Toolkit with Priority Support.

Priority Support provides you with direct, confidential, and in-depth professional software support and guidance for oneAPI Developer Toolkits. Priority Support provides access to all of your 1:1 support database-tracked interactions with dedicated Intel software engineers, architecture consulting at a reduced cost, and more. Learn more about Priority Support for Intel oneAPI Developer Toolkits.