Intel® DPC++ Compatibility Tool
Migrate Your CUDA* Code to Portable, Multiarchitecture C++ with SYCL*
Efficient Code Migration
- The Intel® DPC++ Compatibility Tool assists in migrating your existing CUDA* code to SYCL* code.
- DPC++ is based on ISO C++ and incorporates standard SYCL and community extensions to simplify data parallel programming.
How It Works
- The tool ports both CUDA language kernels and library API calls.
- Typically, 90%-95% of CUDA code automatically migrates to SYCL (see footnote 1).
- Inline comments help you finish writing and tuning your code.
Intel DPC++ Compatibility Tool Guide
Intel® oneAPI DPC++/C++ Compiler
1 An Intel estimate as of March 2023, which is based on measurements from a set of 85 HPC benchmarks and samples, with examples like Rodinia, SHOC, and Pennant. Results may vary.
What You Need
- The Intel DPC++ Compatibility Tool is included in the Intel® oneAPI Base Toolkit.
- It integrates into familiar IDEs, including Eclipse* and Microsoft Visual Studio*.
Download as Part of the Toolkit
The Intel DPC++ Compatibility Tool is included with the Intel oneAPI Base Toolkit. This is a core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.
Download the Stand-Alone Version
A stand-alone download of the Intel DPC++ Compatibility Tool is available. You can download binaries from Intel or choose your preferred repository.
Help the Intel DPC++ Compatibility Tool Evolve
This tool supports the oneAPI industry standards initiative. You are welcome to participate.
Code Migration: Before & After
Source CUDA Code
The Intel DPC++ Compatibility Tool migrates software programs implemented with current and previous versions of CUDA. For details, see the release notes.
#include <cuda.h>
#include <stdio.h>

const int vector_size = 256;

__global__ void SimpleAddKernel(float *A, int offset)
{
    A[threadIdx.x] = threadIdx.x + offset;
}

int main()
{
    float *d_A;
    int offset = 10000;

    cudaMalloc( &d_A, vector_size * sizeof( float ) );
    SimpleAddKernel<<<1, vector_size>>>(d_A, offset);

    float result[vector_size] = { };
    cudaMemcpy(result, d_A, vector_size*sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree( d_A );

    for (int i = 0; i < vector_size; ++i) {
        if (i % 8 == 0) printf( "\n" );
        printf( "%.1f ", result[i] );
    }
    return 0;
}
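Because the kernel above writes A[i] = i + offset, the expected output can be computed on the host and compared with whatever a CUDA or migrated build copies back. The following is a minimal host-only C++ sketch of such a check. It is not part of CUDA, SYCL, or the Intel DPC++ Compatibility Tool; the helper names expected_simple_add and matches_reference are hypothetical, and the sketch assumes a non-negative offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CUDA, SYCL, or tool API): builds the values
// that SimpleAddKernel is expected to write, A[i] = i + offset.
std::vector<float> expected_simple_add(std::size_t n, int offset)
{
    // Assumption: offset is non-negative, as in the sample (10000).
    assert(offset >= 0);
    std::vector<float> ref(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Integer addition first, then conversion to float, as in the kernel.
        ref[i] = static_cast<float>(i + static_cast<std::size_t>(offset));
    }
    return ref;
}

// Hypothetical helper: returns true when the n floats in `got` (for example,
// the `result` array copied back from the device) equal the reference.
// Exact comparison is used because these small integers are exactly
// representable as float.
bool matches_reference(const float *got, std::size_t n, int offset)
{
    const std::vector<float> ref = expected_simple_add(n, offset);
    for (std::size_t i = 0; i < n; ++i) {
        if (got[i] != ref[i]) return false;
    }
    return true;
}
```

In the sample, a call such as matches_reference(result, vector_size, offset) after the copy back to the host would confirm that the device wrote the expected values.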
Migrated Code
The resulting code is typical of what you can expect to see after migration. In most cases, you will need to make further edits and optimizations to complete the migration.
#include <sycl/sycl.hpp>
#include <dpct/dpct.hpp>
#include <stdio.h>

const int vector_size = 256;

void SimpleAddKernel(float *A, int offset, sycl::nd_item<3> item_ct1)
{
    A[item_ct1.get_local_id(2)] = item_ct1.get_local_id(2) + offset;
}

int main()
{
    dpct::device_ext &dev_ct1 = dpct::get_current_device();
    sycl::queue &q_ct1 = dev_ct1.default_queue();
    float *d_A;
    int offset = 10000;

    d_A = sycl::malloc_device<float>(vector_size, q_ct1);
    q_ct1.submit([&](sycl::handler &cgh) {
        cgh.parallel_for(sycl::nd_range(sycl::range(1, 1, vector_size),
                                        sycl::range(1, 1, vector_size)),
                         [=](sycl::nd_item<3> item_ct1) {
                             SimpleAddKernel(d_A, offset, item_ct1);
                         });
    });

    float result[vector_size] = { };
    q_ct1.memcpy(result, d_A, vector_size * sizeof(float)).wait();
    sycl::free(d_A, q_ct1);

    for (int i = 0; i < vector_size; ++i) {
        if (i % 8 == 0) printf( "\n" );
        printf( "%.1f ", result[i] );
    }
    return 0;
}
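In the migrated listing, threadIdx.x became item_ct1.get_local_id(2), and the <<<1, vector_size>>> launch became an nd_range whose two ranges carry vector_size in the last position. The host-only C++ sketch below illustrates that index reversal and how a global range can be derived as grid size times block size per dimension. The types Dim3 and NdRange3 and the functions to_sycl_order and launch_to_nd_range are hypothetical stand-ins for illustration, not CUDA, SYCL, or tool APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a CUDA-style (x, y, z) extent; not the real dim3.
struct Dim3 {
    std::size_t x, y, z;
};

// Hypothetical stand-in for a 3-D SYCL-style nd_range: total work-items
// (global) and work-group size (local), both indexed 0..2.
struct NdRange3 {
    std::array<std::size_t, 3> global;
    std::array<std::size_t, 3> local;
};

// Reverse the component order: x -> index 2, y -> index 1, z -> index 0.
// This mirrors threadIdx.x becoming get_local_id(2) in the listing above.
std::array<std::size_t, 3> to_sycl_order(const Dim3 &d)
{
    return {{d.z, d.y, d.x}};
}

// Build the ranges for a <<<grid, block>>> launch. The global range counts
// all work-items, so each dimension is grid size times block size.
NdRange3 launch_to_nd_range(const Dim3 &grid, const Dim3 &block)
{
    const std::array<std::size_t, 3> g = to_sycl_order(grid);
    const std::array<std::size_t, 3> l = to_sycl_order(block);
    NdRange3 r{};
    for (std::size_t i = 0; i < 3; ++i) {
        assert(g[i] > 0 && l[i] > 0);  // every extent must be at least 1
        r.global[i] = g[i] * l[i];
        r.local[i] = l[i];
    }
    return r;
}
```

For the sample launch, launch_to_nd_range(Dim3{1, 1, 1}, Dim3{256, 1, 1}) yields global and local ranges of (1, 1, 256), which matches the two sycl::range(1, 1, vector_size) arguments in the migrated code.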
Get Started
Download
Install and configure the Intel DPC++ Compatibility Tool, which is part of the Intel oneAPI Base Toolkit.
Learn More
Access additional samples, tutorials, and training resources.
1 An Intel estimate as of September 2024, which is based on measurements from a set of 100 HPC benchmarks, AI applications, and samples, with examples like GROMACS, llama.cpp, and SqueezeLLM. Results may vary.
Documentation & Code Samples
Success Stories
Code Samples
Get Started
Vector Add
This Hello World sample demonstrates how to migrate a simple program from CUDA to code that is compliant with SYCL. Use it to verify that your development environment is set up correctly for the migration.
Needleman Wunsch
This sample represents a typical example of migrating a working Make and CMake* project from CUDA to SYCL. The code implements the Needleman-Wunsch algorithm and is based on Rodinia, a set of benchmarks for heterogeneous computing.
Code Optimization
Concurrent Kernels
Implement this guided sample by migrating the original CUDA-based code to SYCL for offloading computations to a GPU or CPU. Learn how to reduce processing time by using SYCL queues to run several kernels concurrently on a GPU.
HSOptical Flow
This sample implements the Horn-Schunck method for estimating optical flow. Learn how a partial differential equation (PDE) solver can be accelerated through GPU offload.
Quasi-random Generator
Implement this guided sample by migrating the original CUDA-based code to SYCL for offloading computations to a GPU or CPU. The sample demonstrates how to migrate the constant memory feature in CUDA.
Training
How to Migrate CUDA Code to C++ with SYCL
- CUDA to SYCL Automatic Migration Tool [5:55]
- A Detailed Migration Flow
- Tips and Tricks for Migrating CUDA to SYCL [59:43]
Hands-on Learning
Future-Proof Code on Modern Accelerator Processors
Specifications
Operating system for development:
- Linux*
- Windows*
Software tool requirements:
- CUDA header files
- Eclipse (optional)
- Visual Studio (optional)
For details, see the system requirements.