Intel® DPC++ Compatibility Tool Best Practices

Published: 09/30/2020

This document presents tips, tricks, and best-known practices for using the Intel® DPC++ Compatibility Tool to migrate a user’s CUDA* code to Data Parallel C++ (DPC++) code.

This article is meant to complement the Get Started with the Intel® DPC++ Compatibility Tool and the Intel® DPC++ Compatibility Tool User Guide. Unless otherwise stated, the sample code contained herein is under the MIT license and presented for illustrative purposes only.

Migration Workflow Overview

In most cases, migration of a user’s CUDA source code to DPC++ code with the Intel® DPC++ Compatibility Tool can be divided into three stages: preparation, migration, and review.

Compatibility Tool Stages

In the preparation stage, the project directory is cleaned, compile options are noted, and in some cases source files may need to be modified. For most Makefile-based projects, we recommend running the intercept-build script, which automatically tracks and saves the compilation commands, compilation flags, and options in a JSON file. For Microsoft* Visual Studio projects, ensure the .vcxproj file exists; it can be passed to the dpct migration tool to keep track of project options. For simple projects, compile options and macros can be specified manually when running dpct. When running intercept-build on the command line, specify the build command.

intercept-build make

In the migration stage, the Compatibility Tool executable dpct is run. It takes the original application as input, analyzes its headers and sources as well as the generated compile_commands.json if it exists, and outputs DPC++ code and reports.

dpct -p ./ --in-root=./ --out-root=output *.cu

If intercept-build was not run, compile options can also be specified manually as dpct arguments.

dpct --out-root=output source.cu --extra-arg="-I./include" --extra-arg="-DBUILD_CUDA"

The Compatibility Tool can also be invoked inside the Microsoft Visual Studio or Eclipse IDEs. In the final review stage, manual verification and edits are required. For parts of the code that the Compatibility Tool is unable to migrate, the user must fix the migrated code and ensure correctness. For portions of code that require manual intervention, DPCT messages are logged as comments in the migrated source files for easy discovery. In the dpct output example below, the original CUDA call cudaMemcpy (denoted by the DPCT_ORIG comment) was migrated to DPC++. However, because CUDA uses error codes while DPC++ uses exceptions to handle errors, the dpct tool added message DPCT1003 in the comments to indicate that additional manual edits are likely necessary.

/* DPCT_ORIG status = cudaMemcpy(Result, d_C, dcsize,
* cudaMemcpyDeviceToHost);*/
/*
DPCT1003:0: Migrated API does not return error code. (*, 0) is
inserted. You may need to rewrite this code.
*/
status = (q_ct1.memcpy(Result, d_C, dcsize).wait(), 0);
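The comma-operator workaround above keeps the original error-code control flow compiling, but after it the status check silently always passes. One manual fix is to map exceptions back to an error code explicitly. The sketch below illustrates the general pattern in plain C++, using a hypothetical do_copy function as a stand-in for the SYCL queue memcpy; it is not the actual SYCL API.

```cpp
#include <stdexcept>

// Hypothetical stand-in for q_ct1.memcpy(...).wait(); throws on failure
// the way a SYCL runtime call would.
void do_copy(bool fail) {
    if (fail) throw std::runtime_error("copy failed");
}

// The dpct-emitted form `status = (do_copy(fail), 0);` always yields 0,
// so any original `if (status != 0)` check can never fire. Wrapping the
// call in try/catch restores a meaningful status value.
int copy_with_status(bool fail) {
    int status = 0;
    try {
        status = (do_copy(fail), 0);
    } catch (const std::exception &) {
        status = 1;  // map the exception back to a nonzero error code
    }
    return status;
}
```

In practice, the cleaner long-term fix is to delete the error-code plumbing entirely and rely on the exception handler attached to the queue.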

The following diagram illustrates the workflow and the files generated when using the Compatibility Tool.

Compatibility Tool Usage Flow

Preparation Best Practices

Before executing the Compatibility Tool, we recommend the following.

Ensure project source files are syntactically correct

Because the Compatibility Tool uses Clang to parse the user’s CUDA source, migration is unlikely to succeed if the original source contains syntax errors.

“make clean” before running “intercept-build make”

Run make clean to remove unnecessary files prior to running the intercept-build script. This will help ensure the successful creation of the compilation database without interference.

For complex projects, use intercept-build command to create a compilation database

For projects that use Make or CMake, keeping track of compilation options, settings, macro definitions, and include paths before running the Compatibility Tool can be difficult. Use intercept-build <build command> to automatically generate a compilation database, a JSON file that contains the build commands for the Compatibility Tool to use.
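For a CMake-based project, the same approach works as long as intercept-build wraps the command that actually compiles. A typical out-of-source flow might look like the following sketch (directory names are illustrative):

```shell
mkdir build && cd build
cmake ..                # generate Makefiles as usual
intercept-build make    # build and record commands into compile_commands.json
# compile_commands.json is now in ./build; point dpct at it with -p ./build
```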

Code modifications needed prior to migration due to differences between Clang and nvcc

In certain cases, because the Clang parser used by dpct isn’t always compatible with nvcc, manual edits to the CUDA source may be needed prior to migration. Here are some examples.

  1. Namespace qualification in certain usage scenarios may be needed by the Clang parser but not required by nvcc.
  2. Additional forward class declarations may be needed by the Clang parser where not required by nvcc.
  3. Spaces within the triple angle brackets of a kernel invocation are tolerated by nvcc but not by Clang. For example, cuda_kernel<< <num_blocks, threads_per_block>> >(args…) is fine for nvcc, but the Clang parser requires the spaces to be removed: cuda_kernel<<<num_blocks, threads_per_block>>>(args…).

To see more dialect differences between Clang and nvcc, see the Compiling CUDA with clang page on llvm.org.

Migration Best Practices

When migrating your CUDA project to DPC++ using the dpct executable, many command-line options are available. See the User Guide for the complete list. Below are some helpful options to use in various scenarios.

If you have trouble migrating all project source files at once, it may be helpful to migrate one file at a time incrementally.

DPCT Basic Options
--in-root Path to the root of the source tree to be migrated
--out-root Path to the root of the generated files
-p Path to the compilation database JSON file
--process-all Migrates/copies all files from the --in-root directory to the --out-root directory, eliminating the need to specify .cu files one by one
--extra-arg Specify additional Clang compiler options.
e.g. dpct --extra-arg="-std=c++14" --extra-arg="-I…"
--format-style Sets the formatting style for output files.
e.g. =llvm, =google, =custom (uses the .clang-format file)
--format-range Applies code formatting to no code (=none), migrated code only (=migrated), or all code (=all)

Below are some recommended options that ease migration and debugging.

DPCT Options that Ease Migration/Debug
--keep-original-code Keeps the original CUDA code in the comments of the generated DPC++ file.
Allows easy comparison of the original CUDA code to the generated DPC++ code.
--comments Inserts comments explaining the generated code
--always-use-async-handler Always creates cl::sycl::queue with the async exception handler
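Combining these with the basic options, a debugging-friendly invocation might look like the following sketch (directory names are illustrative):

```shell
dpct -p ./ --in-root=./ --out-root=output \
     --keep-original-code --comments \
     --always-use-async-handler *.cu
```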

Unified Shared Memory (USM) Usage

Unified Shared Memory (USM), supported in DPC++, is a feature that allows a pointer-based approach to managing host and device memory. When using the Compatibility Tool, the migrated DPC++ code uses USM as the default memory management method. Compared to SYCL buffers, USM produces less code and allows dpct to support more memory-related APIs.

However, some compilers, especially non-Intel® compilers targeting non-Intel® hardware, may have trouble with USM, which can lead to runtime errors such as floating-point exceptions and illegal memory accesses. If you encounter these rare cases, use dpct --usm-level=none as a workaround.

DPCT USM Option
--usm-level Sets the Unified Shared Memory (USM) level.
=restricted: Uses USM (default)
=none: Uses helper functions and SYCL buffers

DPCT Helper Functions

The Compatibility Tool uses helper functions and classes in migrated DPC++ code. Examples include memory management functions such as dpct_malloc, dpct_memcpy, and get_buffer, and device management functions such as get_default_queue and get_default_context. The associated files are located in <dpcpp-ct installation directory>/latest/include/dpct; the main header file is dpct.hpp, and the namespace is dpct::.

These DPCT helper functions are intended for migrated code only. If you write new DPC++ code, using these helpers is not recommended.

Review and Edit Best Practices

After running dpct, some manual editing is usually required before the migrated DPC++ code will compile. In most cases, the output files contain hints and comments to help you migrate the remainder of the code. Review these comments and make changes to ensure the migrated code is logically equivalent to the original. Use the Diagnostics Reference in the Intel® DPC++ Compatibility Tool User Guide for an explanation of each comment, including detailed help and suggestions for fixing the issues.

Timing Issues

Timing-related blocks may need manual editing, since calculation of time span is implementation-specific. Rewrite code containing any language-specific features and library dependencies to ensure equivalency.

For example, one potential issue involves profiling-related timer calls. If you’re using timer functions from CUDA samples such as sdkCreateTimer or sdkStartTimer, you may need to reimplement those calls.
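A portable replacement for such helpers can be built on std::chrono. The StopWatch class below is one possible stand-in for sdkCreateTimer/sdkStartTimer-style usage (the class name and interface are illustrative, not a drop-in API):

```cpp
#include <chrono>

// Minimal stopwatch built on std::chrono::steady_clock, usable as a
// portable replacement for the CUDA sample timer helpers.
class StopWatch {
    std::chrono::steady_clock::time_point start_{};
    double elapsed_ms_ = 0.0;
public:
    void start() { start_ = std::chrono::steady_clock::now(); }
    void stop() {
        auto end = std::chrono::steady_clock::now();
        // Accumulate elapsed time in milliseconds across start/stop pairs.
        elapsed_ms_ +=
            std::chrono::duration<double, std::milli>(end - start_).count();
    }
    double elapsed_ms() const { return elapsed_ms_; }
};
```

steady_clock is preferred over system_clock here because it is monotonic and unaffected by wall-clock adjustments.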

Common Runtime Issues

After code is migrated to DPC++ and compilation issues are fixed, you may still encounter runtime issues. Here are some common errors and possible fixes.

“OpenCL API failed. OpenCL API returns: -52 (CL_INVALID_KERNEL_ARGS)”: This error indicates that some pointers were not properly set before execution on the device. Ensure all pointers are initialized to a valid memory allocation, or to NULL, before they are used in parallel_for device execution.

“OpenCL API failed. OpenCL API returns: -54 (CL_INVALID_WORKGROUP_SIZE)”: Different accelerators have hardware differences that limit the maximum number of work items in a workgroup. For example, NVIDIA* hardware often limits the workgroup size to 512, while Intel® Gen9 graphics is limited to 256. If this error is encountered, adjust the workgroup size set at kernel launch according to the hardware limits.
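The adjustment itself is simple arithmetic: clamp the requested work-group size to the device maximum, then round the global size up to a multiple of it. A sketch in plain C++ (function names are illustrative; in real SYCL code the limit would come from querying the device, e.g. via info::device::max_work_group_size):

```cpp
#include <cstddef>
#include <algorithm>

// Clamp a requested work-group size to the device limit.
std::size_t clamp_wg_size(std::size_t requested, std::size_t device_max) {
    return std::min(requested, device_max);
}

// Round the global size up to the nearest multiple of the work-group size,
// as required when constructing an nd_range launch configuration.
std::size_t round_up_global(std::size_t global, std::size_t wg) {
    return ((global + wg - 1) / wg) * wg;
}
```

With a 512-item request on hardware limited to 256, the clamped work-group size is 256, and a global size of 1000 rounds up to 1024 so it divides evenly.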

“Caught asynchronous SYCL exception”: Follow the error message to where the exception was caught and make the necessary changes.

Conclusion

The Intel® DPC++ Compatibility Tool can effectively assist developers in migrating CUDA code to DPC++, significantly reducing code migration time. As you progress through the three stages of using the Compatibility Tool, follow the best practices outlined in this document to quickly resolve any issues that arise.

Further Information

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.