Programming Guide

oneAPI Debug Tools

The following tools are available to help with debugging the SYCL* and OpenMP* offload process.
Tools to debug SYCL* and OpenMP* offload process
Tool
When to Use
Environment variables
Environment variables allow you to gather diagnostic information from the OpenMP and SYCL runtimes at program execution with no modifications to your program.
The onetrace tool from Profiling Tools Interfaces for GPU (PTI for GPU)
When using the oneAPI Level Zero and OpenCL™ backends for SYCL and OpenMP Offload, this tool can be used to debug backend errors and for performance profiling on both the host and device.
Intercept Layer for OpenCL™ Applications
When using the OpenCL™ backend for SYCL and OpenMP Offload, this library can be used to debug backend errors and for performance profiling on both the host and device. It offers wider functionality than onetrace.
Intel® Distribution for GDB*
Used for source-level debugging of the application, typically to inspect logical bugs, on the host and any devices you are using (CPU, GPU, FPGA emulation).
Intel® Inspector
This tool helps to locate and debug memory and threading problems, including those that can cause offloading to fail.
Intel Inspector is included in the Intel oneAPI HPC Toolkit or the Intel oneAPI IoT Toolkit.
In-application debugging
In addition to these tools and runtime-based approaches, you can locate problems with in-application techniques. For example:
  • Comparing kernel output to expected output
  • Sending intermediate results back to the host through variables created for debugging purposes
  • Printing results from within kernels
Both SYCL and OpenMP allow printing to stdout from within an offload region - be sure to note which SIMD lane or thread is providing the output.
Intel® Advisor
Use to ensure Fortran, C, C++, OpenCL™, and SYCL applications realize full performance potential on modern processors.
Intel® VTune™ Profiler
Use to gather performance data either on the native system or on a remote system.

Debug Environment Variables

Both the OpenMP* and SYCL offload runtimes, as well as Level Zero, OpenCL, and the Shader Compiler, provide environment variables that help you understand the communication between the host and offload device. The variables also allow you to discover or control the runtime chosen for offload computations.
OpenMP* Offload Environment Variables
There are several environment variables that you can use to understand how OpenMP Offload works and control which backend it uses.
OpenMP is not supported for FPGA devices.
OpenMP* Offload Environment Variables
Environment Variable
Description
LIBOMPTARGET_DEBUG
This environment variable enables debug output from the OpenMP Offload runtime. It reports:
  • The available runtimes detected and used (1,2)
  • When the chosen runtime is started and stopped (1,2)
  • Details on the offload device used (1,2)
  • Support libraries loaded (1,2)
  • Size and address of all memory allocations and deallocations (1,2)
  • Information on every data copy to and from the device, or device mapping in the case of unified shared memory (1,2)
  • When each kernel is launched and details on the launch (arguments, SIMD width, group information, etc.) (1,2)
  • Which Level Zero/OpenCL API functions are invoked (function name, arguments/parameters) (2)
Values: (0, 1, 2)
Default: 0
LIBOMPTARGET_INFO
This variable controls whether basic offloading information will be displayed from the offload runtime.
  • Prints all data arguments upon entering an OpenMP device kernel (1)
  • Indicates when a mapped address already exists in the device mapping table (2)
  • Dumps the contents of the device pointer map if target offloading fails (4)
  • Indicates when an entry is changed in the device mapping table (8)
  • Indicates when data is copied to and from the device (32)
Values: (0, 1, 2, 4, 8, 32)
Default: 0
LIBOMPTARGET_PLUGIN_PROFILE
This variable enables the display of performance data for offloaded OpenMP code. It displays:
  • Total data transfer times (read and write)
  • Data allocation times
  • Module build times (just-in-time compile)
  • The execution time of each kernel.
Values:
  • F - disabled
  • T - enabled with timings in milliseconds
  • T,usec - enabled with timings in microseconds
Default: F
Example:
export LIBOMPTARGET_PLUGIN_PROFILE=T,usec
LIBOMPTARGET_PLUGIN
This environment variable allows you to choose the backend used for OpenMP offload execution.
The Level Zero backend is only supported for GPU devices.
Values:
  • LEVEL0 or LEVEL_ZERO - uses the Level Zero backend
  • OPENCL - uses the OpenCL™ backend
Default:
  • For GPU offload devices: LEVEL0
  • For CPU or FPGA offload devices: OPENCL
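The variables in the table above can be combined in a single shell session. The following is a minimal sketch, not a definitive recipe: the application path ./my_offload_app and the log file name are hypothetical, and combining several LIBOMPTARGET_INFO values with a bitwise OR assumes the runtime treats the value as a bitmask, which the power-of-two values suggest but the table does not state.

```shell
# Hypothetical application path; replace with your own offload binary.
APP="${APP:-./my_offload_app}"

# LIBOMPTARGET_INFO values taken from the table above.
INFO_KERNEL_ARGS=1     # print data arguments on entering a device kernel
INFO_DUMP_ON_FAIL=4    # dump the device pointer map if offloading fails
INFO_DATA_COPY=32      # report data copies to and from the device

# Assumption: the runtime interprets the value as a bitmask.
export LIBOMPTARGET_INFO=$((INFO_KERNEL_ARGS | INFO_DUMP_ON_FAIL | INFO_DATA_COPY))

export LIBOMPTARGET_DEBUG=1                 # level 1 debug output
export LIBOMPTARGET_PLUGIN_PROFILE=T,usec   # timings in microseconds
export LIBOMPTARGET_PLUGIN=LEVEL0           # request the Level Zero backend

echo "LIBOMPTARGET_INFO=$LIBOMPTARGET_INFO"

# Run only if the binary exists; capture stdout and stderr together so the
# diagnostics are kept no matter which stream the runtime writes to.
if [ -x "$APP" ]; then
    "$APP" > omp_offload_debug.log 2>&1
fi
```

Unsetting the variables afterwards (or running the script in a subshell) keeps the diagnostic settings from leaking into later runs.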
SYCL* and DPC++ Environment Variables
The DPC++ compiler supports all standard SYCL environment variables. The full list is available from GitHub. Of interest for debugging are the following SYCL environment variables, plus an additional Level Zero environment variable.
SYCL* and DPC++ Environment Variables
Environment Variable
Description
SYCL_DEVICE_FILTER
This complex environment variable allows you to limit the runtimes, compute device types, and compute device IDs used by the runtime to a subset of all available combinations.
The compute device IDs correspond to those returned by the SYCL API, clinfo, or sycl-ls (with the numbering starting at 0) and have no relation to whether the device with that ID is of a certain type or supports a specific runtime. Using a programmatic special selector (like gpu_selector) to request a device filtered out by SYCL_DEVICE_FILTER will cause an exception to be thrown.
Refer to the Environment Variables descriptions in GitHub for additional details: https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
Example values include:
  • opencl:cpu - use only the OpenCL™ runtime on all available CPU devices
  • opencl:gpu - use only the OpenCL runtime on all available GPU devices
  • opencl:gpu:2 - use only the OpenCL runtime on only the third device, which also has to be a GPU
  • level_zero:gpu:1 - use only the Level Zero runtime on only the second device, which also has to be a GPU
  • opencl:cpu,level_zero - use only the OpenCL runtime on the CPU device, or the Level Zero runtime on any supported compute device
Default: use all available runtimes and devices
SYCL_PI_TRACE
This environment variable enables debug output from the runtime.
Values:
  • 1 - report SYCL plugins and devices discovered and used
  • 2 - report SYCL API calls made, including arguments and result values
  • -1 - provides all available tracing
Default: disabled
ZE_DEBUG
This environment variable enables debug output from the Level Zero backend when used with the runtime. It reports:
  • Level Zero APIs called
  • Level Zero event information
Value: variable defined with any value - enabled
Default: disabled
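As an illustration of how these variables fit together, the sketch below restricts a run to the Level Zero runtime on GPU devices and turns on tracing. The application path ./my_sycl_app and the log file name are hypothetical. Listing devices with sycl-ls before setting the filter is a suggestion for checking device IDs, and it is skipped when the tool is not on the PATH.

```shell
# Hypothetical application path; replace with your own SYCL binary.
APP="${APP:-./my_sycl_app}"

# List the available devices first, before any filter is applied, so the
# device IDs can be checked against the filter value chosen below.
if command -v sycl-ls >/dev/null 2>&1; then
    sycl-ls
fi

export SYCL_DEVICE_FILTER=level_zero:gpu   # Level Zero runtime, GPU devices only
export SYCL_PI_TRACE=1                     # report plugins and devices discovered and used
export ZE_DEBUG=1                          # any value enables Level Zero debug output

# Run only if the binary exists; keep all diagnostics in one log file.
if [ -x "$APP" ]; then
    "$APP" > sycl_trace.log 2>&1
fi
```

If the application requests a device that the filter excludes, expect the exception described above rather than a silent fallback.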
Environment Variables that Produce Diagnostic Information for Support
The Level Zero backend provides a few environment variables that can be used to control behavior and aid in diagnosis.
An additional source of debug information comes from the Intel® Graphics Compiler, which is called by the Level Zero or OpenCL backends (used by both the OpenMP Offload and SYCL/DPC++ Runtimes) at runtime or during Ahead-of-Time (AOT) compilation. Intel Graphics Compiler creates the appropriate executable code for the target offload device. The full list of these environment variables can be found at https://github.com/intel/intel-graphics-compiler/blob/master/documentation/configuration_flags.md. The two that are most often needed to debug performance issues are:
  • IGC_ShaderDumpEnable=1 (default=0) causes all LLVM, assembly, and ISA code generated by the Intel® Graphics Compiler to be written to /tmp/IntelIGC/<application_name>
  • IGC_DumpToCurrentDir=1 (default=0) writes all the files created by IGC_ShaderDumpEnable to your current directory instead of /tmp/IntelIGC/<application_name>. Since this is potentially a lot of files, it is recommended to create a temporary directory just for the purpose of holding these files.
If you have a performance issue with your OpenMP offload or SYCL offload application that arises between different versions of Intel® oneAPI, when using different compiler options, when using the debugger, and so on, then you may be asked to enable IGC_ShaderDumpEnable and provide the resulting files. For more information on compatibility, see oneAPI Library Compatibility.
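The recommendation to collect the dump files in a dedicated temporary directory can be sketched as follows. The application path ./my_offload_app is hypothetical, and mktemp -d is assumed to be available, as it is on typical Linux systems.

```shell
# Hypothetical application path; replace with your own offload binary.
APP="${APP:-./my_offload_app}"

# Create a scratch directory to hold the (potentially many) dump files.
DUMP_DIR="$(mktemp -d)"

export IGC_ShaderDumpEnable=1   # write LLVM, assembly, and ISA dumps
export IGC_DumpToCurrentDir=1   # write them to the current directory

if [ -x "$APP" ]; then
    # Resolve the binary to an absolute path before changing directory.
    APP_ABS="$(cd "$(dirname "$APP")" && pwd)/$(basename "$APP")"
    # Run from inside the scratch directory so the dumps land there.
    (cd "$DUMP_DIR" && "$APP_ABS")
fi

echo "IGC dump files, if any, are in $DUMP_DIR"
```

Running the application in a subshell keeps your own working directory unchanged.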

Offload Intercept Tools

In addition to debuggers and diagnostics built into the offload software itself, it can be quite useful to monitor offload API calls and the data sent through the offload pipeline. For Level Zero, run your application as an argument to the onetrace or ze_tracer tool, which intercepts and reports on various aspects of the Level Zero API calls made by your application. For OpenCL™, you can add a library to LD_LIBRARY_PATH that will intercept and report on all OpenCL calls, and then use environment variables to control what diagnostic information to report to a file. You can also use onetrace or cl_tracer to report on various aspects of OpenCL API calls made by your application. Once again, your application is run as an argument to the onetrace or cl_tracer tool.
Intercept Layer for OpenCL™ Applications
This library collects debugging and performance data when OpenCL is used as the backend to your SYCL or OpenMP offload program. It can help you detect buffer overwrites, memory leaks, and mismatched pointers, and can provide more detailed information about runtime error messages, allowing you to diagnose these issues whether CPU, FPGA, or GPU devices are used for computation. Note that you will get nothing useful if you use ze_tracer on a program that uses the OpenCL backend, or the Intercept Layer for OpenCL Applications library or cl_tracer on a program that uses the Level Zero backend.
Profiling Tools Interfaces for GPU (onetrace, cl_tracer, and ze_tracer)
Like the Intercept Layer for OpenCL™ Applications, these tools collect debugging and performance data from applications that use the OpenCL and Level Zero backends for offload via OpenMP* or SYCL. Note that Level Zero can only be used as the backend for computations that happen on the GPU (there is no Level Zero backend for the CPU or FPGA at this time). The onetrace tool is part of the Profiling Tools Interfaces for GPU (PTI for GPU) project, found at https://github.com/intel/pti-gpu. This project also contains the ze_tracer and cl_tracer tools, which trace only the activity from the Level Zero or OpenCL offload backend, respectively. The ze_tracer and cl_tracer tools produce no output when used with an application that runs on the other backend, while onetrace provides output no matter which offload backend you use.
The onetrace tool is distributed as source. Instructions for how to build the tool are available from https://github.com/intel/pti-gpu/tree/master/tools/onetrace. The tool provides the following features:
  • Call logging: This mode allows you to trace all standard Level Zero (L0) and OpenCL™ API calls along with their arguments and return values annotated with time stamps. Among other things, this can give you supplemental information on any failures that occur when a host program tries to make use of an attached compute device.
  • Host and device timing: These provide the duration of all API calls, the duration of each kernel, and application runtime for the entire application.
  • Device Timeline mode: Gives time stamps for each device activity. All the time stamps are in the same (CPU) time scale.
  • Chrome Call Logging mode: Dumps API calls to JSON format that can be opened in chrome://tracing browser tool.
These data can help debug offload failures or performance issues.
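A typical invocation passes the application to onetrace as an argument, as described earlier. The sketch below is hedged: the flag names --call-logging and --device-timing are assumptions based on the PTI for GPU project's documentation and are not stated in this guide, so check them against onetrace --help for your build. The application path ./my_offload_app and the log file name are hypothetical.

```shell
# Hypothetical application path; replace with your own offload binary.
APP="${APP:-./my_offload_app}"

# Assumed flag names (verify with `onetrace --help`):
#   --call-logging   trace API calls with arguments and return values
#   --device-timing  report the duration of each kernel
ONETRACE_FLAGS="--call-logging --device-timing"

if command -v onetrace >/dev/null 2>&1 && [ -x "$APP" ]; then
    # The flags are intentionally left unquoted so they split into words.
    onetrace $ONETRACE_FLAGS "$APP" > onetrace.log 2>&1
else
    echo "would run: onetrace $ONETRACE_FLAGS $APP"
fi
```

Because onetrace has to be built from source, the guard above simply prints the intended command when the tool is not yet on the PATH.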

Intel® Distribution for GDB*

The Intel Distribution for GDB* is an application debugger that allows you to inspect and modify the program state. With the debugger, both the host part of your application and kernels that are offloaded to a device can be debugged seamlessly in the same debug session. The debugger supports the CPU, GPU, and FPGA-emulation devices. Major features of the tool include:
  • Automatically attaching to the GPU device to listen to debug events
  • Automatically detecting JIT-compiled, or dynamically loaded, kernel code for debugging
  • Defining breakpoints (both inside and outside of a kernel) to halt the execution of the program
  • Listing the threads; switching the current thread context
  • Listing active SIMD lanes; switching the current SIMD lane context per thread
  • Evaluating and printing the values of expressions in multiple thread and SIMD lane contexts
  • Inspecting and changing register values
  • Disassembling the machine instructions
  • Displaying and navigating the function call-stack
  • Source- and instruction-level stepping
  • Non-stop and all-stop debug mode
  • Recording the execution using Intel Processor Trace (CPU only)
For more information and links to full documentation for Intel Distribution for GDB, see Get Started with Intel Distribution for GDB on Linux* Host | Windows* Host.
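Several of the features listed above (breakpoints, thread listing, call-stack display) can be exercised in a non-interactive session. This is a rough sketch under stated assumptions: the launcher name gdb-oneapi is assumed and is not given in this guide, the application path ./my_offload_app is hypothetical, and the commands used are standard GDB commands rather than anything specific to offload debugging.

```shell
# Hypothetical application path; replace with your own binary built with debug info.
APP="${APP:-./my_offload_app}"

# Assumed launcher name for the Intel Distribution for GDB.
GDB="${GDB:-gdb-oneapi}"

if command -v "$GDB" >/dev/null 2>&1 && [ -x "$APP" ]; then
    # -batch exits after the listed commands have run.
    "$GDB" -batch \
        -ex "break main" \
        -ex "run" \
        -ex "info threads" \
        -ex "backtrace" \
        --args "$APP"
else
    echo "$GDB or $APP not found; skipping debug session"
fi
```

For kernel debugging you would normally work interactively and set breakpoints inside the kernel source instead of at main.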

Intel® Inspector for Offload

Intel® Inspector is a dynamic memory and threading error checking tool for users developing serial and multithreaded applications. It can be used to verify correctness of the native part of the application as well as dynamically generated offload code.
Unlike the tools and techniques above, Intel Inspector cannot be used to catch errors in offload code that is communicating with a GPU or an FPGA. Instead, Intel Inspector requires the SYCL or OpenMP runtime to be configured to execute kernels on a CPU target. In general, this means defining the following environment variables prior to an analysis run.
  • To configure a SYCL application to run kernels on a CPU device
    export SYCL_DEVICE_FILTER=opencl:cpu
  • To configure an OpenMP application to run kernels on a CPU device
    export OMP_TARGET_OFFLOAD=MANDATORY
    export LIBOMPTARGET_DEVICETYPE=cpu
  • To enable code analysis and tracing in JIT compilers or runtimes
    export CL_CONFIG_USE_VTUNE=True
    export CL_CONFIG_USE_VECTORIZER=false
Use one of the following commands to start analysis from the command line. You can also start from the Intel Inspector graphical user interface.
  • Memory:
    inspxe-cl -c mi3 -- <app> [app_args]
  • Threading:
    inspxe-cl -c ti3 -- <app> [app_args]
View the analysis result using the following command:
inspxe-cl -report=problems -report-all
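The configuration, analysis, and report steps above can be chained in one script. The following is a minimal sketch that reuses the commands exactly as given in this section. The application path ./my_offload_app and the ANALYSIS variable are hypothetical conveniences, and setting both the SYCL and OpenMP variables at once is a simplification; you only need the ones that match your programming model.

```shell
# Hypothetical application path; replace with your own offload binary.
APP="${APP:-./my_offload_app}"

# Hypothetical selector: mi3 = memory analysis, ti3 = threading analysis.
ANALYSIS="${ANALYSIS:-mi3}"

# Route kernels to the CPU device (SYCL and OpenMP respectively).
export SYCL_DEVICE_FILTER=opencl:cpu
export OMP_TARGET_OFFLOAD=MANDATORY
export LIBOMPTARGET_DEVICETYPE=cpu

# Enable code analysis and tracing in JIT compilers or runtimes.
export CL_CONFIG_USE_VTUNE=True
export CL_CONFIG_USE_VECTORIZER=false

if command -v inspxe-cl >/dev/null 2>&1 && [ -x "$APP" ]; then
    inspxe-cl -c "$ANALYSIS" -- "$APP"        # collect
    inspxe-cl -report=problems -report-all    # view the result
else
    echo "inspxe-cl or $APP not found; skipping analysis"
fi
```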
If your SYCL or OpenMP Offload program passes bad pointers to the OpenCL™ backend, or passes the wrong pointer to the backend from the wrong thread, Intel Inspector should flag the issue. This may make the problem easier to find than trying to locate it using the intercept layers or the debugger.
Additional details are available from the Intel Inspector User Guide for Linux* OS | Windows* OS.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.