oneAPI Debug Tools
When to use each tool:
- Environment variables: Gather diagnostic information from the OpenMP and DPC++ runtimes at program execution, with no modifications to your program.
- Intercept Layer for OpenCL™ Applications: When using the OpenCL backend for DPC++ and OpenMP Offload, use this library to debug backend errors and to profile performance on both the host and the device.
- Intel® Distribution for GDB*: Source-level debugging of the application, typically to inspect logical bugs, on the host and on any devices you are using (CPU, GPU, FPGA emulation).
- Intel® Inspector: Locate and debug memory and threading problems, including those that can cause offloading to fail. Intel Inspector is included in the Intel oneAPI HPC Toolkit or the Intel oneAPI IoT Toolkit.
In addition to these tools and runtime-based approaches, developers can locate problems using other techniques. For example:
Debug Environment Variables
This environment variable enables debug output from the OpenMP Offload runtime. It reports:
Values: (0, 1, 2)
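A minimal sketch of turning this output on from a shell. The variable name is not given in the text above. This sketch assumes it is LIBOMPTARGET_DEBUG, the name used by the LLVM/Intel OpenMP offload runtime. `./my_omp_app` is a hypothetical binary.

```shell
#!/usr/bin/env bash
# Hypothetical path to an OpenMP offload binary; override with APP=...
APP=${APP:-./my_omp_app}

# Assumed variable name for OpenMP offload runtime debug output.
# Level 1 is used here; the text lists 0, 1 and 2 as valid values.
export LIBOMPTARGET_DEBUG=1

# Run the application only if it exists, so the script is safe to try as-is.
if [ -x "$APP" ]; then
  "$APP"
fi
```

Raise the value to 2 for more detail.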
This variable enables the display of performance data for offloaded OpenMP code. It displays:
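A sketch of requesting the offload performance report. Both the variable name and its value are assumptions, because neither appears in this text. The sketch uses LIBOMPTARGET_PROFILE with the value T, as in Intel's OpenMP offload runtime. Check your compiler's documentation for the values it accepts. `./my_omp_app` is hypothetical.

```shell
#!/usr/bin/env bash
APP=${APP:-./my_omp_app}   # hypothetical binary

# Assumed name and value: T is taken to enable the performance report
# for offloaded OpenMP code described above.
export LIBOMPTARGET_PROFILE=T

if [ -x "$APP" ]; then
  "$APP"
fi
```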
This environment variable allows you to choose the backend used for OpenMP offload execution.
The Level Zero backend is only supported for GPU devices.
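A sketch of selecting the backend explicitly. The variable name LIBOMPTARGET_PLUGIN and the values LEVEL0 and OPENCL are assumptions taken from Intel's OpenMP offload runtime, since this text does not name them. OMP_TARGET_OFFLOAD=MANDATORY is the standard OpenMP setting that also appears later on this page. It is added so that a missing device produces an error instead of a silent fallback to the host. `./my_omp_app` is hypothetical.

```shell
#!/usr/bin/env bash
APP=${APP:-./my_omp_app}   # hypothetical binary

# Assumed variable name and values: LEVEL0 selects the Level Zero backend
# (GPU devices only), OPENCL selects the OpenCL backend.
export LIBOMPTARGET_PLUGIN=LEVEL0

# Standard OpenMP control: fail rather than run on the host if the
# offload device cannot be used.
export OMP_TARGET_OFFLOAD=MANDATORY

if [ -x "$APP" ]; then
  "$APP"
fi
```

Because Level Zero supports only GPU devices, switch to OPENCL when offloading to a CPU.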
This complex environment variable allows you to limit the runtimes, compute device types, and compute device IDs used by the DPC++ runtime to a subset of all available combinations.
The compute device IDs correspond to those returned by the SYCL API or by sycl-ls (numbering starts at 0) and have no relation to whether the device with that ID is of a certain type or supports a specific runtime. Using a programmatic special selector (such as gpu_selector) to request a device that SYCL_DEVICE_FILTER has filtered out causes an exception to be thrown.
Refer to the Environment Variables descriptions in GitHub for additional details: https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
Example values include:
Default: use all available runtimes and devices
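A sketch of restricting the DPC++ runtime to one device. The `backend:device_type:device_id` syntax, with comma-separated entries, follows the intel/llvm EnvironmentVariables.md page linked above. The specific values are illustrative, and `./my_sycl_app` is a hypothetical binary. The device IDs to use come from the sycl-ls listing.

```shell
#!/usr/bin/env bash
APP=${APP:-./my_sycl_app}   # hypothetical DPC++ binary

# List the platforms and devices the DPC++ runtime can see, with their IDs.
if command -v sycl-ls >/dev/null 2>&1; then
  sycl-ls
fi

# Format: backend:device_type:device_id (entries are comma-separated).
# Restrict the runtime to the first Level Zero GPU...
export SYCL_DEVICE_FILTER=level_zero:gpu:0

# ...or, alternatively, to any OpenCL CPU device:
# export SYCL_DEVICE_FILTER=opencl:cpu

if [ -x "$APP" ]; then
  "$APP"
fi
```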
This environment variable enables debug output from the DPC++ runtime.
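A sketch of enabling DPC++ runtime tracing. The text does not name the variable, so SYCL_PI_TRACE is an assumption. That is the name used in the intel/llvm documentation, where 1 reports the plugins and devices discovered, 2 traces plugin interface (PI) calls, and -1 enables all tracing. `./my_sycl_app` is hypothetical.

```shell
#!/usr/bin/env bash
APP=${APP:-./my_sycl_app}   # hypothetical binary

# Assumed variable name; 2 traces calls from the DPC++ runtime to the backend.
export SYCL_PI_TRACE=2

if [ -x "$APP" ]; then
  "$APP"
fi
```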
This environment variable enables debug output from the Level Zero backend when used with the DPC++ runtime. It reports:
Value: variable defined with any value - enabled
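A sketch of enabling Level Zero debug output for a DPC++ program. The variable name ZE_DEBUG is an assumption, because this text omits it. The filter line uses SYCL_DEVICE_FILTER, described above, to make sure the Level Zero backend is actually selected. `./my_sycl_app` is hypothetical.

```shell
#!/usr/bin/env bash
APP=${APP:-./my_sycl_app}   # hypothetical binary

# Assumed variable name. Any value enables the output, so "1" is arbitrary.
export ZE_DEBUG=1
# Ensure the DPC++ runtime uses the Level Zero backend.
export SYCL_DEVICE_FILTER=level_zero:gpu

if [ -x "$APP" ]; then
  "$APP"
fi

# Defining the variable with any value enables it, so unset it
# (rather than setting it to 0) to turn the output off:
# unset ZE_DEBUG
```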
- Level Zero Specification, core programming guide: https://spec.oneapi.com/level-zero/latest/core/PROG.html#environment-variables
- Level Zero Specification, tool programming guide: https://spec.oneapi.com/level-zero/latest/tools/PROG.html#environment-variables
- IGC_ShaderDumpEnable=1 (default=0) causes all LLVM, assembly, and ISA code generated by the Intel® Graphics Compiler to be written to /tmp/IntelIGC/<application_name>.
- IGC_DumpToCurrentDir=1 (default=0) writes all the files created by IGC_ShaderDumpEnable to your current directory instead of /tmp/IntelIGC/<application_name>. Because this can produce a large number of files, create a temporary directory just to hold them.
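The two settings above combine into the following workflow: create a scratch directory, run from inside it, and let the compiler dump land there. This is a sketch; `my_sycl_app` is a hypothetical binary, and its path is made absolute because the script changes directory before running it.

```shell
#!/usr/bin/env bash
APP=${APP:-$PWD/my_sycl_app}   # hypothetical binary (absolute path)

# Dedicated scratch directory in the current directory, since the dump
# can contain many files.
DUMP_DIR=$(mktemp -d igc_dump.XXXXXX)

export IGC_ShaderDumpEnable=1   # dump LLVM, assembly and ISA from the compiler
export IGC_DumpToCurrentDir=1   # write into the current directory, not /tmp/IntelIGC

# Run from inside the scratch directory so the dump lands there.
if [ -x "$APP" ]; then
  (cd "$DUMP_DIR" && "$APP")
fi
echo "IGC dump directory: $DUMP_DIR"
```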
Offload Intercept Tools
- Extensive information on building and using the Intercept Layer for OpenCL Applications is available from https://github.com/intel/opencl-intercept-layer. For best results, run cmake with the following flags: -DENABLE_CLIPROF=TRUE -DENABLE_CLILOADER=TRUE
- Information on the controls for the Intercept Layer for OpenCL Applications can be found at https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md.
- Information about optimizing for GPUs is available from the Intel oneAPI GPU Optimization Guide.
- Call logging: This mode traces all standard Level Zero (L0) API calls along with their arguments and return values, annotated with time stamps. Among other things, this can give you supplemental information on any failures that occur when a host program tries to use an attached compute device.
- Host and device timing: These modes report the duration of every API call, the duration of each kernel, and the runtime of the entire application.
- Device Timeline mode: Gives time stamps for each device activity. All time stamps use the same (CPU) time scale.
- Browser visualization: Results of the Call logging and Device Timeline modes can be dumped into a trace (JSON) file and visualized in a browser.
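The following sketch shows how both intercept approaches can be launched from a shell. For the Intercept Layer, it assumes that controls are set as environment variables with a `CLI_` prefix and that cliloader, built with the flags above, launches the target application. The control names come from the controls page linked above. The Level Zero tracing tool is not named in this text, so the sketch assumes it is onetrace from Intel's Profiling Tools Interfaces for GPU, with `-c` for call logging and `-t` for the device timeline. `./my_offload_app` is hypothetical.

```shell
#!/usr/bin/env bash
APP=${APP:-./my_offload_app}   # hypothetical binary

# Intercept Layer for OpenCL Applications: controls set as CLI_-prefixed
# environment variables (call logging plus per-kernel device timing).
export CLI_CallLogging=1
export CLI_DevicePerformanceTiming=1

# cliloader injects the intercept layer into the program it launches.
if command -v cliloader >/dev/null 2>&1 && [ -x "$APP" ]; then
  cliloader "$APP"
fi

# Level Zero tracing, assuming the onetrace tool: call logging (-c) and
# device timeline (-t).
if command -v onetrace >/dev/null 2>&1 && [ -x "$APP" ]; then
  onetrace -c -t "$APP"
fi
```

The Intercept Layer only sees calls when the application uses the OpenCL backend. For a DPC++ program, SYCL_DEVICE_FILTER can force that backend.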
Intel® Distribution for GDB*
- Automatically attaching to the GPU device to listen to debug events
- Automatically detecting JIT-compiled, or dynamically loaded, kernel code for debugging
- Defining breakpoints (both inside and outside of a kernel) to halt the execution of the program
- Listing the threads; switching the current thread context
- Listing active SIMD lanes; switching the current SIMD lane context per thread
- Evaluating and printing the values of expressions in multiple thread and SIMD lane contexts
- Inspecting and changing register values
- Disassembling the machine instructions
- Displaying and navigating the function call-stack
- Source- and instruction-level stepping
- Non-stop and all-stop debug modes
- Recording the execution using Intel Processor Trace (CPU only)
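A sketch of a non-interactive session that exercises several of the features above: breakpoints, thread listing, the call stack, and disassembly. It uses only standard GDB options (`-batch`, `-ex`, `--args`) with the `gdb-oneapi` command. `./my_sycl_app` and the breakpoint location are hypothetical. The program is assumed to be built with debug information (for example, `-g`).

```shell
#!/usr/bin/env bash
APP=${APP:-./my_sycl_app}     # hypothetical binary built with -g
BREAK_AT=${BREAK_AT:-main}    # hypothetical location, e.g. kernel.cpp:42

if command -v gdb-oneapi >/dev/null 2>&1 && [ -x "$APP" ]; then
  # Stop at the breakpoint, then list threads, print the call stack and
  # disassemble five instructions at the current program counter.
  gdb-oneapi -batch \
    -ex "break $BREAK_AT" \
    -ex "run" \
    -ex "info threads" \
    -ex "backtrace" \
    -ex "x/5i \$pc" \
    --args "$APP"
fi
```

For interactive debugging, drop `-batch` and the `-ex` options and enter the same commands at the `(gdb)` prompt.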
Intel® Inspector for Offload
- To configure a DPC++ application to run kernels on a CPU device:
  export SYCL_DEVICE_FILTER=opencl:cpu
- To configure an OpenMP application to run kernels on a CPU device:
  export OMP_TARGET_OFFLOAD=MANDATORY
  export LIBOMPTARGET_DEVICETYPE=cpu
- To enable code analysis and tracing in JIT compilers or runtimes:
  export CL_CONFIG_USE_VTUNE=True
  export CL_CONFIG_USE_VECTORIZER=false
- Memory: inspxe-cl -c mi3 -- <app> [app_args]
- Threading: inspxe-cl -c ti3 -- <app> [app_args]
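The settings and commands above can be put together into one script for a DPC++ application, as in the following sketch. It uses only the variables and inspxe-cl invocations listed on this page. `./my_sycl_app` is a hypothetical binary.

```shell
#!/usr/bin/env bash
APP=${APP:-./my_sycl_app}   # hypothetical DPC++ binary

# Run kernels on the CPU device so Intel Inspector can analyze them.
export SYCL_DEVICE_FILTER=opencl:cpu
# Enable code analysis and tracing in the JIT compiler/runtime.
export CL_CONFIG_USE_VTUNE=True
export CL_CONFIG_USE_VECTORIZER=false

if command -v inspxe-cl >/dev/null 2>&1 && [ -x "$APP" ]; then
  inspxe-cl -c mi3 -- "$APP"   # memory analysis
  inspxe-cl -c ti3 -- "$APP"   # threading analysis
fi
```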