Debugging Options
Tutorial: Debugging with Intel® Distribution for GDB*
Auto-Attach
The auto-attach feature enables listening to debug events from the GPU.
With auto-attach, the debugger launches a gdbserver-ze process that listens for GPU debug events and connects that process to the debugger. For each device on the system, an inferior is created on the gdbserver-ze connection. This ensures that you can debug kernels offloaded to the GPU.
The auto-attach feature is enabled by default.
The feature does not affect debugging on the CPU device. However, it produces extra output. To eliminate that output, turn the feature off with the INTELGT_AUTO_ATTACH_DISABLE environment variable by running the following command in the shell before starting gdb-oneapi:
export INTELGT_AUTO_ATTACH_DISABLE=1
To enable the feature again:
unset INTELGT_AUTO_ATTACH_DISABLE
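If you start the debugger from a script instead of an interactive shell, you can apply the same variable to that one launch only. The following is a minimal sketch, not part of the product: the helper name debugger_env is hypothetical, and it only mirrors the export and unset commands shown above.

```python
import os


def debugger_env(disable_auto_attach, base_env=None):
    """Return a copy of the environment for launching gdb-oneapi.

    debugger_env is a hypothetical helper name used for illustration.
    """
    env = dict(os.environ if base_env is None else base_env)
    if disable_auto_attach:
        # Same effect as: export INTELGT_AUTO_ATTACH_DISABLE=1
        env["INTELGT_AUTO_ATTACH_DISABLE"] = "1"
    else:
        # Same effect as: unset INTELGT_AUTO_ATTACH_DISABLE
        env.pop("INTELGT_AUTO_ATTACH_DISABLE", None)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when launching gdb-oneapi, so the calling shell's own environment stays unchanged.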
Reducing Overhead
The info threads command may take a noticeable amount of time to complete because, when debugging GPUs, it must fetch data for a large number of threads. To reduce the overhead, run info threads without printing the frame arguments. This can be achieved by changing the setting globally using
set print frame-arguments none
or by using
with print frame-arguments none -- info threads
for a single command execution.
Pretty-Printing
The pretty-printing feature simplifies the display of complex objects. If a pretty-printer is registered for the type of value you are going to print, the debugger simplifies the output. Otherwise, the debugger prints the value normally.
Intel® Distribution for GDB* supports pretty-printing for SYCL* types id, buffer, and range from the sycl namespace.
You can write your own pretty-printer for any type. Refer to Writing a Pretty-Printer in the GDB documentation for more information.
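As an illustration, a custom pretty-printer written with the GDB Python API might look like the sketch below. The Point3D type and its fields x, y, and z are hypothetical and do not come from this guide. The import guard is there only so that the class can also be exercised outside the debugger.

```python
try:
    import gdb
    import gdb.printing
except ImportError:
    gdb = None  # Not running inside the debugger.


class Point3DPrinter:
    """Pretty-printer for a hypothetical struct Point3D { int x, y, z; }."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        # Fields are read by name, as with a gdb.Value of struct type.
        return "Point3D(x={}, y={}, z={})".format(
            int(self.val["x"]), int(self.val["y"]), int(self.val["z"]))


def build_printers():
    # "example" is an arbitrary collection name chosen for this sketch.
    printers = gdb.printing.RegexpCollectionPrettyPrinter("example")
    printers.add_printer("Point3D", "^Point3D$", Point3DPrinter)
    return printers


if gdb is not None:
    gdb.printing.register_pretty_printer(gdb.current_objfile(), build_printers())
```

Inside the debugger, a script like this is typically loaded with the source command. After loading, the collection should appear in the list of registered pretty-printers.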
To display the list of pretty-printers available, run the following command:
info pretty-print
Example output:
global pretty-printers:
  SYCL
    sycl::_V1::buffer
    sycl::_V1::id
    sycl::_V1::range
  builtin
    mpx_bound128
Pretty-printing is enabled by default. For example, when you print the value of the index variable:
print index
The output is the following:
$10 = sycl::_V1::id<1> = {32}
To disable pretty-printing and display raw content instead, use the /r flag:
print /r index
Example output:
$11 = {<sycl::_V1::detail::array<1>> = {common_array = {32, <No data fields>}}}
To disable all pretty-printers, use the following command:
disable pretty-printer
To enable pretty-printers:
enable pretty-printer
Prettify Frames
Some C++ template and SYCL constructs produce function names that span multiple lines, which makes the output of info threads and backtrace difficult to read.
You can use frame filters to change how a frame is printed by the backtrace command. For details, refer to the Frame Filter API section of the GDB documentation.
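A frame filter that shortens long function names in backtrace output could be sketched as follows. The names shorten, ShortNameDecorator, and ShortNameFilter, the filter name "short-names", and the 60-character limit are all assumptions made for illustration. The GDB-specific classes are defined only when the script runs inside the debugger.

```python
try:
    import gdb
    from gdb.FrameDecorator import FrameDecorator
except ImportError:
    gdb = None  # Not running inside the debugger.


def shorten(name, limit=60):
    """Truncate a function name to at most `limit` characters."""
    if len(name) <= limit:
        return name
    return name[:limit - 3] + "..."


if gdb is not None:
    class ShortNameDecorator(FrameDecorator):
        """Wraps a frame and reports a shortened function name."""

        def function(self):
            name = super().function()
            # function() may return an address when no symbol is known.
            return shorten(name) if isinstance(name, str) else name

    class ShortNameFilter:
        """Frame filter that applies ShortNameDecorator to every frame."""

        def __init__(self):
            self.name = "short-names"
            self.priority = 100
            self.enabled = True
            # Register the filter globally.
            gdb.frame_filters[self.name] = self

        def filter(self, frame_iter):
            return map(ShortNameDecorator, frame_iter)

    ShortNameFilter()
```

Truncation is the simplest possible policy. A real filter might instead strip template arguments or elide frames it considers internal.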
You can also change globally what is printed for each frame by using the print frame-info setting. For example, to print only the source line:
(gdb) set print frame-info source-line
(gdb) info threads -stopped
  Id           Target Id    Frame
  <...>
  2.3:[0-15]   Thread 1.3   53   int dim1 = wiID[1];
  2.4:[0-15]   Thread 1.4   53   int dim1 = wiID[1];
  2.5:[0-15]   Thread 1.5   53   int dim1 = wiID[1];
  2.6:[0-15]   Thread 1.6   53   int dim1 = wiID[1];
  2.7:[0-15]   Thread 1.7   53   int dim1 = wiID[1];
  2.8:[0-15]   Thread 1.8   53   int dim1 = wiID[1];
  <...>
Refer to the GDB documentation for more information.
Print Settings for Kernel Data
Given the sample program array-transform.cpp:
using namespace std;
using namespace sycl;
[...]
int main(int argc, char *argv[]) {
  constexpr size_t length = 64;
  int input[length];
  int output[length];

  // Initialize the input
  for (int i = 0; i < length; i++)
    input[i] = i + 100;

  try {
    queue q(default_selector_v, dpc_common::exception_handler);
    [...]
    range data_range{length};
    buffer buffer_in{input, data_range};
    buffer buffer_out{output, data_range};

    q.submit([&](auto &h) {
      accessor in(buffer_in, h, read_only);
      accessor out(buffer_out, h, write_only);

      // kernel-start
      h.parallel_for(data_range, [=](id<1> index) {
        size_t id0 = GetDim(index, 0);
        int element = in[index]; // breakpoint-here
To inspect the contents of the sycl::buffer object buffer_in (length 64) from inside the kernel, you can cast the accessor's MData pointer to an array type so that it prints as an array:
(gdb) print (int[64]) *in.MData
$1 = {100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163}
Note that inside the kernel you must use a sycl::accessor to access data in the host’s sycl::buffer.
Use the print elements setting to limit the number of elements printed:
(gdb) set print elements 10
(gdb) print (int[64]) *in.MData
$2 = {100, 101, 102, 103, 104, 105, 106, 107, 108, 109...}
If the output contains repeated elements, consider the print repeats setting, which controls when runs of identical elements are collapsed:
(gdb) show print repeats
Threshold for repeated print elements is 10.
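To make the effect of the threshold concrete, the sketch below approximates in plain Python how a run of identical elements is collapsed. It is not the debugger's implementation. The function name format_array is hypothetical, and the sketch assumes that a run is collapsed once its length reaches the threshold. It also ignores the interaction with the print elements limit.

```python
from itertools import groupby


def format_array(values, repeats_threshold=10):
    """Approximate how an array prints under a given repeats threshold."""
    parts = []
    for value, group in groupby(values):
        count = len(list(group))
        if count >= repeats_threshold:
            # Assumption: runs at or above the threshold are collapsed.
            parts.append("{} <repeats {} times>".format(value, count))
        else:
            parts.extend([str(value)] * count)
    return "{" + ", ".join(parts) + "}"


print(format_array([0] * 64))        # {0 <repeats 64 times>}
print(format_array([7] * 12 + [3]))  # {7 <repeats 12 times>, 3}
print(format_array([1, 1, 1, 2]))    # {1, 1, 1, 2}
```

In this sketch, raising the threshold (as set print repeats does) makes shorter runs print in full, and lowering it collapses more of the output.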