Doing IO in the Kernel
Print statement is the most fundamental capability needed for
looking at the results of a program. In accelerators, printing is
surprisingly hard and also fairly expensive in terms of overhead.
DPC++ provides some capabilities to help make this task similar to
standard I/O C/C++ programs, but there are some quirks you need to
understand because of the way accelerators work. File I/O is not
possible from DPC++ kernels.
SYCL* provides the
stream
class to let you print information to the console
from within kernels, providing an easy way to debug simple issues without
resorting to a debugger. The stream
class provides functionality that is very
similar to the C++ STL ostream
class, and its usage is similar to the STL
class. Below we describe how to use SYCL stream
class to output information
from within an enqueued kernel.To use the class we must first instantiate it. The signature of the
stream
constructor is as follows:stream(size_t BufferSize, size_t MaxStatementSize, handler &CGH);
The constructor takes three parameters:
- BufferSize: the total number of characters that may be printed over the entire kernel range
- MaxStatementSize: the maximum number of characters in any one call to the stream class
- CGH: reference to thesycl::handlerparameter in thesycl::queue::submitcall
Usage is very similar to that of the C++ STL
ostream std::cout
class. The message or data that needs to be printed is sent to the
SYCL stream
instance via the appropriate operator<<
method. SYCL
provides implementations for all the built-in data types (such as
int
, char
and float
) as well as some common classes (such as
sycl::nd_range
and sycl::group
).Here is an example usage of a SYCL
stream
instance:void out1() {
constexpr int N = 16;
sycl::queue q;
q.submit([&](auto &cgh) {
sycl::stream str(8192, 1024, cgh);
cgh.parallel_for(N, [=](sycl::item<1> it) {
int id = it[0];
/* Send the identifier to a stream to be printed on the console */
str << "ID=" << id << sycl::endl;
});
}).wait();
} // end out1
The use of
sycl::endl
is analogous to the use of the C++ STL
std::endl
ostream
reference–it serves to insert a new line as
well as flush the stream.Compiling and executing the above kernel gives the following output:
ID=0
ID=1
ID=2
ID=3
ID=4
ID=5
ID=6
ID=7
ID=8
ID=9
ID=10
ID=11
ID=12
ID=13
ID=14
ID=15
Care must be taken in choosing the appropriate
BufferSize
and
MaxStatementSize
parameters. Insufficient sizes may cause
statements to either not be printed, or to be printed with less
information than expected. Consider the following kernel:void out2() {
sycl::queue q;
q.submit([&](auto &cgh) {
sycl::stream str(8192, 4, cgh);
cgh.parallel_for(1, [=](sycl::item<1> it) {
str << "ABC" << sycl::endl; // Print statement 1
str << "ABCDEFG" << sycl::endl; // Print statement 2
});
}).wait();
} // end out2
Compiling and running this kernel gives the following output:
ABC
The first statement was successfully printed out since the number of
characters to be printed is 4 (including the newline introduced by
sycl::endl
) and the maximum statement size (as specified by the
MaxStatementSize
parameter to the sycl::stream
constructor) is
also 4. However, only the newline from the second statement is
printed.The following kernel shows the impact of increasing the allowed
maximum character size:
void out3() {
sycl::queue q;
q.submit([&](auto &cgh) {
sycl::stream str(8192, 10, cgh);
cgh.parallel_for(1, [=](sycl::item<1> it) {
str << "ABC" << sycl::endl; // Print statement 1
str << "ABCDEFG" << sycl::endl; // Print statement 2
});
}).wait();
} // end out3
Compiling and running the above kernel gives the expected output:
ABC
ABCDEFG
The examples above used simple kernels with a single work item. More
realistic kernels will typically include multiple work items. In these
cases, no guarantee is made as to the specific order of the statements
printed to the console and you should expect statements from
different work items to be interleaved. Consider the following kernel:
void out4() {
sycl::queue q;
q.submit([&](auto &cgh) {
sycl::stream str(8192, 1024, cgh);
cgh.parallel_for(sycl::nd_range<1>(32, 4), [=](sycl::nd_item<1> it) {
int id = it.get_global_id();
str << "ID=" << id << sycl::endl;
});
}).wait();
} // end out4
One run can produce the following output.
ID=0
ID=1
ID=2
ID=3
ID=4
ID=5
[snip]
ID=26
ID=27
ID=28
ID=29
ID=30
ID=31
When this program is run again, we might get the output in a totally
different order, depending on the order the threads are executed.
ID=4
ID=5
ID=6
ID=7
ID=0
ID=1
[snip]
ID=14
ID=15
ID=28
ID=29
ID=30
ID=31
The output from
sycl::stream
is printed after the kernel has
completed execution. In most cases this is of no consequence. However,
should the kernel fault or throw an exception, no statement will
be printed. To illustrate this, consider the following kernel, which
raises an exception:void out5() {
int *m = NULL;
sycl::queue q;
q.submit([&](auto &cgh) {
sycl::stream str(8192, 1024, cgh);
cgh.parallel_for(sycl::nd_range<1>(32, 4), [=](sycl::nd_item<1> it) {
int id = it.get_global_id();
str << "ID=" << id << sycl::endl;
if (id == 31)
*m = id;
});
}).wait();
} // end out5
Compiling and executing the above code generates a segmentation fault
due the write to a null pointer.
Segmentation fault (core dumped)
None of the print statements are actually printed to the console. Instead, you
will see an error message about a segmentation fault. This is unlike
traditional C/C++ streams.