Developer Guide

Specialization Constants

DPC++ supports specialization constants, which can explicitly trigger JIT compilation: code is regenerated from the intermediate SPIR-V representation based on the run-time values of these constants. This JIT compilation happens during program execution, once the values of the constants are known, and is distinct from the JIT compilation that is triggered based on the options provided to -fsycl-device-code-split.
In the example below, the call to set_specialization_constant binds the value returned by the function get_value to the SYCL kernel bundle. When the kernel bundle is initially compiled, this value is not known and so cannot be used for optimizations. At run time, after get_value has executed, the value is known, so the command group handler uses it to trigger JIT compilation of a kernel specialized for this value.
#include <CL/sycl.hpp>
#include <iostream>
#include <vector>

class specialized_kernel;

// const static identifier of specialization constant
const static sycl::specialization_id<float> value_id;

// Fetch a value at runtime.
float get_value() { return 10; }

int main() {
  sycl::queue queue;
  std::vector<float> vec(1);
  {
    sycl::buffer<float> buffer(vec.data(), vec.size());
    queue.submit([&](auto &cgh) {
      sycl::accessor acc(buffer, cgh, sycl::write_only, sycl::no_init);
      // Set value of specialization constant.
      cgh.template set_specialization_constant<value_id>(get_value());
      // Runtime builds the kernel with the specialization constant
      // replaced by the literal value provided in the preceding
      // call of `set_specialization_constant<value_id>`.
      cgh.template single_task<specialized_kernel>(
          [=](sycl::kernel_handler kh) {
            const float val = kh.get_specialization_constant<value_id>();
            acc[0] = val;
          });
    });
  }
  queue.wait_and_throw();
  std::cout << vec[0] << std::endl;
  return 0;
}
The specialized kernel defined by the single_task call above will eventually become the code shown below:
cgh.single_task<specialized_kernel>(
    [=]() { acc[0] = 10; });
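If set_specialization_constant is never called for a constant, the kernel sees that constant's default value instead. The following minimal sketch is not from the guide; the kernel name default_value_kernel and the 3.14f default are illustrative assumptions showing a specialization_id constructed with an explicit default:

#include <CL/sycl.hpp>
#include <iostream>

class default_value_kernel;

// Hypothetical specialization constant with an explicit default value.
constexpr sycl::specialization_id<float> fallback_id{3.14f};

int main() {
  sycl::queue queue;
  float result = 0;
  {
    sycl::buffer<float> buffer(&result, 1);
    queue.submit([&](sycl::handler &cgh) {
      sycl::accessor acc(buffer, cgh, sycl::write_only, sycl::no_init);
      // No set_specialization_constant() call: the kernel reads the default.
      cgh.single_task<default_value_kernel>([=](sycl::kernel_handler kh) {
        acc[0] = kh.get_specialization_constant<fallback_id>();
      });
    });
  }
  std::cout << result << std::endl;  // prints 3.14
  return 0;
}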
This JIT compilation also has an impact on the amount of time it takes to execute a kernel. This is illustrated by the example below:
#include <CL/sycl.hpp>
#include <chrono>
#include <iostream>
#include <vector>

class specialized_kernel;
class literal_kernel;

// const static identifier of specialization constant
const static sycl::specialization_id<float> value_id;

// Fetch a value at runtime.
float get_value() { return 10; }

int main() {
  sycl::queue queue;

  // Get the kernel ID from the kernel class qualifier.
  sycl::kernel_id specialized_kernel_id =
      sycl::get_kernel_id<specialized_kernel>();

  // Construct a kernel bundle containing only specialized_kernel,
  // in the input state.
  sycl::kernel_bundle kb_src =
      sycl::get_kernel_bundle<sycl::bundle_state::input>(
          queue.get_context(), {specialized_kernel_id});

  // Set the specialization constant value.
  kb_src.set_specialization_constant<value_id>(get_value());

  auto start = std::chrono::steady_clock::now();
  // Build the kernel bundle for the set value.
  sycl::kernel_bundle kb_exe = sycl::build(kb_src);
  auto end = std::chrono::steady_clock::now();
  std::cout << "specialization took - " << (end - start).count()
            << " nano-secs\n";

  std::vector<float> vec{0, 0, 0, 0, 0};
  sycl::buffer<float> buffer1(vec.data(), vec.size());
  sycl::buffer<float> buffer2(vec.data(), vec.size());

  start = std::chrono::steady_clock::now();
  {
    queue.submit([&](auto &cgh) {
      sycl::accessor acc(buffer1, cgh, sycl::write_only, sycl::no_init);
      // Use the precompiled kernel bundle in the executable state.
      cgh.use_kernel_bundle(kb_exe);
      cgh.template single_task<specialized_kernel>(
          [=](sycl::kernel_handler kh) {
            float v = kh.get_specialization_constant<value_id>();
            acc[0] = v;
          });
    });
    queue.wait_and_throw();
  }
  end = std::chrono::steady_clock::now();
  {
    sycl::host_accessor host_acc(buffer1, sycl::read_only);
    std::cout << "result1 (c): " << host_acc[0] << " " << host_acc[1] << " "
              << host_acc[2] << " " << host_acc[3] << " " << host_acc[4]
              << std::endl;
  }
  std::cout << "execution took : " << (end - start).count() << " nano-secs\n";

  start = std::chrono::steady_clock::now();
  {
    queue.submit([&](auto &cgh) {
      sycl::accessor acc(buffer2, cgh, sycl::write_only, sycl::no_init);
      cgh.template single_task<literal_kernel>([=]() { acc[0] = 20; });
    });
    queue.wait_and_throw();
  }
  end = std::chrono::steady_clock::now();
  {
    sycl::host_accessor host_acc(buffer2, sycl::read_only);
    std::cout << "result2 (c): " << host_acc[0] << " " << host_acc[1] << " "
              << host_acc[2] << " " << host_acc[3] << " " << host_acc[4]
              << std::endl;
  }
  std::cout << "execution took - " << (end - start).count() << " nano-secs\n";
}
Looking at the runtimes reported by the timing messages, it can be seen that the initial translation of the kernel (the sycl::build call) takes a long time, while the actual execution of the JIT-compiled kernel takes much less time. The second kernel, which was not precompiled to the executable state, takes longer because the runtime JIT-compiles it before actually executing it.
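The same prebuilding technique can also be applied to kernels that do not use specialization constants, so that the first submission does not pay the JIT cost. The sketch below is a minimal illustration of that idea and is not part of the guide; the kernel name warm_kernel and the surrounding program are assumptions. An executable-state bundle is requested up front and attached to the command group with use_kernel_bundle:

#include <CL/sycl.hpp>

class warm_kernel;

int main() {
  sycl::queue queue;

  // Build (or retrieve) the device code for this context before the first
  // submission, so the submission itself does not pay the JIT cost.
  sycl::kernel_bundle exe_bundle =
      sycl::get_kernel_bundle<sycl::bundle_state::executable>(
          queue.get_context(), {sycl::get_kernel_id<warm_kernel>()});

  int result = 0;
  {
    sycl::buffer<int> buf(&result, 1);
    queue.submit([&](sycl::handler &cgh) {
      sycl::accessor acc(buf, cgh, sycl::write_only, sycl::no_init);
      cgh.use_kernel_bundle(exe_bundle);  // reuse the prebuilt bundle
      cgh.single_task<warm_kernel>([=]() { acc[0] = 42; });
    });
  }
  return 0;
}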
