Developer Guide

Enable the Read-Only Cache for Read-Only Accessors (-Xsread-only-cache-size=<N>)

If your kernel reads from a read-only accessor that is guaranteed not to alias with other accessors or USM pointers, consider enabling the read-only cache by adding the -Xsread-only-cache-size=<N> flag to your dpcpp command. Use the read-only cache for high-bandwidth lookups into tables that remain constant throughout the kernel execution. The read-only cache is optimized for high cache-hit performance.
Example
dpcpp -fintelfpga -Xshardware -Xsread-only-cache-size=<N> <source_file>.cpp
The compiler implements the read-only cache using on-chip memory blocks and privatizes it per kernel. Each kernel receives a version of the cache that serves all reads in the kernel from read-only no-alias accessors. The compiler replicates each private cache as many times as necessary to expose extra read ports. The size of each replicate is <N> bytes, as specified by the -Xsread-only-cache-size=<N> flag.
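Because every replicate is a full <N>-byte copy, the on-chip memory used by one kernel's cache grows with the number of replicates. The following is a minimal sketch of that arithmetic, not a compiler API. The helper name TotalCacheBytes and the replicate count of 4 are hypothetical; the compiler decides the actual number of replicates.

```cpp
#include <cstddef>

// Hypothetical helper: estimates the on-chip memory consumed by one kernel's
// read-only cache. Each replicate holds cache_size_bytes (the <N> passed to
// -Xsread-only-cache-size), so the total is a simple product.
constexpr std::size_t TotalCacheBytes(std::size_t cache_size_bytes,
                                      std::size_t num_replicates) {
  return cache_size_bytes * num_replicates;
}

// Assumed example: a 2048-byte cache that the compiler replicates 4 times
// would occupy 8192 bytes of on-chip memory in total.
static_assert(TotalCacheBytes(2048, 4) == 8192,
              "4 replicates of 2048 bytes occupy 8192 bytes");
```

An estimate like this can help you judge whether a larger <N> is affordable once you know how many replicates your design needs.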
  • Unlike global memory accesses, which have extra hardware for tolerating long memory latencies, the read-only cache suffers significant performance penalties on cache misses. If the buffer accessed in your kernel code cannot fit in the cache, you might achieve better performance without enabling the cache. The cached data is discarded (invalidated) from the read-only cache every time the kernel is launched.
  • Currently, you cannot omit the read-only cache for only a subset of the read-only accessors in your design. If your design has multiple read-only no-alias accessors, you can either enable caching for all of them using the global -Xsread-only-cache-size=<N> flag or disable caching for all of them by removing the flag.
Consider the following example code snippet:
q.submit([&](handler &h) {
  accessor sqrt_lut(sqrt_lut_buf, h, read_only,
                    ext::oneapi::accessor_property_list{no_alias});
  accessor indices(indices_buf, h, read_write,
                   ext::oneapi::accessor_property_list{no_alias, no_init});
  accessor output(output_buf, h, write_only,
                  ext::oneapi::accessor_property_list{no_alias, no_init});
  h.single_task<class Test>([=]() {
    for (int i = 0; i < kNumInputs; ++i) {
      output[i] = sqrt_lut[indices[i]];
    }
  });
});
Compile the above code using the following command:
dpcpp -fintelfpga -Xshardware -Xsread-only-cache-size=2048 <source_file>.cpp
The compiler creates a read-only cache of size 2048 bytes that serves the single read from sqrt_lut. If the cache is sized correctly to match the size of sqrt_lut_buf, then the cache improves the design throughput, especially because the read accesses are random.
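To pick a value for <N>, it can help to compute the size of the lookup table in bytes and compare it with the cache size. The following is a minimal sketch under stated assumptions: the helpers LutBytes and FitsInCache are hypothetical, and the table is assumed to hold 512 float entries, which the example above does not specify.

```cpp
#include <cstddef>

// Hypothetical helper: bytes needed to hold a lookup table of num_entries
// elements of type T.
template <typename T>
constexpr std::size_t LutBytes(std::size_t num_entries) {
  return num_entries * sizeof(T);
}

// Hypothetical helper: true when the whole table fits in one cache replicate
// of cache_bytes (the <N> passed to -Xsread-only-cache-size).
constexpr bool FitsInCache(std::size_t lut_bytes, std::size_t cache_bytes) {
  return lut_bytes <= cache_bytes;
}

// Assumption: sqrt_lut_buf holds 512 float entries (4 bytes each).
constexpr std::size_t kLutEntries = 512;
// Cache size used in the compile command above.
constexpr std::size_t kCacheBytes = 2048;

static_assert(FitsInCache(LutBytes<float>(kLutEntries), kCacheBytes),
              "the assumed 512-entry float table fits in a 2048-byte cache");
```

Under these assumptions, the table occupies exactly 2048 bytes, so every lookup after the first touch of an entry can be served from the cache. A table of 1024 float entries would not fit, and cache misses would then reduce the benefit.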
