General Recommendations on User Kernels
- Just like with any other kernel in a graph, there might be multiple node instances for the same user-defined kernel. These instances might be potentially executed in parallel. So your user-kernel code should be thread safe.
- User kernels may act as implicit memory barriers. In most cases the OpenVX implementation should complete all prior computations and update memory objects that the user kernel depends on. This is not entirely performance friendly. So if you see a user kernel in the hotspots (Performance Analysis Using Intel® VTune™ Amplifier topic), consider the following:
- If the kernel is heavy enough, leverage parallelismwithinthe user kernel, for example by making the code parallel. Using Intel® Threading Building Blocks is strongly recommended as an API of choice for parallelism, as it would guarantee composability with the rest of OpenVX runtime. Specifically, using the TBB within a user kernel helps avoiding threading oversubscription.
- Consider the Advanced Tiling extension (the Intel Extensions to the OpenVX* API: Advanced Tiling chapter). This way you specify the tiled version of the user kernel and let the runtime to schedule the things in the best way, including parallelism.
- Merging kernels so that they combine steps on local data (kernel fusion) often provides better performance than any automatic approach.