OpenVX* Performance Tips
- Use virtual images whenever possible, as this unlocks many graph compiler optimizations.
- Whenever possible, prefer standard nodes and/or extensions over user kernel nodes (which serve as memory and execution barriers, hindering performance). This gives the Pipeline Manager much more flexibility to optimize the graph execution.
- If you still need to implement a user node, base it on the Advanced Tiling Extensions (see the Intel's Extensions to the OpenVX* API: Advanced Tiling chapter).
- If the application has independent graphs, run these graphs in parallel using the vxScheduleGraph API call.
- Provide enough parallel slack to the scheduler: do not break work (for example, images) into too many tiny pieces. Consider kernel fusion.
- For images, use the smallest data type that fits the application's accuracy needs (for example, 32->16->8 bits).
- Consider heterogeneous execution (see the Heterogeneous Computing with OpenVINO™ toolkit chapter).
- You can create an OpenVX image object that references externally allocated memory (vxCreateImageFromHandle). To enable zero-copy with the GPU, the externally allocated memory must be properly aligned. For more details, refer to https://software.intel.com/en-us/node/540453.
- Beware of the (often prohibitive) vxVerifyGraph latency costs. For example, construct the graph so that it does not require re-verification when its parameters are updated. Note that, unlike Map/Unmap for the input images (see the Map/Unmap for OpenVX* Images section), setting new images with different metadata (size, type, and so on) almost certainly triggers the verification, potentially adding significant overhead.
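
For the tip on externally allocated memory, the following is a minimal C sketch of allocating an image plane whose base address and row pitch are aligned. It does not call OpenVX itself. The alignment values (4096 bytes for the base address, 64 bytes for the row pitch) are assumptions chosen as typical page and cache-line sizes, and the names aligned_plane, alloc_aligned_plane, round_up and is_aligned are hypothetical helpers. Check the guide linked above for the actual requirements of your device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* ASSUMED values, not taken from the OpenVX specification:
 * a typical page size and a typical cache-line size. */
#define ASSUMED_BASE_ALIGNMENT  4096u
#define ASSUMED_PITCH_ALIGNMENT 64u

/* Round value up to the next multiple of a power-of-two alignment. */
static size_t round_up(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Return nonzero if ptr is a multiple of a power-of-two alignment. */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical description of one externally allocated image plane. */
typedef struct {
    void  *data;     /* aligned base address; release with free() */
    size_t stride_y; /* bytes between the starts of consecutive rows */
    size_t size;     /* total allocation size in bytes */
} aligned_plane;

/* Allocate a plane of width x height pixels with padded rows.
 * Returns 0 on success and -1 on failure. */
static int alloc_aligned_plane(size_t width, size_t height,
                               size_t bytes_per_pixel, aligned_plane *out)
{
    size_t stride = round_up(width * bytes_per_pixel, ASSUMED_PITCH_ALIGNMENT);
    size_t size = round_up(stride * height, ASSUMED_BASE_ALIGNMENT);
    void *ptr;

    if (size == 0) {
        return -1;
    }
    /* C11 aligned_alloc needs size to be a multiple of the alignment,
     * which the round-up above guarantees. */
    ptr = aligned_alloc(ASSUMED_BASE_ALIGNMENT, size);
    if (ptr == NULL) {
        return -1;
    }
    out->data = ptr;
    out->stride_y = stride;
    out->size = size;
    return 0;
}
```

In an application, data would be the plane pointer passed to vxCreateImageFromHandle, and stride_y would populate the stride_y field of the matching vx_imagepatch_addressing_t entry. Because the image then references this memory rather than copying it, the buffer should stay allocated until the image has been released.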