• 04/03/2020
  • Public Content
Contents

OpenVX* Performance Tips

A rule of thumb for OpenVX* is applicability of general “compute” optimizations, as from the optimization point of view, the OpenVX is just another API for compute. Thus, common tips like providing the sufficient parallel slack, avoiding redundant copies or extensive synchronization, and so on, hold true. These are especially pronounced for heterogeneous scenarios, as latencies of communicating to accelerators are involved.
There are specific actions to optimize the performance of an OpenVX application. These actions are as follows:  
  • Use virtual images whenever possible, as this unlocks many graph compiler optimizations.
  • Whenever possible, prefer standard nodes and/or extensions over user kernel nodes (which serve as memory and execution barriers, hindering performance). This gives the Pipeline Manager much more flexibility to optimize the graph execution.
  • If you still need to implement a user node, base it on the Advanced Tiling Extensions (see the Intel's Extensions to the OpenVX* API: Advanced Tiling chapter)
  • If the application has independent graphs, run these graphs in parallel using
    vxScheduleGraph
    API call.
  • Provide enough parallel slack to the scheduler- do not break work (for example, images) into too many tiny pieces. Consider kernel fusion.
  • For images, use smallest data type that fits the application accuracy needs (for example, 32->16->8 bits).
  • Consider heterogeneous execution (see the Heterogeneous Computing with OpenVINO™ toolkit chapter).
  • You can create an OpenVX image object that references a memory that was externally allocated (
    vxCreateImageFromHandle
    ). To enable zero-copy with the GPU the externally allocated memory should be aligned.  For more details, refer to https://software.intel.com/en-us/node/540453.
  • Beware of the (often prohibitive)
    vxVerifyGraph
    latency costs. For example, construct the graph in a way it would not require the verification upon the parameters updates. Notice that unlike Map/Unmap for the input images (see the Map/Unmap for OpenVX* Images section), setting new images with different meta-data (size, type, etc) almost certainly triggers the verification, potentially adding significant overhead.

Product and Performance Information

1

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.