OpenCL™ Developer Guide for Intel® Core™ and Intel® Xeon® Processors
ID: 773005
Date: 10/30/2018
Legal Information
Getting Help and Support
Introduction
Check-list for OpenCL™ Optimizations
Tips and Tricks for Kernel Development
Application-Level Optimizations
Debugging OpenCL™ Kernels on Linux* OS
Performance Debugging with Intel® SDK for OpenCL™ Applications
Coding for the Intel® Architecture Processors
Why Is Optimizing Kernels Important?
Avoid Spurious Operations in Kernels
Avoid Handling Edge Conditions in Kernels
Use the Preprocessor for Constants
Prefer (32-bit) Signed Integer Data Types
Prefer Row-Wise Data Accesses
Use Built-In Functions
Avoid Extracting Vector Components
Task-Parallel Programming Model Hints
Common Mistakes in OpenCL™ Applications
Introduction for OpenCL™ Coding on Intel® Architecture Processors
Vectorization Basics for Intel® Architecture Processors
Vectorization: SIMD Processing Within a Work Group
Benefitting from Implicit Vectorization
Vectorizer Knobs
Targeting a Different CPU Architecture
Using Vector Data Types
Writing Kernels to Directly Target the Intel® Architecture Processors
Work-Group Size Considerations
Threading: Achieving Work-Group Level Parallelism
Efficient Data Layout
Using the Blocking Technique
Intel® Turbo Boost Technology Support
Global Memory Size
Avoid Needless Synchronization
For best results, avoid explicit command synchronization primitives, such as clEnqueueMarker or clEnqueueBarrier. Explicit synchronization commands and event tracking result in cross-module round trips, which decrease performance. The fewer explicit synchronization commands you use, the better the performance.
Use the following techniques to reduce explicit synchronization:
- Merge kernels whenever possible; this also improves data locality (see the kernel-merging sketch after this list).
- If you have to wait for a kernel to finish before reading its output buffer, defer that wait: keep submitting work and read the buffer only when you actually need the first results.
- If an in-order queue expresses the dependency chain correctly, use it to define a chain of dependent kernels. In the in-order execution model, the commands in a command queue execute in the order of submission, with each command running to completion before the next one begins. This is the typical case for a straightforward processing pipeline. Consider the following:
  - Blocking OpenCL™ API calls are more effective than explicit synchronization schemes based on OS synchronization primitives.
  - If you are optimizing the kernel pipeline, first measure the kernels separately to find the most time-consuming one. In the final pipeline version, avoid frequent calls to clFinish or clWaitForEvents, for example after each kernel invocation. Prefer submitting the whole sequence to the in-order queue and issuing clFinish once, or waiting on a single OpenCL event object, which reduces host-device round trips (see the host-side sketch below).
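A minimal sketch of the kernel-merging technique: the scale/offset kernels below are hypothetical stand-ins for any pair of dependent element-wise kernels, not taken from this guide. Fusing them removes one kernel dispatch, the intermediate global buffer, and the synchronization point between the two launches.

```c
// Before merging: two kernels with an intermediate global buffer "tmp" and
// an extra dispatch (and potential synchronization) between them.
__kernel void scale(__global const float *in, __global float *tmp, float a)
{
    size_t i = get_global_id(0);
    tmp[i] = in[i] * a;
}

__kernel void add_offset(__global const float *tmp, __global float *out, float b)
{
    size_t i = get_global_id(0);
    out[i] = tmp[i] + b;
}

// After merging: one kernel, one dispatch; the intermediate value stays in a
// private variable, which also improves data locality.
__kernel void scale_add_offset(__global const float *in, __global float *out,
                               float a, float b)
{
    size_t i = get_global_id(0);
    float t = in[i] * a;   // intermediate result never touches global memory
    out[i] = t + b;
}
```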
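A minimal host-side sketch of the submission pattern described above, assuming hypothetical kernels kernelA and kernelB, a result buffer, and a run_pipeline helper (none of these names come from the guide), with error handling reduced to a simple macro: the dependent kernels go into an in-order queue with no intermediate clFinish or clWaitForEvents, and the single synchronization point is a blocking read issued only when the host needs the data.

```c
/* Sketch only: kernelA, kernelB, "result", and the sizes are placeholders. */
#define CL_TARGET_OPENCL_VERSION 200
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(err) do { if ((err) != CL_SUCCESS) { \
    fprintf(stderr, "OpenCL error %d\n", (err)); exit(EXIT_FAILURE); } } while (0)

static void run_pipeline(cl_context ctx, cl_device_id dev,
                         cl_kernel kernelA, cl_kernel kernelB,
                         cl_mem result, size_t global_size,
                         size_t result_bytes, void *host_output)
{
    cl_int err;

    /* NULL properties create a default, in-order queue: commands execute in
     * submission order, so no markers, barriers, or event waits are needed
     * between the dependent kernels. */
    cl_command_queue queue =
        clCreateCommandQueueWithProperties(ctx, dev, NULL, &err);
    CHECK(err);

    /* Submit the whole sequence without intermediate clFinish calls. */
    err = clEnqueueNDRangeKernel(queue, kernelA, 1, NULL,
                                 &global_size, NULL, 0, NULL, NULL);
    CHECK(err);
    err = clEnqueueNDRangeKernel(queue, kernelB, 1, NULL,
                                 &global_size, NULL, 0, NULL, NULL);
    CHECK(err);

    /* Synchronize exactly once, where the host actually needs the data:
     * the blocking read is the single host-device round trip. */
    err = clEnqueueReadBuffer(queue, result, CL_TRUE /* blocking */, 0,
                              result_bytes, host_output, 0, NULL, NULL);
    CHECK(err);

    clReleaseCommandQueue(queue);
}
```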