As in the regular case of multiple queues within the same context, you can wait on event objects from both the CPU and GPU queues (error checking is omitted):
//notice that kernel object itself can be the same (shared)
clEnqueueNDRangeKernel(gpu_queue, kernel, … &eventObjects[0]);
//other commands for the GPU queue
//flushing the queue to start execution on the Intel® Graphics in parallel with populating the CPU queue below
//notice it is NOT clFinish or clWaitForEvents to avoid serialization
clFlush(gpu_queue);//assuming NO RESOURCE or other DEPENDENCIES with CPU device
clEnqueueNDRangeKernel(cpu_queue, kernel, … &eventObjects[1]);
//other commands for the CPU queue
//now let’s flush the second queue
clFlush(cpu_queue);
//now that both queues are flushed, let’s wait for both kernels to complete
clWaitForEvents(2, eventObjects);
In this example, the first queue is flushed without blocking or waiting for results. With blocking calls such as clFinish or clWaitForEvents, the actions are serialized with respect to the devices, because the commands do not get into the second queue until the blocking call on the first queue returns (assuming both queues are populated from the same thread).
For an example where proper serialization is critical, refer to the "Writing to a Shared Resource" section.