Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/13/2021
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

6.1.5. Requirement for Multiple Command Queues to Execute Kernels Concurrently

To execute kernels within the same OpenCL program object concurrently, instantiate a separate in-order command queue for each kernel you want to run concurrently. This can also be achieved by using a single out-of-order command queue and enqueuing independent kernels (no dependencies between each other).

A single in-order command queue can only dispatch a single operation for execution. Subsequent operations are not dispatched until the previous operation is fully complete. Thus, to execute kernels within the same OpenCL program object concurrently, instantiate a separate command queue for each kernel you want to run concurrently.

Similarly, multiple in-order command queues are also required to concurrently execute different transfers (clEnqueueReadBuffer or clEnqueueWriteBuffer), including executing transfers concurrently with kernels. For example, to execute kernel A concurrently with kernel B as well as concurrently with a clEnqueueWriteBuffer transfer, you must create three command queues and enqueue each of the operations in a separate queue. Achieving this in a steady state leads to maximum utilization of the FPGA device.

Out-of-order command queues may also be used to launch buffer writes, reads, and kernel execution concurrently. Enqueueing events that do not have any dependencies on one another into an out-of-order queue schedules them to execute concurrently, with the exception that these events are not blocked by their other dependencies.