OpenMP* Execution Model
The OpenMP* execution model has a single host device and can have
multiple target devices. A device is a logical execution engine with its
own local storage and data environment.
When executing on ATS or PVC, the entire GPU (which is composed of
two tiles) can be treated as a single device, or each tile can be treated
as a separate device.
OpenMP starts executing on the host. When a host thread encounters a
target construct, data is transferred from host to device (if specified
by map clauses, for example), and the code in the construct is offloaded
onto the device. At the end of the target region, data is transferred
from device to host (if so
specified).
By default, the host thread waits for the target region to finish
before proceeding further. The nowait clause on a target construct
specifies that the host thread does not need to wait for the target
region to finish. In other words, the nowait clause allows the
asynchronous execution of the target
region.
Synchronization between regions of code executing asynchronously
can be achieved via the taskwait directive, depend clauses,
(implicit and explicit) barriers, or other synchronization mechanisms.