In many algorithms, it is likely that most operations are performed on device memory and only some on host memory. This is particularly true in simulation code, where host memory is updated through a host accessor. Creating a host accessor triggers several actions in the DPC++ runtime: the runtime locks the buffer the accessor points to and updates the host copy of that buffer's memory. This access pattern can cause many memory copies from the device to the host and back in order to keep the data coherent.
Flow Graph Analyzer reports such issues in the following way:
Click the issue to highlight the loop in the graph that contains the host pointer accessor.
If the buffer pointed to by the host pointer accessor is large, the cost of this access can be a significant portion of the time spent in each loop.
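
To get a rough sense of how this cost grows, the following sketch counts full-buffer copies under a deliberately simplified coherence model. The model is an assumption made for illustration, not the DPC++ runtime's actual bookkeeping: the valid copy of the buffer is taken to live on exactly one side, and touching it from the other side costs one full-buffer copy. The names `TransferModel`, `host_access_every_step`, and `host_access_after_loop` are hypothetical.

```cpp
#include <cstddef>

// Where the up-to-date copy of the buffer currently lives.
enum class Location { Host, Device };

// Simplified coherence model (assumption): accessing the buffer from the
// side that does not hold the valid copy costs one full-buffer copy.
struct TransferModel {
    std::size_t buffer_bytes;
    Location valid_on = Location::Host;  // data starts out on the host
    std::size_t copies = 0;
    std::size_t bytes_moved = 0;

    explicit TransferModel(std::size_t bytes) : buffer_bytes(bytes) {}

    void access(Location where) {
        if (where != valid_on) {
            ++copies;
            bytes_moved += buffer_bytes;
            valid_on = where;
        }
    }
};

// Pattern described above: every loop iteration runs a device kernel and
// then updates the same buffer through a host accessor.
TransferModel host_access_every_step(std::size_t buffer_bytes, int steps) {
    TransferModel model(buffer_bytes);
    for (int i = 0; i < steps; ++i) {
        model.access(Location::Device);  // kernel uses the buffer
        model.access(Location::Host);    // host accessor inside the loop
    }
    return model;
}

// Alternative for comparison: the host touches the buffer only once,
// after all device iterations have finished.
TransferModel host_access_after_loop(std::size_t buffer_bytes, int steps) {
    TransferModel model(buffer_bytes);
    for (int i = 0; i < steps; ++i) {
        model.access(Location::Device);
    }
    model.access(Location::Host);
    return model;
}
```

Under this model, 100 iterations with the host access inside the loop produce 200 full-buffer copies, while moving the host access after the loop produces 2. The real runtime may behave differently, for example depending on access modes, so treat the numbers as an illustration of the trend rather than a prediction.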