User Guide


Examine Data Transfers for Modeled Regions

Accuracy Level


Enabled Analyses

Survey + Characterization (Trip Counts and FLOP with cache simulation and light data transfer simulation) + Performance Modeling with no assumed dependencies
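These analyses can also be run from the command line. Below is a minimal sketch; the flag names follow recent Intel Advisor releases, and the project directory `./advi_results` and application name `./myApp` are placeholders:

```shell
# Survey: collect basic timing data for the application.
advisor --collect=survey --project-dir=./advi_results -- ./myApp

# Characterization: trip counts and FLOP with cache simulation
# and light data transfer simulation.
advisor --collect=tripcounts --flop --enable-cache-simulation \
        --data-transfer=light --project-dir=./advi_results -- ./myApp

# Performance modeling with no assumed dependencies.
advisor --collect=projection --no-assume-dependencies \
        --project-dir=./advi_results
```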

Result Interpretation

After running the Offload Modeling perspective at this accuracy level, you get an extended Offload Modeling report, which provides detailed information about the memory and cache usage and offload taxes of your application and shows you, in addition to the basic data:
  • More accurate estimations of traffic and time for all cache and memory levels.
  • Measured data transfer and estimated data transfer between host and device memory.
  • Total data for the loop/function from different callees.
The Offload Modeling perspective assumes a loop is parallel if its dependency type is unknown. This means that there is no information about the loop from a compiler and the loop is not explicitly marked as parallel, for example, with a programming model (OpenMP*, Data Parallel C++, Intel® oneAPI Threading Building Blocks).
If you previously generated a report at a lower accuracy level, all offload recommendations, metrics, and speedup estimates are updated to be more precise, taking the new data into account.
This topic describes data as it is shown in the Offload Modeling report in the Intel Advisor GUI. You can also view the results using an HTML report, but the data arrangement and some metric names may vary.
Example of an Accelerated Regions report with data transfer and tax estimations (Offload Modeling perspective)
In the Accelerated Regions tab of the Offload Modeling report, review the metrics about memory usage and data transfers:
  • In the metrics table:
    • In the Estimated Bound-by column group, see the largest and total time taxes paid for offloading a code region to a target platform. Expand the group to see a full picture of all time taxes paid for offloading the region to the target platform.
    • In the Estimated Data Transfer column, review the amount of data read by and written to a target platform if the code is offloaded.
    • In the Memory Estimates column group, see how well your application uses the resources of all memory levels. Expand the group to see more detailed and accurate metrics for individual memory levels.
  • Select a code region from the table and review the details about the amount of data transferred between host and device memory in the Data Transfer Estimations pane:
    • See the total amount of data transferred in each direction and the corresponding offload taxes.
    • See hints about optimizing data transfers in the selected code region.
For details about metrics reported, see Accelerator Metrics.

Next Steps

  • Based on the collected data, rewrite your code to offload it to a target platform and measure the performance of GPU kernels with the GPU Roofline Insights perspective.
  • Consider running the
    Offload Modeling
    perspective with a higher level of accuracy to get more precise offload recommendations.
