Examine Data Transfers for Modeled Regions
- More accurate estimations of traffic and time for all cache and memory levels.
- Measured data transfer and estimated data transfer between host and device memory.
- Total data for the loop/function from different callees.
- In the Code Regions metrics table:
  - In the Estimated Bounded By column, review how much time is spent transferring data (the data transfer tax). In the Taxes with Reuse column, see the biggest and total time taxes paid for offloading a code region to a target platform. Expand the Estimated Bounded By group to see a full picture of all time taxes paid for offloading the region to the target platform.
  - In the Estimated Data Transfer with Reuse column, review how much data is transferred per kernel in each direction (from host to device and from device to host). Expand the column to see data per memory level.
  - In the Memory Estimations column, see how well your application uses the resources of all memory levels. Expand the group to see more detailed and accurate metrics for different memory levels.
- Select a code region from the table and review the details about data transferred between host and device memory in the Data Transfer Estimations pane:
  - In the Transferred Data & Tax histogram, see the distribution of data transferred between the host and target devices in each direction.
  - See hints about optimizing data transfers in the selected code region.
- In the Recommendations tab, get guidance for offloading your code to a target device and optimizing it so that your code benefits the most. If the code region has room for optimization or underutilizes the capacity of the target device, Intel Advisor provides hints and code snippets that might help you improve the code further.
- Set the data transfer simulation under the characterization analysis to Medium and run the perspective. The result should have the Data Transfer Estimations pane extended with new data reporting information about memory objects in each code region. The Offloaded Objects pane shows a list of memory objects with data about each object aggregated between different instances of one region. The Analytics histogram shows the number of memory objects that the selected region accessed, distributed by their size.
- Set the data transfer simulation under the characterization analysis to High and enable the Data Reuse Analysis checkbox under the Performance Modeling analysis. With data reuse analysis, Intel Advisor detects groups of parallel code regions that can reuse memory objects transferred to a target GPU device. Such memory objects can be transferred to the GPU only once and reused, which can improve data transfer efficiency. The result should have data transfer metrics in the Code Regions pane estimated with and without data reuse for each code region. Examine the metrics in the Estimated Bounded By and Estimated Data Transfer with Reuse columns to check if a code region can benefit from applying data reuse. For code regions that can benefit from data reuse, you should see Apply Data Reuse guidance in the Recommendations tab. The guidance shows the data transfer estimated with and without data reuse and the performance gain from applying the data reuse. It also explains how you can apply the data reuse technique to your code.
- If you think that the estimated speedup is sufficient and the application is ready to be offloaded, rewrite your code to offload profitable code regions to a target platform and measure the performance of GPU kernels with the GPU Roofline Insights perspective.
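
To build intuition for why the metrics estimated with and without data reuse can differ, the following minimal Python sketch compares the two cases for host-to-device transfers. It is not Intel Advisor's model: the region names, object names, sizes, and the 10 GB/s link bandwidth are all hypothetical, and it assumes that each memory object is read-only on the device, so one copy can serve every region that uses it.

```python
from dataclasses import dataclass

MB = 10**6                      # decimal megabyte, chosen for simple arithmetic
ASSUMED_BANDWIDTH = 10 * 10**9  # hypothetical host-to-device bandwidth, bytes/s


@dataclass(frozen=True)
class Transfer:
    region: str      # hypothetical code region that needs the object
    obj: str         # hypothetical memory object name
    size_bytes: int  # size of the object copied from host to device


def bytes_without_reuse(transfers):
    """Every region copies every object it needs, even if already on the device."""
    return sum(t.size_bytes for t in transfers)


def bytes_with_reuse(transfers):
    """Each distinct object is copied once and reused by later regions."""
    seen = set()
    total = 0
    for t in transfers:
        if t.obj not in seen:
            seen.add(t.obj)
            total += t.size_bytes
    return total


def transfer_tax_seconds(num_bytes, bandwidth=ASSUMED_BANDWIDTH):
    """Time spent moving data, ignoring latency and any overlap with compute."""
    return num_bytes / bandwidth


# Two hypothetical regions that both read the same 8 MB matrix.
EXAMPLE = [
    Transfer("loop_a", "matrix", 8 * MB),
    Transfer("loop_a", "vec_x", 1 * MB),
    Transfer("loop_b", "matrix", 8 * MB),
    Transfer("loop_b", "vec_y", 1 * MB),
]

if __name__ == "__main__":
    for label, fn in (("without reuse", bytes_without_reuse),
                      ("with reuse", bytes_with_reuse)):
        moved = fn(EXAMPLE)
        tax_ms = transfer_tax_seconds(moved) * 1e3
        print(f"{label}: {moved // MB} MB, tax {tax_ms:.2f} ms")
```

In this made-up case the shared matrix accounts for the whole difference: 18 MB are moved without reuse and 10 MB with reuse. A real estimate also depends on device-to-host traffic, writes to the objects, and memory-level effects that this sketch leaves out.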