Intel® Advisor User Guide

ID 766448
Date 3/31/2023
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

Model Threading Parallelism

The Suitability analysis examines your running serial program to provide approximate estimated performance characteristics of your annotated parallel sites. This shows you both the performance gain from running your parallel program on multiple CPUs and the likely impact of parallel overhead.

To choose the best places to add parallelism, locate the parallel sites that contribute the most to the overall program's gain. Because of the overhead of parallel execution - such as starting threads - certain parallel sites and tasks may not contribute to the overall program's gain, or may slow down its performance. After you identify such parallel sites or tasks that do not improve performance, either modify or eliminate their annotations.

Use the Suitability Report Window

After you run the Suitability tool, view its data in the Suitability Report window. This window contains multiple areas:

Location in Window Description

Upper

Any annotation-related error the Suitability tool detects appears at the top of the Suitability Report window. If you see such errors, the displayed Suitability data may not be reliable. To view the source location associated with an error, click the button. To fix the error, read the displayed error message, modify your source code to fix the problem, rebuild your target executable, and run Suitability tool analysis again.

Upper-left

The upper-left area shows the Maximum Program Gain for All Sites in the program. Your overall goal of adding parallelism is to increase the Maximum Program Gain for All Sites so the parallel program will execute as fast as possible. The measured serial execution runtime, predicted parallel runtime, and any measured paused time are displayed below Maximum Program Gain for All Sites. Use the predicted Suitability gain values to help you make informed decisions about where to add parallelism.

Upper-right

Use the upper-right row of modeling parameters to model performance. Choose a hardware configuration and threading model (parallel framework) values from the drop-down lists. If you select a Target System for Intel® Xeon Phi™ processors, an additional value for total Coprocessor Threads appears.

Below this row is a grid of data that shows the estimated performance of each parallel site detected during program execution. The Site Label shows the argument to the site annotation. Examine the predicted Site Gain and Impact to Program Gain (higher values are better) to estimate how much each site contributes to the Maximum Program Gain for All Sites for all sites (described above). To expand the data under Combined Site Metrics or Site Instance Metrics, click the icon to the right of that heading; to collapse data, click to the right of that heading.

To show or hide the side command toolbar, click the Survey tool or Survey tool icon.

Middle-left

If you choose a Target System of CPU, to view detailed characteristics of the selected site as well as its tasks and locks, click the Site Details tab.

The Scalability of Maximum Site Gain graph summarizes performance for the selected site. The number of CPU processors or total number of coprocessor threads appears on the horizontal X axis and the target's predicted performance gain appears on the Y axis. To change the default CPU Count and the Maximum CPU Count, set the Options value.

Lower-left

Below the graph is a list of issues that might be preventing better predicted performance gains as well as a summary of serial and predicted parallel time. To expand a line, click the down arrow to the right of the item's name. Most issues are related to the Runtime Modelingmodeling parameters. Later, you can use other Analyzer tools like Intel® VTune™ Profiler to measure actual performance of your parallel program.

Lower-middle

Use the Loop Iterations (Tasks) Modeling (or Tasks Modeling) modeling parameters to experiment with different loop structures, iteration counts, and instance durations that might improve the predicted parallel performance.

Click Apply to view the impact on the predicted performance.

Lower-right

Use the Runtime Modelingmodeling parameters to learn which parallel overhead categories might have an impact on parallel overhead. If you agree to address a category later by using the chosen parallel framework's capabilities or by tuning the parallel code after you have implemented parallelism, check that category.

Bottom-right

If the chosen Target System is Intel Xeon Phi or Offload to Intel Xeon Phi, additional Intel® Xeon Phi™ Advanced Modeling options appear below the Runtime Modeling area. To expand this area, click the down arrow to the right of Intel Xeon