Summary

Tutorial: Analyzing OpenMP* and MPI Applications

ID 773235

Date 5/20/2020

Version 2020

You have completed the Analyzing OpenMP* and MPI Applications tutorial with Application Performance Snapshot, Intel® Trace Analyzer and Collector, and Intel® VTune™ Profiler. Here are some important things to remember when working with your own hybrid application:

Step	Tutorial Recap	Key Tutorial Take-Aways
1. Build and configure application	You made sure all of the relevant tools were installed. You built the application and tested running the application with various process/thread combinations to determine optimization opportunities.	Test various combinations of MPI processes and OpenMP threads for your hybrid application. Different combinations can produce very different performance results for the same application.
2. Get a performance overview with Application Performance Snapshot	You ran the `heart_demo` application with the `-aps` option to collect load balance information, memory and disk usage information, and other metrics.	Use the Application Performance Snapshot HTML report to review where your application is inefficient and determine which tool to use next.
3. Identify communication issues with Intel Trace Analyzer and Collector	You ran the application with the `-trace` option to understand MPI library wait times and communication patterns. You reviewed the results using the Message Profile chart and identified communication issues.	An application can be MPI-bound due to high MPI library wait times, active communications, or poor library optimization settings. The problem area is not always obvious in Intel Trace Analyzer and Collector, so examine the application and all available charts closely. Use the Message Profile chart to view the intensity of point-to-point communications for each sender-receiver pair.
4. Tune MPI-bound code	You optimized the application by applying the Cuthill-McKee algorithm to reorder the mesh before performing calculations. You used Intel Trace Analyzer and Collector and Application Performance Snapshot to confirm the performance improvement.	After completing an optimization, rerun the best-performing MPI process and OpenMP thread combinations to see whether the results have changed. Run the application without any analysis tools attached to get an accurate elapsed time, because data collection adds overhead.
5. Analyze vector instruction set with Intel VTune Profiler	You ran a performance analysis on the `heart_demo` application using Intel VTune Profiler on the thread suggested by the Application Performance Snapshot report. You updated to the latest vector instruction set.	Legacy vector instruction sets can leave performance on the table. Build your application for the latest vector instruction set that your target hardware supports.
6. Analyze serial and parallel code efficiency with Intel VTune Profiler	You reviewed issues with parallelism using Intel VTune Profiler. You updated the sample code to fix problem functions. You reviewed the process/thread combinations and observed efficiency improvements.	Review the Bottom-up tab in Intel VTune Profiler to find sections of your application that would benefit from threading and explore threaded code efficiency.
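The take-away from step 1, testing several MPI process and OpenMP thread combinations, can be scripted. The following is a minimal sketch, not part of the tutorial: it lists every (processes, threads) pair that fills a given core count and builds a launch line for each one. The helper names `rank_thread_combinations` and `launch_command` are hypothetical, the `heart_demo` arguments are omitted, and passing `OMP_NUM_THREADS` through the launching shell's environment is an assumption about how your MPI launcher propagates variables.

```python
import shlex


def rank_thread_combinations(total_cores):
    """Return (mpi_processes, omp_threads) pairs whose product equals total_cores."""
    return [(ranks, total_cores // ranks)
            for ranks in range(1, total_cores + 1)
            if total_cores % ranks == 0]


def launch_command(ranks, threads, app="./heart_demo"):
    """Build a launch line as a string; nothing is executed here.

    Assumption: the launcher forwards OMP_NUM_THREADS to every process.
    Application arguments are intentionally left out.
    """
    return f"OMP_NUM_THREADS={threads} mpirun -n {ranks} {shlex.quote(app)}"


if __name__ == "__main__":
    # Hypothetical 8-core node: print one candidate launch line per combination.
    for ranks, threads in rank_thread_combinations(8):
        print(launch_command(ranks, threads))
```

Time each printed command separately and without analysis tools attached, then compare elapsed times to pick the combinations worth profiling further.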

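To illustrate the reordering idea from step 4, here is a small textbook-style sketch of the Cuthill-McKee algorithm on a toy graph. It is not the `heart_demo` implementation, which this summary does not show. The `bandwidth` helper and the five-node `mesh` are hypothetical, and using index bandwidth as a stand-in for communication locality is an assumption made for illustration.

```python
from collections import deque


def bandwidth(adjacency, order):
    """Largest index distance between the two ends of any edge under `order`."""
    position = {node: i for i, node in enumerate(order)}
    return max((abs(position[u] - position[v])
                for u in adjacency for v in adjacency[u]), default=0)


def cuthill_mckee(adjacency):
    """Return a Cuthill-McKee ordering of an undirected graph.

    `adjacency` maps each node to an iterable of its neighbours.
    """
    degree = {node: len(set(neigh)) for node, neigh in adjacency.items()}
    visited = set()
    order = []
    # Start each connected component from its lowest-degree unvisited node.
    for start in sorted(adjacency, key=lambda n: (degree[n], n)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Enqueue unvisited neighbours in order of increasing degree.
            for neigh in sorted(set(adjacency[node]) - visited,
                                key=lambda n: (degree[n], n)):
                visited.add(neigh)
                queue.append(neigh)
    return order


if __name__ == "__main__":
    # Hypothetical mesh: a chain 0-3-1-4-2 whose labels are scrambled.
    mesh = {0: [3], 1: [3, 4], 2: [4], 3: [0, 1], 4: [1, 2]}
    new_order = cuthill_mckee(mesh)
    print("bandwidth before:", bandwidth(mesh, sorted(mesh)))
    print("bandwidth after: ", bandwidth(mesh, new_order))
```

After reordering, neighbouring nodes sit at nearby indices. If the mesh is then split into contiguous index ranges per MPI process, fewer edges are likely to cross process boundaries, which is one plausible reason such a reordering reduces communication.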