Summary

You have completed the Analyzing OpenMP* and MPI Applications tutorial with Application Performance Snapshot, Intel® Trace Analyzer and Collector, and Intel® VTune™ Profiler. Here are some important things to remember when working with your own hybrid application:
Step
Tutorial Recap
Key Tutorial Take-Aways
1. Build and configure application
You made sure all of the relevant tools were installed. You built the application and ran it with various process/thread combinations to look for optimization opportunities.
Test various combinations of MPI processes and OpenMP threads for your hybrid application. Different combinations can produce very different performance results for the same application.
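As a rough illustration of this take-away, the sketch below lists every MPI process and OpenMP thread pair that fills a fixed number of cores and builds a launch command for each. The core count of 8, the `./heart_demo` path, and the helper names are assumptions made for the example. Application arguments and host settings are omitted, so treat it as a starting point rather than the tutorial's exact command line.

```python
import os


def rank_thread_combinations(total_cores):
    """Return every (mpi_ranks, omp_threads) pair whose product equals total_cores."""
    return [
        (ranks, total_cores // ranks)
        for ranks in range(1, total_cores + 1)
        if total_cores % ranks == 0
    ]


def build_run(ranks, threads, executable="./heart_demo"):
    """Return (env, argv) for one hybrid run; launching is left to the caller.

    The executable path is a placeholder; add your own application arguments.
    """
    # OMP_NUM_THREADS sets the OpenMP thread count used by each MPI process.
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    argv = ["mpirun", "-n", str(ranks), executable]
    return env, argv


if __name__ == "__main__":
    # 8 cores is a hypothetical value; use the core count of your own system.
    for ranks, threads in rank_thread_combinations(8):
        env, argv = build_run(ranks, threads)
        print(f"OMP_NUM_THREADS={env['OMP_NUM_THREADS']}", " ".join(argv))
```

Each `(env, argv)` pair can be passed to `subprocess.run(argv, env=env)` and timed, which gives one elapsed-time figure per combination to compare.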
2. Get a performance overview with Application Performance Snapshot
You ran the heart_demo application with the -aps option to collect load balance information, memory and disk usage information, and other metrics.
Use the Application Performance Snapshot HTML report to review where your application is inefficient and determine which tool to use next.
3. Identify communication issues with Intel Trace Analyzer and Collector
You ran the application with the -trace option to understand MPI library wait times and communication patterns. You reviewed the results using the Message Profile chart and identified communication issues.
  • An application can be MPI-bound due to high MPI library wait times, active communications, or poor library optimization settings.
  • When evaluating an application with Intel Trace Analyzer and Collector, it is not always obvious where the problem area is. Examine the application and all available charts closely to find issues.
  • Use the Message Profile chart to view the intensity of point-to-point communications for each sender-receiver pair.
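To make the idea of a sender-receiver intensity view concrete, here is a minimal sketch that sums point-to-point traffic into a matrix indexed by sender and receiver and then lists the heaviest pairs. The message records, the choice of bytes as the metric, and the function names are assumptions made for illustration. This is not how Intel Trace Analyzer and Collector stores or computes its data.

```python
def message_profile(messages, num_ranks):
    """Sum bytes per (sender, receiver) pair into a num_ranks x num_ranks matrix."""
    matrix = [[0] * num_ranks for _ in range(num_ranks)]
    for sender, receiver, nbytes in messages:
        matrix[sender][receiver] += nbytes
    return matrix


def heaviest_pairs(matrix, top=3):
    """Return up to `top` (sender, receiver, bytes) tuples, largest volume first."""
    size = len(matrix)
    pairs = [
        (sender, receiver, matrix[sender][receiver])
        for sender in range(size)
        for receiver in range(size)
        if matrix[sender][receiver] > 0
    ]
    # Sort by descending volume, then by rank numbers for a stable order.
    pairs.sort(key=lambda p: (-p[2], p[0], p[1]))
    return pairs[:top]


if __name__ == "__main__":
    # Hypothetical records: (sender rank, receiver rank, bytes sent).
    records = [(0, 1, 100), (0, 1, 50), (1, 0, 20), (2, 3, 500)]
    profile = message_profile(records, num_ranks=4)
    for sender, receiver, nbytes in heaviest_pairs(profile):
        print(f"rank {sender} -> rank {receiver}: {nbytes} bytes")
```

Pairs that dominate such a matrix, especially pairs of ranks placed far apart, are natural places to start looking for communication issues.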
4. Tune MPI-bound code
You optimized the application by applying the Cuthill-McKee algorithm for reordering a mesh before performing calculations. You used Intel Trace Analyzer and Collector and Application Performance Snapshot to confirm the performance improvement.
After completing an optimization, it is beneficial to check the performance of the best MPI process and OpenMP thread combinations again to see if there has been any change. Run the application without any analysis software to get an accurate elapsed time.
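For readers unfamiliar with the Cuthill-McKee reordering used in this step, the following is a minimal textbook-style sketch, not the code from the heart_demo sample. It assumes the mesh connectivity is given as an undirected graph in a dictionary of neighbour sets, and all names are hypothetical. The ordering is a breadth-first traversal that starts from a lowest-degree node and visits neighbours in order of increasing degree. This tends to give connected nodes nearby indices, measured here as a smaller bandwidth.

```python
from collections import deque


def cuthill_mckee(adjacency):
    """Return a Cuthill-McKee ordering for an undirected graph.

    adjacency maps each node to the set of its neighbours.
    """
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    visited = set()
    order = []
    # Start each connected component from its lowest-degree unvisited node.
    for start in sorted(adjacency, key=lambda n: (degree[n], n)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Enqueue unvisited neighbours in order of increasing degree.
            for neighbour in sorted(adjacency[node] - visited,
                                    key=lambda n: (degree[n], n)):
                visited.add(neighbour)
                queue.append(neighbour)
    return order


def bandwidth(adjacency, order):
    """Return the largest index distance between connected nodes under `order`."""
    position = {node: index for index, node in enumerate(order)}
    return max(
        (abs(position[u] - position[v]) for u in adjacency for v in adjacency[u]),
        default=0,
    )


if __name__ == "__main__":
    # A five-node chain (0-3-1-4-2) whose labels are deliberately scrambled.
    mesh = {0: {3}, 3: {0, 1}, 1: {3, 4}, 4: {1, 2}, 2: {4}}
    new_order = cuthill_mckee(mesh)
    print("bandwidth before:", bandwidth(mesh, sorted(mesh)))
    print("bandwidth after: ", bandwidth(mesh, new_order))
```

When a mesh is split across MPI ranks by contiguous index ranges, a lower bandwidth generally means fewer neighbours end up on distant ranks. This is the motivation for reordering before the calculation.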
5. Analyze vector instruction set with Intel VTune Profiler
You ran a performance analysis on the heart_demo application using Intel VTune Profiler on the thread suggested by the Application Performance Snapshot report. You updated to the latest vector instruction set.
Using legacy vector instruction sets can lead to inefficient application performance. Be sure to build your application with the latest vector instruction set that your target hardware supports.
6. Analyze serial and parallel code efficiency with Intel VTune Profiler
You reviewed issues with parallelism using Intel VTune Profiler. You updated the sample code to fix problem functions. You reviewed the process/thread combinations and observed efficiency improvements.
Review the Bottom-up tab in Intel VTune Profiler to find sections of your application that would benefit from threading and explore threaded code efficiency.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.