Tutorial: Analyzing an OpenMP* and MPI Application

Intel® Trace Analyzer and Collector
Application Performance Snapshot
Intel® VTune™ Profiler
for Linux* OS
Discover how to use Intel® Parallel Studio to tune hybrid applications by reviewing MPI utilization inefficiencies and balancing thread load levels.
About This Tutorial
This tutorial uses the sample heart_demo and guides you through basic steps required to analyze hybrid OpenMP* and MPI code for inefficiencies using Intel® VTune™ Profiler's Application Performance Snapshot, Intel® Trace Analyzer and Collector, and Intel VTune Profiler.
The tutorial was last updated for the Intel Parallel Studio 2018 product release. The analysis was run on 8 cluster nodes with Intel® Xeon Phi™ processors (formerly code named Knights Landing), each with 256 logical CPUs.
Estimated Duration
Read tutorial: 10 minutes
Run through tutorial with sample application: 60+ minutes
Learning Objectives
After you complete this tutorial, you should be able to:
  • Build an application using the MPI library and Intel® C++ compiler.
  • Run the Application Performance Snapshot tool to get a high-level overview of performance optimization opportunities.
  • Run Intel Trace Analyzer and Collector to identify MPI-bound code.
  • Analyze the communication pattern of the source code.
  • Run the HPC Performance Characterization Analysis with Intel VTune Profiler to locate vectorization and parallelism issues in the sample code.
  • Compare results before and after optimization.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.