Tutorial: Analyzing an OpenMP* and MPI Application

Intel® Trace Analyzer and Collector
Application Performance Snapshot
Intel® VTune™
for Linux* OS
Discover how to use Intel® Parallel Studio to tune hybrid applications by reviewing MPI utilization inefficiencies and balancing thread load levels.
About This Tutorial
This tutorial uses a sample application and guides you through the basic steps required to analyze hybrid OpenMP* and MPI code for inefficiencies using Intel® VTune™'s Application Performance Snapshot, Intel® Trace Analyzer and Collector, and Intel VTune.
The tutorial was last updated for the Intel Parallel Studio 2018 product release. The analysis was run on 8 cluster nodes with Intel® Xeon Phi™ processors (formerly code named Knights Landing), each with 256 logical CPUs.
Estimated Duration
Read tutorial: 10 minutes
Run through tutorial with sample application: 60+ minutes
Learning Objectives
After you complete this tutorial, you should be able to:
  • Build an application using the MPI library and Intel® C++ compiler.
  • Run the Application Performance Snapshot tool to get a high-level overview of performance optimization opportunities.
  • Run Intel Trace Analyzer and Collector to identify MPI-bound code.
  • Analyze the communication pattern of the source code.
  • Run the HPC Performance Characterization analysis with Intel VTune to locate vectorization and parallelism issues in the sample code.
  • Compare results before and after optimization.
Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.