User Guide


OpenSHMEM* Code Analysis with Fabric Profiler

Fabric Profiler (preview feature) is a performance tool that you can use to identify detailed characteristics of the runtime behavior for an OpenSHMEM application.
This is a preview feature. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases.
Fabric Profiler consists of two parts:
  • The data collector monitors application and network behavior while the OpenSHMEM application is running.
  • The analyzer is a collection of tools that runs on a Linux* or Windows* workstation after the application has completed. These tools display profiling results with interactive features that let you explore a wide range of communication-centric behaviors.
The Fabric Profiler tool is distributed as part of Intel® VTune™. Full documentation of the tool, examples, and pre-collected trace files are available in the Fabric Profiler package.

Set Up the Data Collector

The Fabric Profiler data collector is implemented as a library that intercepts the OpenSHMEM calls of the application and monitors network activity. It populates binary trace files with this information.
Load the esp module by running:
module load esp
The module sets an environment variable that points to the directory where the data collector package is installed.
The data collector requires two third-party libraries:
  • PAPI is used to gather system metrics at runtime. To add PAPI to your environment, you may need to run
    module load papi
    or download PAPI and build it yourself.
  • OTF2 is used to generate trace files. OTF2 is available as a separate download.

Set Up the Analyzer

The analyzer is a collection of MATLAB* programs that run in the MATLAB runtime environment. They read the trace files and display results.
Note: You must have the MATLAB Runtime Environment to install the analyzer. It is a free download. Select version R2018a (9.4) or newer.
The analyzer is located in the release directory in
. It is a MATLAB program named
To start the analyzer, run the

Fabric Profiler Workflow

In the Fabric Profiler workflow, you perform these steps:
  1. Build and run an application using the data collector.
  2. Generate trace files.
  3. View trace files using the analyzer.

Build and Run an Application

Once you have installed Fabric Profiler on a Linux or Windows machine, complete these steps to build and run an application.
  1. Define Fabric Profiler regions in the source code.
    A named region is highlighted in analyzer displays and improves analysis.
    1. Include the header file
    2. Mark regions of interest:
      esp_enter("<region_name>");
      esp_exit("<region_name>");
    3. Rebuild the application.
    You cannot nest or interleave regions.
  2. Build a statically-linked application with Fabric Profiler instrumentation.
    When you load the Fabric Profiler module (module load esp), environment variables define important flags for you. Use these variables to link the Fabric Profiler data collector library into your code before the SHMEM library.
    For example, to build the fixed-round example (from the examples directory) using Cray SHMEM, type:
    CC -static -o fixed-round $ESP_CFLAGS fixed-round.c $ESP_LDFLAGS $ESP_LDADD
    Make sure you adhere to these changes from your normal build:
    • Use the C++ compiler, even if the C-language application does not require it. The data collector library uses C++ and will not link without it.
    • Use $ESP_CFLAGS to add the path to the header file. It also adds an option that improves the quality of the trace files.
    • Use $ESP_LDFLAGS to add the path to the data collector library.
    • Use $ESP_LDADD to add the data collector library.
  3. Build a dynamically-linked application with Fabric Profiler instrumentation.
    Fabric Profiler uses LD_PRELOAD at run-time to link in the data collector library before the SHMEM library. Therefore, you do not need to rebuild your application unless you added Fabric Profiler regions to your source code.
    For example, the fixed-round application (in the examples directory) is written in C. Unlike the case of static linking above, you do not need to use the C++ compiler to build this C-language application for use with Fabric Profiler instrumentation.
    cc -o fixed-round $ESP_CFLAGS fixed-round.c -dynamic
    $ESP_CFLAGS sets the path to the header file and adds an option that improves the quality of the trace files.
  4. Run an application with Fabric Profiler instrumentation.
    1. The data collector library uses the PAPI library and the OTF2 library. If you are using the shared library, you may need to run
      module load papi
      or add PAPI to your library paths. OTF2 is available as a separate download.
    2. Load the Fabric Profiler module:
      module load esp
    3. There are many Fabric Profiler configuration parameters. The module sets them to default values which are sufficient when you run your application for the first time. The configuration parameters are described in a separate section.
    4. For a dynamic application, add the data collector library to the LD_PRELOAD environment variable.
      For example:
      export LD_PRELOAD=$ESP_ROOT/lib/$LD_PRELOAD
      srun --export=LD_PRELOAD,ALL <rest of srun command>
      If you have loaded the esp module, the environment variable $ESP_ROOT contains the path to the Fabric Profiler installation. See the sample job scripts in the examples directory.
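
The LD_PRELOAD step above can be sketched as a small shell helper. This is a minimal sketch and not part of Fabric Profiler: the function name prepend_preload and the path /opt/example/lib/libexample.so are hypothetical, and it assumes a colon-separated LD_PRELOAD list. It places a library first in the list so that it loads before the others, and it avoids leaving a trailing separator when LD_PRELOAD is initially empty.

```shell
# Hypothetical helper (not shipped with Fabric Profiler).
# Prints a preload list with the given library placed first.
#   $1: path of the library to load first
#   $2: current preload list (may be empty)
prepend_preload() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Example with a placeholder path; substitute the data collector library.
prepend_preload "/opt/example/lib/libexample.so" "${LD_PRELOAD:-}"
```

To apply the result, assign the output back to LD_PRELOAD and export it before calling srun.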

Generate Trace Files

Once you run the data collector, it monitors the execution of your application as well as network activity. It writes trace files when the application has finished executing. Add 10% to your wall time for writing output to the trace files.
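
As a worked example of the 10% allowance, the sketch below pads a wall-time request for a job. The function name padded_walltime is hypothetical, and rounding up to a whole minute is an assumption made here for convenience with job schedulers.

```shell
# Hypothetical helper: add 10% to a wall-time estimate given in whole
# minutes, rounding up to the next whole minute.
padded_walltime() {
    echo $(( ($1 * 110 + 99) / 100 ))
}

padded_walltime 60   # prints 66
padded_walltime 45   # prints 50 (49.5 rounded up)
```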
  1. See the application output to verify successful code instrumentation by the data collector. To verify, check these actions:
    1. Ensure that the
      environment variable is set to 1 and not 0.
    2. Call
      . The start banner of Fabric Profiler displays.
    3. Call
      . The stop banner of Fabric Profiler displays.
    If the
    environment variable is set correctly and the banners do not display on function call, contact
    for further assistance.
  2. Merge the trace files.
    The Fabric Profiler banner lists the path to the trace files. To merge traces, run
    $ESP_ROOT/bin/ \ <path to application executable> <path to trace directory> <number of PEs>
  3. Copy the trace files in the root level of the traces directory to the machine where you have installed the analyzer.

View Trace Files using the Analyzer

There are five types of analyzers which read trace files. All of them are located in
in the Fabric Profiler package. The analyzers are:
  • espba - Barrier analyzer
  • espfbla - Fabric backlog analyzer
  • espla - Function latency analyzer
  • espmsa - Message straggler analyzer
  • espr - A report that contains a summary of results
You can use the traces generated in the previous step or open the pre-collected sample traces in the Fabric Profiler package. Each of these traces corresponds to a SHMEM application in the examples directory.
espr is a general report that summarizes all of the trace data in HTML format. Each sample application in the examples directory includes this report so you can view the report for the sample application without running the SHMEM application or MATLAB runtime. The report files are named {app name}_{number of PEs}.html, with associated directories named {app name}_{number of PEs}_html_files. Open the HTML file in a browser to view the report generated by the analyzer from the corresponding trace files.
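
The naming pattern for the report files can be illustrated with a short shell sketch. The function name report_names and the PE count of 16 are hypothetical; fixed-round is the example application mentioned earlier in this guide.

```shell
# Hypothetical helper: print the report file name and its companion
# directory name for an application and PE count, following the
# {app name}_{number of PEs} pattern.
report_names() {
    printf '%s_%s.html\n' "$1" "$2"
    printf '%s_%s_html_files\n' "$1" "$2"
}

report_names fixed-round 16
# prints:
#   fixed-round_16.html
#   fixed-round_16_html_files
```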
Contents of Trace Files
During the operation of Fabric Profiler, when your application calls
, the data collector writes five trace files that contain information about application behavior.
The trace files are:
  • Function trace file: Information about every profiled SHMEM function call. Each process writes out a separate function trace file. After job completion, the individual function trace files are merged into a single file with the merge script. The merged file is required by the analyzers.
  • HFI file: When the SHMEM application is running, Fabric Profiler monitors send and receive counters on the host fabric interface (HFI) card. The HFI file contains these time-stamped counter values.
  • Profile file: When the SHMEM application is running, Fabric Profiler monitors system performance counters and gathers system information. This data is written to the profile file. Each process writes out a separate profile file. When the job completes, the individual profile trace files are merged into a single file with the merge script. The merged file is required by the analyzers.
  • Put file: Fabric Profiler monitors the amount of data injected into the network with each put call and the destination node for each put operation. The put file contains these values.
  • Environment file: A list of all environment variables defined at SHMEM application run-time.
Types of Analyzers
The following entries describe each analyzer in the Fabric Profiler package. Each entry gives the analyzer type, a description, and suggested operations that you can perform.
Barrier Trace Analyzer
Reads the function trace file and displays barrier wait times for each barrier call in the source code for each PE.
  • Take any of these measurements:
    • PE wait time
    • PE arrival time
    • Node wait density
    • PE percent Late
    • PE Outlier Late
  • Vary the threshold.
  • Restrict your results to a specific lexical occurrence (a particular source code line containing a barrier).
Fabric Backlog Analyzer
Reads the put trace file and correlates that with the HFI trace file to visualize fabric backlog at any point in time.
  • Select "Show Region Bounds" and choose regions of interest. If the SHMEM code defined code regions, the temporal regions are highlighted on the graph of network backlog against time.
  • Select an individual node to display its associated backlog.
  • View injection and/or ejection backlog (requested less actual):
    • Injection requested, data sent off-node by this node in the application
    • Injection actual, data sent into network by the HFI
    • Ejection requested, data sent by other nodes in application to this node
    • Ejection actual, data received from network according to HFI
  • Zoom and pan to bring areas into focus.
  • Try offset adjustment modes.
  • Switch between toggle and rate displays.
  • Use the data cursor. Click on the widget first. Next, click anywhere on the plot to see data values for that point.
Function (latency) Trace Analyzer
Reads the function trace file and displays function latency for all instrumented SHMEM calls. Trace files that contain hundreds of thousands of function calls can take several minutes to process. The default display shows composite PE wait time for all calls at each point in time.
  • Select individual function calls to display latency hot spots for each call.
  • If the application defined Fabric Profiler regions, click View Regions. Choose regions to highlight the temporal spans on the graph that represent those regions of code.
  • Switch to the communications matrix. This visualizes the volume of data sent from each PE to every other PE.
  • Use the zoom, pan and data cursor widgets (under File and Help menus) to drill into the display data.
  • Experiment with the threshold controls for frequency, high value, and low value.
Message Straggler Analyzer
Reads the function trace file and correlates the activity in the trace file with network activity in the HFI trace file.
Analyzer Report
A non-interactive report that gathers information about a SHMEM application run and displays it in HTML format. The report can take several minutes to be completed. When completed, the HTML report is saved in the same location as the profile trace file, with a matching file name.
Use the File menu to select the profile trace file for a particular application run.
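
The Fabric Backlog Analyzer entry above describes backlog as requested less actual. Read literally, that is a subtraction, sketched below with made-up values. The function name backlog and the sample numbers are hypothetical, and the analyzer's internal computation may differ.

```shell
# Hypothetical illustration: backlog = requested - actual.
#   $1: amount requested by the application
#   $2: amount actually moved according to the HFI counters
backlog() {
    echo $(( $1 - $2 ))
}

backlog 4096 3072   # prints 1024
```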
