User Guide

Run GPU Roofline Insights Perspective from Command Line

To plot a Roofline chart, Intel® Advisor runs two steps:
  1. Collect OpenCL™ kernel timings and memory data using the Survey analysis with GPU profiling.
  2. Measure the hardware limitations and collect floating-point and integer operations data using the Characterization analysis with GPU profiling.
    Intel® Advisor calculates compute operations (FLOP and INTOP) as a weighted sum of the following groups of instructions: BASIC COMPUTE, FMA, BIT, DIV, POW, and MATH. Intel Advisor automatically determines the data type of the collected operations using the dst register.
For convenience, Intel Advisor provides the --collect=roofline shortcut command line action, which runs both the Survey and Characterization analyses with a single command. This shortcut is the recommended way to run the GPU Roofline Insights perspective.

Prerequisites

  1. Configure your system to analyze GPU kernels.
  2. Set Intel Advisor environment variables with an automated script to enable the advisor command line interface (CLI).

Run the GPU Roofline Insights Perspective

There are two methods to run the GPU Roofline analysis. Use one of the following:
  • Run the shortcut --collect=roofline command line action to execute the Survey and Characterization analyses for GPU kernels with a single command. This method is recommended for the GPU Roofline Insights perspective, but it does not support MPI applications.
  • Run the Survey and Characterization analyses for GPU kernels with the --collect=survey and --collect=tripcounts command actions separately, one by one. This method is recommended if you want to analyze an MPI application.
Optionally, you can also run the Performance Modeling analysis as part of the GPU Roofline Insights perspective. If you select this analysis, it models your application performance on a baseline GPU device as a target and compares the model with the actual application performance. This data is used to suggest additional recommendations for performance optimization.
Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.
Method 1. Run the Shortcut Command
  1. Collect data for a GPU Roofline chart with the shortcut.
    advisor --collect=roofline --profile-gpu --project-dir=./advi_results -- ./myApplication
    This command collects data for both GPU kernels and CPU loops/functions in your application. For kernels running on a GPU, it generates a Memory-Level Roofline.
  2. Run Performance Modeling for the GPU that the application runs on.
    advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir=./advi_results
    Make sure to use the --model-baseline-gpu option for Performance Modeling to work correctly.
    This command models your application's potential performance on a baseline GPU as a target to determine additional optimization recommendations.
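The two Method 1 steps can be sketched as a small shell script. The project directory and application path are placeholders; as a precaution, the sketch only builds and echoes the command lines so you can review them before running them for real.

```shell
#!/bin/sh
# Sketch of Method 1 (placeholders: adjust PROJECT and APP for your setup).
PROJECT=./advi_results
APP=./myApplication

# Step 1: Survey + Characterization via the roofline shortcut.
COLLECT="advisor --collect=roofline --profile-gpu --project-dir=$PROJECT -- $APP"
# Step 2: Performance Modeling against the baseline GPU.
MODEL="advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir=$PROJECT"

# Echoed for review; run the commands directly once the paths are correct.
echo "$COLLECT"
echo "$MODEL"
```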
Method 2. Run the Analyses Separately
Use this method if you want to analyze an MPI application.
  1. Run the Survey analysis.
    advisor --collect=survey --profile-gpu --project-dir=./advi_results -- ./myApplication
  2. Run the Characterization analysis to collect trip counts and FLOP data:
    advisor --collect=tripcounts --flops --profile-gpu --project-dir=./advi_results -- ./myApplication
    These commands collect data for both GPU kernels and CPU loops/functions in your application. For kernels running on a GPU, they generate a Memory-Level Roofline.
  3. Run Performance Modeling for the GPU that the application runs on.
    advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir=./advi_results
    Make sure to use the --model-baseline-gpu option for Performance Modeling to work correctly.
    This command models your application's potential performance on a baseline GPU as a target to determine additional optimization recommendations.
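The three Method 2 steps can likewise be chained in a script. For an MPI application, you would typically prefix the collection commands with your MPI launcher (for example, mpirun); the launcher name and syntax depend on your MPI implementation and are an assumption here, not part of the documented commands.

```shell
#!/bin/sh
# Sketch of Method 2 (placeholders: PROJECT and APP).
PROJECT=./advi_results
APP=./myApplication

SURVEY="advisor --collect=survey --profile-gpu --project-dir=$PROJECT -- $APP"
TRIPCOUNTS="advisor --collect=tripcounts --flops --profile-gpu --project-dir=$PROJECT -- $APP"
PROJECTION="advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir=$PROJECT"

# For MPI, a hypothetical launcher prefix (adjust for your MPI implementation):
#   mpirun -n 4 advisor --collect=survey --profile-gpu --project-dir=$PROJECT -- $APP

# Echoed for review; run each command in order once the paths are correct.
echo "$SURVEY"
echo "$TRIPCOUNTS"
echo "$PROJECTION"
```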
You can view the results in the Intel Advisor graphical user interface (GUI) or in the CLI, or generate an interactive HTML report. See View the Results below for details.
Analysis Details
The GPU Roofline Insights workflow includes only the Roofline analysis, which sequentially runs the Survey and Characterization (trip counts and FLOP) analyses. The analysis has a set of additional options that modify its behavior and collect additional performance data. Consider the following options:
Roofline Options
To run the Roofline analysis, use the --collect=roofline command line action. You can also use these options with --collect=survey and --collect=tripcounts if you want to run the analyses separately.
Recommended action options:

--profile-gpu
    Analyze GPU kernels. This option is required for each command.

--target-gpu=<domain>:<bus>:<device-number>.<function-number>
    Select a target GPU adapter to collect profiling data. Only decimal numbers are accepted in the adapter configuration. Use this option if you have more than one GPU adapter on your system. The default is the latest GPU architecture version found on your system. To see a list of GPU adapters available on your system, run advisor --help target-gpu and see the option description.

--gpu-sampling-interval=<double>
    Set an interval (in milliseconds) between GPU samples. By default, it is set to 1.

--enable-data-transfer-analysis
    Model data transfer between host memory and device memory. Use this option if you want to run the Performance Modeling analysis.

--track-memory-objects
    Attribute memory objects to the analyzed loops that accessed the objects. Use this option if you want to run the Performance Modeling analysis.

--data-transfer=<level>
    Set the level of detail for modeling data transfers during Characterization. Use this option if you want to run the Performance Modeling analysis. Use one of the following values:
      • light: model only data transfer between host and device memory.
      • medium: model data transfers, attribute memory objects, and track accesses to stack memory.
      • high: model data transfers, attribute memory objects, track accesses to stack memory, and identify where data can be reused.
See advisor Command Option Reference for more options.
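Combining the options above, a Characterization run that also prepares data for Performance Modeling might look like the following sketch; the sampling interval and data-transfer level shown are illustrative choices, not required values.

```shell
#!/bin/sh
# Illustrative Characterization command with optional data-transfer modeling
# (placeholders: PROJECT and APP). Echoed for review before executing.
PROJECT=./advi_results
APP=./myApplication

CMD="advisor --collect=tripcounts --flops --profile-gpu --gpu-sampling-interval=1 --enable-data-transfer-analysis --track-memory-objects --data-transfer=medium --project-dir=$PROJECT -- $APP"

echo "$CMD"
```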
Performance Modeling Options
To run the Performance Modeling analysis, use the --collect=projection command line action.
The action options below are required when you run the Performance Modeling analysis as part of the GPU Roofline Insights perspective:

--profile-gpu
    Analyze GPU kernels. This option is required for each command.

--enforce-baseline-decomposition
    Use the same local size and SIMD width as measured on the baseline. This option is required.

--model-baseline-gpu
    Use the baseline GPU configuration as a target device for modeling. This option is required. It automatically enables the --enforce-baseline-decomposition option, so you can specify --model-baseline-gpu alone.
See advisor Command Option Reference for more options.
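Because --model-baseline-gpu implies --enforce-baseline-decomposition, the fully explicit and the shortened projection commands are equivalent. The sketch below builds both forms from the documented options (paths are placeholders).

```shell
#!/bin/sh
# Equivalent projection commands for GPU Roofline Insights (sketch).
PROJECT=./advi_results

# Explicit form: both required options spelled out.
EXPLICIT="advisor --collect=projection --profile-gpu --enforce-baseline-decomposition --model-baseline-gpu --project-dir=$PROJECT"
# Shortened form: --model-baseline-gpu enables the other option automatically.
SHORT="advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir=$PROJECT"

echo "$EXPLICIT"
echo "$SHORT"
```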

View the Results

Intel Advisor provides several ways to work with the GPU Roofline results.
View Results in GUI
When you run the Intel Advisor CLI, a project is created automatically in the directory specified with --project-dir. All the collected results and analysis configurations are stored in the .advixeproj project, which you can view in the Intel Advisor GUI.
To open the project in the GUI, run the following command:
advisor-gui <project-dir>
If the report does not open, click Show Result on the Welcome pane.
You first see a Summary report that includes performance characteristics for code regions in your code. The left side of the report shows metrics for code regions that run on a GPU; the right side shows metrics for code regions that run on a CPU. The report shows the following data:
  • Program metrics for all code regions executed on the GPU and loops/functions executed on the CPU, including total execution time, GPU usage effectiveness, and the number of executed operations.
  • Preview Roofline charts for the CPU and GPU parts of your code. The charts plot an application's achieved performance and arithmetic intensity against the maximum achievable performance for the top three dots and the total dot, which combines all loops/functions (for CPU) and kernels (for GPU). By default, the chart shows the Roofline for the dominating operations data type (INT or FLOAT); you can switch to a different data type using the FLOAT/INT toggle. This pane also reports the number of operations transferred per second, bandwidth for different memory levels, and an instruction mix histogram (for GPU only).
  • Top five hotspots on CPU and GPU sorted by elapsed time.
  • Performance characteristics of how well the application uses hardware resources.
  • Information about the analyses executed and platforms that the data was collected on.
View an Interactive HTML Report
Intel Advisor enables you to export two types of HTML reports, which you can open in your preferred browser and share:
  • An interactive HTML report that represents results in a similar way to the GUI and comprises GPU metrics, operations and memory information, a Roofline chart, a source view, and grid data. Collect Offload Modeling data to view results for the Offload Modeling and GPU Roofline Insights perspectives in a single interactive HTML report.
  • An HTML Roofline report that contains a GPU Roofline chart and enables you to customize your hardware configuration to see how your application executes with given compute and memory parameters.
For details on exporting the HTML reports, see Work with Standalone HTML Reports.
Save a Read-only Snapshot
A snapshot is a read-only copy of a project result, which you can view at any time using the Intel Advisor GUI. To save an active project result as a read-only snapshot, run:
advisor --snapshot --project-dir=<project-dir> [--cache-sources] [--cache-binaries] -- <snapshot-path>
where:
  • --cache-sources adds application source code to the snapshot.
  • --cache-binaries adds application binaries to the snapshot.
  • <snapshot-path> is a path and a name for the snapshot. For example, if you specify /tmp/new_snapshot, the snapshot is saved in the tmp directory as new_snapshot.advixeexpz. You can skip this argument to save the snapshot to the current directory as snapshotXXX.advixeexpz.
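A concrete instance of the snapshot template above might look like the following sketch; the project directory and snapshot path are placeholders, and the command is echoed for review rather than executed.

```shell
#!/bin/sh
# Example snapshot command (placeholder paths). With this snapshot path, the
# result would be saved as /tmp/new_snapshot.advixeexpz.
PROJECT=./advi_results

SNAP="advisor --snapshot --project-dir=$PROJECT --cache-sources --cache-binaries -- /tmp/new_snapshot"

echo "$SNAP"
```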
To open the result snapshot in the Intel Advisor GUI, run the following command:
advisor-gui <snapshot-path>
You can visually compare the saved snapshot against the current active result or other snapshot results.

Next Steps

Continue to identify performance bottlenecks on GPU. For details about the metrics reported, see Accelerator Metrics.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.