Run CPU / Memory Roofline Insights Perspective from Command Line

To plot a Roofline chart, Intel® Advisor does the following:
  1. Collects OpenCL™ kernel timings and memory data using the Survey analysis with GPU profiling.
  2. Measures the hardware limitations and collects floating-point and integer operations data using the Characterization analysis with GPU profiling.
    Intel® Advisor calculates compute operations (FLOP and INTOP) as a weighted sum of the following groups of instructions: BASIC COMPUTE, FMA, BIT, DIV, POW, MATH.
    Intel Advisor automatically determines the data type in the collected operations using the dst register.

Prerequisites

Set Intel Advisor environment variables with an automated script to enable the advisor command line interface (CLI).
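As a minimal sketch of this step on Linux, the following sources the environment script and then checks whether the advisor CLI is reachable. It assumes a default oneAPI installation under /opt/intel/oneapi, which may not match your system. ADVISOR_READY is a hypothetical helper variable used only for this illustration.

```shell
# Assumption: default oneAPI install root; override ONEAPI_ROOT if yours differs.
SETVARS="${ONEAPI_ROOT:-/opt/intel/oneapi}/setvars.sh"

# Source the environment script only if it exists at the assumed location.
if [ -f "$SETVARS" ]; then
    . "$SETVARS" > /dev/null
fi

# Check whether the advisor CLI is now on PATH.
if command -v advisor > /dev/null 2>&1; then
    ADVISOR_READY=yes
else
    ADVISOR_READY=no
fi
echo "advisor CLI available: $ADVISOR_READY"
```

If the script reports "no", locate the environment script of your installation and source it manually before running the commands in this topic.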

Plot a CPU Roofline Chart

There are two methods to run the CPU Roofline. Use one of the following:
  • Run the shortcut --collect=roofline command line action to execute the Survey and Characterization analyses with a single command. This method is recommended to run the CPU / Memory Roofline Insights perspective, but it does not support MPI applications.
  • Run the Survey and Characterization analyses with the --collect=survey and --collect=tripcounts command actions separately, one by one. This method is recommended if you want to analyze an MPI application.
Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.
Method 1. Run the Shortcut Command
To collect data for a CPU Roofline chart with a shortcut, run the following command:
advisor --collect=roofline --project-dir=./advi_results -- ./myApplication
This command collects data for a basic CPU Roofline chart based on the Cache-Aware Roofline model. You can add other options to the command to collect more data. See Analysis Details below for more options.
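The following sketch shows where application options go when you use the shortcut command. The option --size 1024 is hypothetical and stands in for whatever your own application accepts. The variable names are illustrative only.

```shell
PROJECT_DIR=./advi_results
APP=./myApplication          # replace with your executable path and name
APP_ARGS="--size 1024"       # hypothetical options of your own application

# Advisor options go before the "--" separator;
# the executable and its own options go after it.
ROOFLINE_CMD="advisor --collect=roofline --project-dir=$PROJECT_DIR -- $APP $APP_ARGS"
echo "$ROOFLINE_CMD"

# Run only when the advisor CLI is on PATH.
# The unquoted expansion relies on word splitting, so paths must not contain spaces.
if command -v advisor > /dev/null 2>&1; then
    $ROOFLINE_CMD
fi
```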
Method 2. Run the Analyses Separately
Use this method if you want to analyze an MPI application.
  1. Run the Survey analysis.
    advisor --collect=survey --project-dir=./advi_results -- ./myApplication
  2. Run the Characterization analysis to collect trip counts and FLOP data:
    advisor --collect=tripcounts --flops --project-dir=./advi_results -- ./myApplication
These commands collect data for a basic CPU Roofline chart based on the Cache-Aware Roofline model. You can add other options to the commands to collect more data. See Analysis Details below for more options.
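The two steps of Method 2 can be sketched as one script that runs the Characterization analysis only if the Survey analysis succeeds, with both writing to the same project directory. The LAUNCHER variable is hypothetical. Setting it to an MPI launcher such as "mpirun -n 4" assumes that your MPI runtime starts advisor once per rank in this way, so check your MPI documentation for the exact launch syntax.

```shell
PROJECT_DIR=./advi_results
APP=./myApplication
# Assumption: for an MPI application, set LAUNCHER to your MPI launcher,
# for example LAUNCHER="mpirun -n 4". Leave it empty for a non-MPI run.
LAUNCHER=${LAUNCHER:-}

SURVEY_CMD="${LAUNCHER:+$LAUNCHER }advisor --collect=survey --project-dir=$PROJECT_DIR -- $APP"
TRIPCOUNTS_CMD="${LAUNCHER:+$LAUNCHER }advisor --collect=tripcounts --flops --project-dir=$PROJECT_DIR -- $APP"
echo "$SURVEY_CMD"
echo "$TRIPCOUNTS_CMD"

# Both analyses use the same project directory; run the second only if the first succeeds.
# The unquoted expansions rely on word splitting, so paths must not contain spaces.
if command -v advisor > /dev/null 2>&1; then
    $SURVEY_CMD && $TRIPCOUNTS_CMD
fi
```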
You can view the results in the Intel Advisor graphical user interface (GUI), or generate an interactive HTML report. See View the Results below for details.
Analysis Details
The CPU / Memory Roofline Insights workflow includes the following analyses:
  1. Roofline to plot a Roofline chart. This step sequentially runs the Survey and Characterization (trip counts and FLOP) analyses.
  2. Memory Access Patterns (optional) to identify memory traffic data and memory usage issues.
  3. Dependencies (optional) to identify loop-carried dependencies that might limit offloading.
Each analysis has a set of additional options that modify its behavior and collect additional performance data. The more analyses you run and options you use, the more useful data about your application you get.
Consider the following options:
Roofline Options
To run the Roofline analysis, use the following command line action: --collect=roofline.
You can also use these options with --collect=tripcounts if you want to run the analyses separately.
Recommended action options:
  • --stacks
    Enable advanced collection of call stack data. Use this option to get a CPU Roofline with call stacks.
  • --enable-cache-simulation
    Model CPU cache behavior on your target application. Use this option to get a Memory-level CPU Roofline that shows data for all memory levels.
  • --cache-config=<config>
    Set the cache hierarchy to collect modeling data for CPU cache behavior. Use with --enable-cache-simulation.
    The value should follow the template: [<num_of_caches>]:[<num_of_ways_caches_connected>]:[<cache_size>]:[<cacheline_size>] for each of three cache levels separated with a /.
  • --cachesim-associativity=<num>
    Set the cache associativity for modeling CPU cache behavior: 1 | 2 | 4 | 8 (default) | 16. Use with --enable-cache-simulation.
  • --cachesim-mode=<mode>
    Set the focus for modeling CPU cache behavior: cache-misses | footprint | utilization. Use with --enable-cache-simulation.
See advisor Command Option Reference for more options.
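As an illustration, the Roofline options listed above can be combined in a single command to collect a Memory-level CPU Roofline with call stacks. The option values chosen here are examples rather than recommendations, and the variable names are hypothetical.

```shell
PROJECT_DIR=./advi_results
APP=./myApplication

# Options taken from the Roofline options list; the chosen values are illustrative.
ROOFLINE_OPTS="--stacks --enable-cache-simulation --cachesim-associativity=8 --cachesim-mode=cache-misses"

MEMORY_ROOFLINE_CMD="advisor --collect=roofline $ROOFLINE_OPTS --project-dir=$PROJECT_DIR -- $APP"
echo "$MEMORY_ROOFLINE_CMD"

# Run only when the advisor CLI is on PATH (relies on word splitting, no spaces in paths).
if command -v advisor > /dev/null 2>&1; then
    $MEMORY_ROOFLINE_CMD
fi
```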
Memory Access Patterns Options
The Memory Access Patterns analysis is optional because it adds a high overhead. This analysis does not add more information to the CPU Roofline chart. The results are added to the Refinement report, which you can view from the GUI or from the CLI. Use it to understand the Memory-Level Roofline chart better and get more detailed optimization recommendations.
To run the Memory Access Patterns analysis, use the following command line action: --collect=map.
Recommended action options:
  • --select=<string>
    Select loops for the analysis by loop IDs, source locations, or criteria such as scalar, has-issue, or markup=<markup-mode>. This option is required.
    See select for more selection options.
  • --enable-cache-simulation
    Model CPU cache behavior on your target application.
  • --cachesim-cacheline-size=<num>
    Set the cache line size (in bytes) for modeling CPU cache behavior: 4 | 8 | 16 | 32 | 64 (default) | 128 | 256 | 512 | 1024 | 2048 | 4096 | 8192 | 16384 | 32768 | 65536. Use with --enable-cache-simulation.
  • --cachesim-sets=<num>
    Set the cache set size (in bytes) for modeling CPU cache behavior: 256 | 512 | 1024 | 2048 | 4096 (default) | 8192. Use with --enable-cache-simulation.
See advisor Command Option Reference for more options.
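A Memory Access Patterns run might look like the sketch below. It assumes that the project directory already holds the results of the earlier analyses, and it uses the has-issue selection criterion purely as an example. The variable names are hypothetical.

```shell
PROJECT_DIR=./advi_results
APP=./myApplication

# --select is required; has-issue is one of the criteria named above.
MAP_CMD="advisor --collect=map --select=has-issue --enable-cache-simulation --cachesim-cacheline-size=64 --project-dir=$PROJECT_DIR -- $APP"
echo "$MAP_CMD"

# Run only when the advisor CLI is on PATH (relies on word splitting, no spaces in paths).
if command -v advisor > /dev/null 2>&1; then
    $MAP_CMD
fi
```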
Dependencies Options
The Dependencies analysis is optional because it adds a high overhead and is mostly necessary if you have scalar loops/functions in your application. This analysis does not add more information to the CPU Roofline chart. The results are added to the Refinement report, which you can view from the GUI or from the CLI. Use it to get more detailed optimization recommendations.
To run the Dependencies analysis, use the following command line action: --collect=dependencies.
Recommended action options:
  • --select=<string>
    Select loops for the analysis by loop IDs, source locations, or criteria such as scalar, has-issue, or markup=<markup-mode>. This option is required.
    See select for more selection options.
  • --filter-reductions
    Mark all potential reductions with a specific diagnostic.
See advisor Command Option Reference for more options.
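A Dependencies run can be sketched in the same way. The example assumes that the project directory already holds the results of the earlier analyses and selects scalar loops, since those are the main target of this analysis. The variable names are hypothetical.

```shell
PROJECT_DIR=./advi_results
APP=./myApplication

# --select is required; scalar is one of the criteria named above.
DEPS_CMD="advisor --collect=dependencies --select=scalar --filter-reductions --project-dir=$PROJECT_DIR -- $APP"
echo "$DEPS_CMD"

# Run only when the advisor CLI is on PATH (relies on word splitting, no spaces in paths).
if command -v advisor > /dev/null 2>&1; then
    $DEPS_CMD
fi
```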

View the Results

Intel Advisor provides several ways to work with the CPU / Memory Roofline Insights results.
View Results in GUI
When you run the Intel Advisor CLI, a project is created automatically in the directory specified with --project-dir. All the collected results and analysis configurations are stored in the .advixeproj project, which you can view in the Intel Advisor GUI.
To open the project in GUI, run the following command:
advisor-gui <project-dir>
If the report does not open, click Show Result on the Welcome pane.
You will see the CPU Roofline report that includes:
  • Roofline chart that plots an application's achieved performance and arithmetic intensity against the CPU maximum achievable performance
  • Additional information about your application in the Advanced View pane under the chart, including source code, detailed code analytics for trip counts and FLOP/INTOP data, optimization recommendations, and compiler diagnostics
    Select a dot on the Roofline chart to see details for the selected loop in all tabs of the Advanced View pane.
CPU Roofline report
View an Interactive HTML Report
Intel Advisor enables you to export an interactive HTML report for the CPU Roofline chart, which you can open in your preferred browser and share.
When you open the report, you see the CPU Roofline chart with the selected configuration. In this report, you can:
  • Expand the Performance Metrics Summary drop-down to view the summary performance characteristics for your application.
  • Double-click a dot on the chart to see a roof ruler that points to the exact roofs that bound the dot.
  • Hover over a dot to see a detailed tooltip with performance metrics.
If you have a Memory-level Roofline report, you can also:
  • Use the filter drop-down list on the chart to select the memory levels to show dots for.
  • Double-click a dot on the chart to expand it for other memory levels and see roof rulers.
CPU Roofline HTML report
For details on exporting HTML reports, see Work with Standalone HTML Reports.
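As a hedged sketch, exporting the report from the command line might look like the following. It assumes the --report=roofline action and the --report-output option, neither of which is described in this topic, so confirm the exact syntax in Work with Standalone HTML Reports. The output file name is hypothetical.

```shell
PROJECT_DIR=./advi_results
REPORT_FILE=./roofline.html     # hypothetical output file name

# Assumption: --report=roofline with --report-output writes the HTML report.
REPORT_CMD="advisor --report=roofline --project-dir=$PROJECT_DIR --report-output=$REPORT_FILE"
echo "$REPORT_CMD"

# Run only when the advisor CLI is on PATH (relies on word splitting, no spaces in paths).
if command -v advisor > /dev/null 2>&1; then
    $REPORT_CMD
fi
```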
Save a Read-only Snapshot
A snapshot is a read-only copy of a project result, which you can view at any time using the Intel Advisor GUI. To save an active project result as a read-only snapshot:
advisor --snapshot --project-dir=<project-dir> [--cache-sources] [--cache-binaries] -- <snapshot-path>
where:
  • --cache-sources is an option to add application source code to the snapshot.
  • --cache-binaries is an option to add application binaries to the snapshot.
  • <snapshot-path> is a path and a name for the snapshot. For example, if you specify /tmp/new_snapshot, a snapshot is saved in the tmp directory as new_snapshot.advixeexpz. You can skip this and save the snapshot to the current directory as snapshotXXX.advixeexpz.
To open the result snapshot in the Intel Advisor GUI, you can run the following command:
advisor-gui <snapshot-path>
You can visually compare the saved snapshot against the current active result or other snapshot results.
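Putting the snapshot syntax together, the following sketch saves a snapshot with sources and binaries to the /tmp/new_snapshot path used in the example above and prints the command to open it. The variable names are hypothetical.

```shell
PROJECT_DIR=./advi_results
SNAPSHOT_PATH=/tmp/new_snapshot
# The snapshot is saved with the .advixeexpz extension appended to the given name.
SNAPSHOT_FILE="${SNAPSHOT_PATH}.advixeexpz"

SNAPSHOT_CMD="advisor --snapshot --project-dir=$PROJECT_DIR --cache-sources --cache-binaries -- $SNAPSHOT_PATH"
echo "$SNAPSHOT_CMD"

# Run only when the advisor CLI is on PATH (relies on word splitting, no spaces in paths).
if command -v advisor > /dev/null 2>&1; then
    $SNAPSHOT_CMD && echo "Open it with: advisor-gui $SNAPSHOT_FILE"
fi
```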

Next Steps

These sections are GUI-focused, but you can still use them to understand the results. For details about the metrics reported, see CPU Metrics.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.