Intel® Advisor User Guide

ID 766448
Date 6/24/2024
Public

Run CPU / Memory Roofline Insights Perspective from Command Line

To plot a Roofline chart, Intel® Advisor does the following:

  1. Collects loop and function timings using the Survey analysis.
  2. Measures the hardware limitations and collects floating-point and integer operations data, as well as memory traffic data, using the Characterization analysis (trip counts and FLOP).

    Intel® Advisor calculates compute operations (FLOP and INTOP) as a weighted sum of the following groups of instructions: BASIC COMPUTE, FMA, BIT, DIV, POW, MATH.

    Intel Advisor automatically determines the data type of the collected operations from the destination (dst) register.
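
For orientation, the relations below write out the textbook Roofline model that these steps feed: the timings supply the time T, the operation and memory data supply the operation count W and the bytes transferred Q, and the measured hardware limitations supply the peak compute throughput P_peak and the bandwidth B. The symbols are introduced here for illustration only; this is the general Roofline model, not a description of how Intel Advisor computes its metrics internally.

```latex
\begin{aligned}
I &= \frac{W}{Q} && \text{arithmetic intensity (operations per byte)}\\
P &= \frac{W}{T} && \text{performance of a loop or function}\\
P_{\max}(I) &= \min\left(P_{\text{peak}},\; I \cdot B\right) && \text{roof at intensity } I
\end{aligned}
```

Each loop or function appears on the chart as a point (I, P), and its distance below P_max(I) indicates how much headroom remains under the measured roofs.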

Prerequisites

Set Intel Advisor environment variables with an automated script to enable the advisor command line interface (CLI).
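
On Linux, a minimal sketch of this step might look as follows. It assumes the default oneAPI install location /opt/intel/oneapi (set ONEAPI_ROOT if yours differs) and the component script advisor/latest/env/vars.sh; the ADVISOR_READY variable is a hypothetical flag added for illustration.

```shell
#!/bin/sh
# Assumption: default Linux oneAPI install location; override ONEAPI_ROOT if needed.
ONEAPI_ROOT="${ONEAPI_ROOT:-/opt/intel/oneapi}"
ADVISOR_VARS="$ONEAPI_ROOT/advisor/latest/env/vars.sh"

# Set the Intel Advisor environment variables in the current shell, if the script exists.
if [ -f "$ADVISOR_VARS" ]; then
  . "$ADVISOR_VARS"
fi

# Hypothetical flag: 1 when the advisor CLI is on PATH, 0 otherwise.
if command -v advisor >/dev/null 2>&1; then
  advisor --version
  ADVISOR_READY=1
else
  echo "advisor not found; check ONEAPI_ROOT ($ONEAPI_ROOT)" >&2
  ADVISOR_READY=0
fi
```

Source the script with `.` rather than executing it, so that the environment variables persist in your current shell.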

Plot a CPU Roofline Chart

There are two methods to run the CPU Roofline. Use one of the following:

  • Run the shortcut --collect=roofline command line action to execute the Survey and Characterization analyses with a single command. This method is recommended to run the CPU / Memory Roofline Insights perspective, but it does not support MPI applications.
  • Run the Survey and Characterization analyses separately, one after the other, with the --collect=survey and --collect=tripcounts --flop command line actions. This method is recommended if you want to analyze an MPI application.

Note: In the commands below, make sure to replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.

Method 1. Run the Shortcut Command

To collect data for a CPU Roofline chart with a shortcut, run the following command:

advisor --collect=roofline --project-dir=./advi_results -- ./myApplication

This command collects data for a basic CPU Roofline chart based on the Cache-Aware Roofline model. You can add other options to the command to collect more data. See Analysis Details below for more options.

Method 2. Run the Analyses Separately

Use this method if you want to analyze an MPI application.

  1. Run the Survey analysis:
    advisor --collect=survey --project-dir=./advi_results -- ./myApplication
  2. Run the Characterization analysis to collect trip counts and FLOP data:
    advisor --collect=tripcounts --flop --project-dir=./advi_results -- ./myApplication

These commands collect data for a basic CPU Roofline chart based on the Cache-Aware Roofline model. You can add other options to the commands to collect more data. See Analysis Details below for more options.
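
For an MPI application, one common pattern is to launch advisor under the MPI launcher so that every rank is profiled. The sketch below assumes an mpirun that accepts -n <ranks>; run_mpi_roofline is a hypothetical helper, and launcher-specific options (such as Intel® MPI's -gtool for profiling only selected ranks) are left out. Check the Intel Advisor documentation on MPI applications for the form that fits your MPI library.

```shell
#!/bin/sh
# Hypothetical helper: run both CPU Roofline steps for an MPI application.
# Usage: run_mpi_roofline <ranks> <project-dir> <application> [application options...]
run_mpi_roofline() {
  nranks="$1"; shift        # number of MPI ranks to launch
  project_dir="$1"; shift   # Intel Advisor project directory shared by both steps
  # The remaining arguments are the application and its own options.

  # Step 1: Survey analysis on every rank; stop if it fails.
  mpirun -n "$nranks" advisor --collect=survey \
    --project-dir="$project_dir" -- "$@" || return
  # Step 2: Characterization analysis (trip counts and FLOP) into the same project.
  mpirun -n "$nranks" advisor --collect=tripcounts --flop \
    --project-dir="$project_dir" -- "$@"
}
```

For example, `run_mpi_roofline 4 ./advi_results ./myApplication` runs both steps on four ranks.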

You can view the results in the Intel Advisor graphical user interface (GUI), or generate an interactive HTML report. See View the Results below for details.
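
A minimal sketch of the HTML option, wrapped in a hypothetical helper function: it assumes the project directory already contains the Roofline data and uses the advisor --report=roofline action with --report-output to choose the output file. The file name roofline.html is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: export an interactive HTML Roofline report.
# Assumption: the project already holds Survey and Characterization data.
export_roofline_html() {
  project_dir="${1:-./advi_results}"   # default matches the commands above
  advisor --report=roofline \
    --project-dir="$project_dir" \
    --report-output="$project_dir/roofline.html"
}
```

Run `export_roofline_html ./advi_results` and open ./advi_results/roofline.html in a web browser. To inspect the same project in the GUI instead, run `advisor-gui ./advi_results`.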

Analysis Details

The CPU / Memory Roofline Insights workflow includes the following analyses:

  1. Roofline to plot a Roofline chart. This step sequentially runs the Survey and Characterization (trip counts and FLOP) analyses.
  2. Memory Access Patterns (optional) to identify memory traffic data and memory usage issues.
  3. Dependencies (optional) to identify loop-carried dependencies that might limit vectorization.

Each analysis has a set of additional options that modify its behavior and collect additional performance data. The more analyses you run and the more options you use, the more useful data you get about your application.

Consider the following options:

Roofline Options

To run the Roofline analysis, use the following command line action: --collect=roofline.

NOTE:
You can also use these options with --collect=tripcounts if you run the analyses separately.

Recommended action options:

Options

Description

--stacks

Enable advanced collection of call stack data. Use this option to get a CPU Roofline with call stacks.

--enable-cache-simulation

Model CPU cache behavior on your target application. Use this option to get a Memory-level CPU Roofline that shows data for all memory levels.

--cache-config=<config>

Set the cache hierarchy to collect modeling data for CPU cache behavior. Use with --enable-cache-simulation.

The value should follow the template [<num_of_caches>]:[<num_of_ways_caches_connected>]:[<cache_size>]:[<cacheline_size>] for each of the three cache levels, with the levels separated by a slash (/).

--cachesim-associativity=<num>

Set the cache associativity for modeling CPU cache behavior: 1 | 2 | 4 | 8 (default) | 16. Use with --enable-cache-simulation.

--cachesim-mode=<mode>

Set the focus for modeling CPU cache behavior: cache-misses | footprint | utilization. Use with --enable-cache-simulation.

See advisor Command Option Reference for more options.
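
The sketch below combines the recommended options into a Memory-level CPU Roofline with call stacks. collect_memory_roofline and join_cache_levels are hypothetical helpers: --cache-config is passed only when the CACHE_CONFIG variable is set, and join_cache_levels only joins per-level descriptions with the / separator without validating them.

```shell
#!/bin/sh
# Hypothetical helper: join per-level cache descriptions with "/" for --cache-config.
# Each argument should already follow the per-level template from the table above.
join_cache_levels() {
  result=""
  for level in "$@"; do
    result="${result:+$result/}$level"   # add "/" only between levels
  done
  printf '%s\n' "$result"
}

# Hypothetical helper: collect a Memory-level CPU Roofline with call stacks.
# --cache-config is added only when the CACHE_CONFIG variable is non-empty.
collect_memory_roofline() {
  if [ -n "${CACHE_CONFIG:-}" ]; then
    advisor --collect=roofline --stacks --enable-cache-simulation \
      --cache-config="$CACHE_CONFIG" \
      --project-dir=./advi_results -- "$@"
  else
    advisor --collect=roofline --stacks --enable-cache-simulation \
      --project-dir=./advi_results -- "$@"
  fi
}
```

For example, set `CACHE_CONFIG="$(join_cache_levels "$L1_SPEC" "$L2_SPEC" "$L3_SPEC")"`, where the three hypothetical variables hold your L1, L2, and L3 descriptions, and then run `collect_memory_roofline ./myApplication`. Confirm the exact per-level syntax in the advisor Command Option Reference.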

Memory Access Patterns Options

The Memory Access Patterns analysis is optional because it adds high overhead. It does not add information to the CPU Roofline chart; instead, its results are added to the Refinement report, which you can view from the GUI or from the CLI. Use it to better understand the Memory-Level Roofline chart and to get more detailed optimization recommendations.

To run the Memory Access Patterns analysis, use the following command line action: --collect=map.

Recommended action options:

Options

Description

--select=<string>

Select loops for the analysis by loop IDs, source locations, or criteria such as scalar, has-issue, or markup=<markup-mode>. This option is required.

See select for more selection options.

--enable-cache-simulation

Model CPU cache behavior on your target application.

--cachesim-cacheline-size=<num>

Set the cache line size (in bytes) for modeling CPU cache behavior: 4 | 8 | 16 | 32 | 64 (default) | 128 | 256 | 512 | 1024 | 2048 | 4096 | 8192 | 16384 | 32768 | 65536. Use with --enable-cache-simulation.

--cachesim-sets=<num>

Set the cache set size (in bytes) for modeling CPU cache behavior: 256 | 512 | 1024 | 2048 | 4096 (default) | 8192. Use with --enable-cache-simulation.

See advisor Command Option Reference for more options.
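
As a sketch, the hypothetical run_map helper below runs the Memory Access Patterns analysis with cache simulation on a chosen loop selection and then prints the results with advisor --report=map. It assumes ./advi_results already holds Survey data from an earlier Roofline run, because --select picks loops from those results.

```shell
#!/bin/sh
# Hypothetical helper: run Memory Access Patterns on selected loops, then print the report.
# Assumption: ./advi_results already contains Survey data for --select to work with.
run_map() {
  selection="$1"; shift   # value for --select, for example has-issue or loop IDs
  advisor --collect=map --select="$selection" --enable-cache-simulation \
    --project-dir=./advi_results -- "$@" || return
  advisor --report=map --project-dir=./advi_results
}
```

For example, `run_map has-issue ./myApplication` limits the analysis to loops that have reported issues.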

Dependencies Options

The Dependencies analysis is optional because it adds high overhead and is mostly needed only if your application has scalar loops or functions. It does not add information to the CPU Roofline chart; instead, its results are added to the Refinement report, which you can view from the GUI or from the CLI. Use it to get more detailed optimization recommendations.

To run the Dependencies analysis, use the following command line action: --collect=dependencies.

Recommended action options:

Options

Description

--select=<string>

Select loops for the analysis by loop IDs, source locations, or criteria such as scalar, has-issue, or markup=<markup-mode>. This option is required.

See select for more selection options.

--filter-reductions

Mark all potential reductions with a specific diagnostic.

See advisor Command Option Reference for more options.
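
Similarly, a hypothetical run_dependencies helper could check scalar loops for loop-carried dependencies, mark potential reductions, and print the results with advisor --report=dependencies. It assumes Survey data already exists in ./advi_results, because --select picks loops from those results.

```shell
#!/bin/sh
# Hypothetical helper: check scalar loops for dependencies, then print the report.
# Assumption: ./advi_results already contains Survey data for --select to work with.
run_dependencies() {
  advisor --collect=dependencies --select=scalar --filter-reductions \
    --project-dir=./advi_results -- "$@" || return
  advisor --report=dependencies --project-dir=./advi_results
}
```

For example, `run_dependencies ./myApplication` analyzes the scalar loops of myApplication and then prints the report.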

Next Steps

Continue to explore the CPU / Memory Roofline Insights results using your preferred method. For details about the metrics reported, see CPU and Memory Metrics.