Intel® Advisor User Guide

ID 766448
Date 11/07/2023
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

Run Offload Modeling Perspective from Command Line

Intel® Advisor provides several methods to run the Offload Modeling perspective from command line. Use one of the following:

  • Method 1. Run Offload Modeling with a command line collection presets. Use this method if you want to use basic Intel Advisor analysis and modeling functionality, especially for the first-run analysis. This simple method allows you to run multiple analyses with a single command and control the modeling accuracy.
  • Method 2. Run Offload Modeling analyses separately. Use this method if you want to analyze an MPI application or need more advanced analysis customization. This method allows you to select what performance data you want to collect for your application and configure each analysis separately.
  • Method 3. Run Offload Modeling with Python* scripts. Use this method if you need more analysis customization. This method is moderately flexible and allows you to customize data collection and performance modeling steps.
TIP:
See Intel Advisor cheat sheet for quick reference on command line interface.

After you run the Offload Modeling with any method above, you can view the results in Intel Advisor graphical user interface (GUI), command line interface (CLI), or an interactive HTML report. For example, the interactive HTML report is similar to the following:

Prerequisites

  1. Set Intel Advisor environment variables with an automated script.

    The script enables the advisor command line interface (CLI), advisor-python command line tool, and the APM environment variable, which points to the directory with Offload Modeling scripts and simplifies their use.

  2. For SYCL, OpenMP* target, OpenCL™ applications: Set Intel Advisor environment variables to offload temporarily your application to a CPU for the analysis.
    NOTE:
    You are recommended to run the GPU-to-GPU performance modeling to analyze SYCL, OpenMP target, and OpenCL application because it provides more accurate estimations.

Optional: Generate pre-Configured Command Lines

With the Intel Advisor, you can generate pre-configured command lines for your application and hardware. Use this feature if you want to:

  • Analyze an MPI application
  • Customize pre-set Offload Modeling commands

Offload Modeling perspective consists of multiple analysis steps executed for the same application and project. You can configure each step from scratch or use pre-configured command lines that do not require you to provide the paths to project directory and an application executable manually.

Option 1. Generate pre-configured command lines with --collect=offload and the --dry-run option. The option generates:

  • Commands for the Intel Advisor CLI collection workflow
  • Commands that correspond to a specified accuracy level
  • Commands not configured to analyze an MPI application. You need to manually adjust the commands for MPI.

Note: In the commands below, make sure to replace the myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.

The workflow includes the following steps:

  1. Generate the command using the --collect=offload action with the --dry-run option. Specify accuracy level and paths to your project directory and application executable.

    For example, to generate the low-accuracy commands for the myApplication application, run the following command:

    • On Linux* OS:
      advisor --collect=offload --accuracy=low --dry-run --project-dir=./advi_results -- ./myApplication
    • On Windows* OS:
      advisor --collect=offload --accuracy=low --dry-run --project-dir=.\advi_results -- .\myApplication.exe

    You should see a list of commands for each analysis step to get the Offload Modeling result with the specified accuracy level (for the commands above, it is low).

  2. If you analyze an MPI application: Copy the generated commands to your preferred text editor and modify each command to use an MPI tool. For details about the syntax, see Analyze MPI Applications.
  3. Run the generated commands one by one from a command prompt or a terminal.

Option 2. If you have an Intel Advisor graphical user interface (GUI) available on your system and you want to analyze an MPI application from command line, you can generate the pre-configured command lines from GUI.

The GUI generates:

  • Commands for the Intel Advisor CLI collection workflow
  • Commands for a selected accuracy level if you want to run a pre-defined accuracy level or commands for a custom project configuration if you want to enable/disable additional analysis options
  • Command configured for MPI application with Intel® MPI Library. You do not need to manually modify the commands for the MPI application syntax.

For detailed instructions, see Generate Pre-configured Command Lines.

Method 1. Use Collection Presets

For the Offload Modeling perspective, Intel Advisor has a special collection mode --collect=offload that allows you to run several analyses using only oneIntel Advisor CLI command. When you run the collection, it sequentially runs data collection and performance modeling steps. The specific analyses and options depend on the accuracy level you specify for the collection.

Note: In the commands below, make sure to replace the myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.

For example, to run the Offload Modeling perspective with the default (medium) accuracy level:

  • On Linux* OS:
    advisor --collect=offload --project-dir=./advi_results -- ./myApplication 
  • On Windows* OS:
    advisor --collect=offload --project-dir=.\advi_results -- .\myApplication.exe

The collection progress and commands for each analysis executed will be printed to a terminal or a command prompt. By default, the performance is modeled for the Intel® Arc™ graphics code-named Alchemist (xehpg_512xve configuration). When the collection is finished, you will see the result summary.

Analysis Details

To change the analyses to run and their option, you can specify a different accuracy level with the --accuracy=<level> option. The default accuracy level is medium.

The following accuracy levels are available:

  • low accuracy includes Survey, Characterization with Trip Counts and FLOP collections, and Performance Modeling analyses.
  • medium (default) accuracy includes Survey, Characterization with Trip Counts and FLOP collections, cache and data transfer simulation, and Performance Modeling analyses.
  • high accuracy includes Survey, Characterization with Trip Counts and FLOP collections, cache, data transfer, and memory object attribution simulation, and Performance Modeling analyses. For CPU applications, also includes the Dependencies analysis.

    For CPU applications, this accuracy level adds a high collection overhead because it includes the Dependencies analysis. This analysis is not required if your application is highly parallelized or vectorized on a CPU or if you know that key hotspots in your application do not have loop-carried dependencies. Otherwise, to learn how dependencies might affect your application performance on a GPU, see How Assumed Dependencies Affect Modeling?>.

For example, to run the low accuracy level:

advisor --collect=offload --accuracy=low --project-dir=./advi_results -- ./myApplication

To run the high accuracy level:

advisor --collect=offload --accuracy=high --project-dir=./advi_results -- ./myApplication

If you want to see the commands that are executed at each accuracy level, you can run the collection with the --dry-run option. The commands will be printed to a terminal or a command prompt.

For details about each accuracy level, see Offload Modeling Accuracy Levels in Command Line.

Customize Collection

You can also specify additional options if you want to run the Offload Modeling with custom configuration. This collection accepts most options of the Performance Modeling analysis (--collect=projection) and some options of the Survey, Trip Counts, and Dependencies analyses that can be useful for the Offload Modeling.

IMPORTANT:
Make sure to specify the additional options after the --accuracy option to make sure they take precedence over the accuracy level configurations.

Consider the following action options:

Option

Description

--accuracy=<level>

Set an accuracy level for a collection preset. Available accuracy levels:

  • low
  • medium (default)
  • high

For details, see Offload Modeling Accuracy Levels in Command Line.

--config

Select a target GPU configuration to model performance for. For example, xehpg_512xve (default), gen12_dg1, or gen9_gt3.

See conig for a full list of possible values and mapping to device names.

--gpu

Analyze a SYCL, OpenCL™, or OpenMP* target application on a graphics processing unit (GPU) device. This option automatically adds all related options to each analysis included in the preset.

If you use this option, the high accuracy does not include the Dependencies analysis.

For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line.

--data-reuse-analysis

Analyze potential data reuse between code regions. This option automatically adds all related options to each analysis included in the preset.

--enforce-fallback

Emulate data distribution over stacks if stacks collection is disabled. This option automatically adds all related options to each analysis included in the preset.

For details about other available options, see collect.

Method 2. Use per-Analysis Collection

You can collect data and model performance for your application by running each Offload Modeling analysis in a separate command using Intel Advisor CLI. This option allows you to:

  • Control what analyses you want to run to profile your application and what data you want to collect
  • Modify behavior of each analysis you run with an extensive set of options
  • Remodel application performance without re-collecting performance data. This can save time if you want to see how the performance of your application might change with different modeling parameters using the same performance data as baseline
  • Profile and model performance of an MPI application

Consider the following workflow example. Using this example, you can run the Survey, Trip Counts, and FLOP analyses to profile an application and the Performance Modeling to model its performance on a selected target device.

Note: In the commands below, make sure to replace the myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.

On Linux OS:

  1. Run the Survey analysis.
    advisor --collect=survey --static-instruction-mix --project-dir=./advi_results -- ./myApplication
  2. Run the Trip Counts and FLOP analyses with data transfer simulation for the default Intel® Arc™ graphics code-named Alchemist (xehpg_512xve configuration).
    advisor --collect=tripcounts --flop --stacks --cache-simulation=single --target-device=xehpg_512xve --data-transfer=light --project-dir=./advi_results -- ./myApplication
  3. Optional: Run the Dependencies analysis to check for loop-carried dependencies.
    advisor -collect=dependencies --loop-call-count-limit=16 --select markup=gpu_generic --filter-reductions --project-dir=./advi_results -- ./myApplication

    The Dependencies analysis adds a high collection overhead. You can skip it if your application is highly parallelized or vectorized on a CPU or if you know that key hotspots in your application do not have loop-carried dependencies.. If you are not sure, see Check How Assumed Dependencies Affect Modeling to learn how dependencies affect your application performance on a GPU.

  4. Run the Performance Modeling analysis to model application performance on the default Intel® Arc™ graphics code-named Alchemist(xehpg_512xve configuration).
    advisor --collect=projection --project-dir=./advi_results

    You will see the result summary printed to the command prompt.

    Tip: If you already have an analysis result saved as a snapshot or a result for an MPI rank, you can use the exp-dir option instead of project-dir to model performance for the result.

On Windows OS:

  1. Run the Survey analysis.
    advisor --collect=survey --static-instruction-mix --project-dir=.\advi_results -- .\myApplication.exe
  2. Run the Trip Counts and FLOP analyses with cache simulation for the default Intel® Arc™ graphics code-named Alchemist (xehpg_512xve configuration).
    advisor --collect=tripcounts --flop --stacks --cache-simulation=single --target-device=xehpg_512xve --data-transfer=light --project-dir=.\advi_results -- .\myApplication.exe
  3. Optional: Run the Dependencies analysis to check for loop-carried dependencies.
    advisor -collect=dependencies --loop-call-count-limit=16 --select markup=gpu_generic --filter-reductions --project-dir=.\advi_results -- myApplication.exe

    The Dependencies analysis adds a high collection overhead. You can skip it if your application is highly parallelized or vectorized on a CPU or if you know that key hotspots in your application do not have loop-carried dependencies.. If you are not sure, see Check How Assumed Dependencies Affect Modeling to learn how dependencies affect your application performance on a GPU.

  4. Run the Performance Modeling analysis to model application performance on the default Intel® Arc™ graphics code-named Alchemist (xehpg_512xve configuration).
    advisor --collect=projection --project-dir=.\advi_results

    You will see the result summary printed to the command prompt.

    Tip: If you already have a collected analysis result saved as a snapshot or result for an MPI rank, you can use the exp-dir option instead of project-dir to model performance for the result.

For more useful options, see the Analysis Details section below.

Analysis Details

The Offload Modeling workflow includes the following analyses:

  1. Survey to collect initial performance data.
  2. Characterization with trip counts and FLOP to collect performance details.
  3. Dependencies (optional) to identify loop-carried dependencies that might limit offloading.
  4. Performance Modeling to model performance on a selected target device.

Each analysis has a set of additional options that modify its behavior and collect additional p