Intel® Advisor User Guide

ID 766448
Date 3/31/2023
Public



Run GPU-to-GPU Performance Modeling from Command Line

With Intel® Advisor, you can model the performance of a SYCL, OpenCL™, or OpenMP* target application running on a graphics processing unit (GPU) for a different GPU device without requiring its CPU version. To do this, run the GPU-to-GPU modeling workflow of the Offload Modeling perspective.

The GPU-to-GPU modeling analyzes only GPU compute kernels and ignores the application parts executed on a CPU. As a result, there are several changes in the modeling flow:

  • Compute kernel characteristics are collected with the Intel Advisor GPU profiling capabilities.
  • High-overhead features, such as call stack handling, cache simulation, data transfer simulation, and dependency analysis, are disabled.
  • Instead of CPU-to-GPU data transfer simulation, memory objects transferred between host and device memory are traced.

Workflow

The GPU-to-GPU performance modeling workflow is similar to the CPU-to-GPU modeling and includes the following steps:

  1. Survey analysis measures execution time, cache traffic, and GTI traffic using hardware counters for GPU-enabled kernels running on Intel® Graphics.
  2. Characterization analysis measures the number of compute operations, counting different GPU instruction types separately for kernels running on Intel Graphics. For example, it uses separate counters for hardware-implemented 32-bit math functions, such as SQRT, EXP, and DIV.
  3. Performance Modeling analysis models the performance of all kernels on a target GPU device, whether they are profitable to offload or not.
NOTE:
For correct memory object tracing, GPU kernels should run with the oneAPI Level Zero back end.
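For example, for a SYCL application you can select the Level Zero back end through the device selector environment variable before profiling. The variable below is the one used by recent oneAPI DPC++ runtimes; older runtimes use SYCL_DEVICE_FILTER instead, so check which one your runtime supports:

    export ONEAPI_DEVICE_SELECTOR=level_zero:gpu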

Prerequisites

  1. Configure your system to analyze GPU kernels.
  2. Set Intel Advisor environment variables with an automated script to enable Intel Advisor command line interface.
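For example, on Linux* OS you can set up the environment with the script shipped with the product. The path below assumes a default oneAPI installation location; adjust it if your installation differs:

  • On Linux* OS:
    source /opt/intel/oneapi/setvars.sh
  • On Windows* OS:
    "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"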

Run the GPU-to-GPU Performance Modeling

To run the GPU-to-GPU performance modeling from command line, you can use one of the following methods:

  • Method 1. Run a collection preset using Intel Advisor command line interface (CLI) to execute multiple analyses with a single command and control modeling accuracy.
  • Method 2. Run analyses separately using Intel Advisor CLI if you need more advanced customization for each analysis.
  • Method 3. Run Python* scripts if you need more customization for collection and modeling steps.

You can also run the GPU-to-GPU Offload Modeling from Intel Advisor graphical user interface (GUI). See Run Offload Modeling Perspective from GUI.

After you run the Offload Modeling with any of the methods above, you can view the results in the Intel Advisor graphical user interface (GUI), in the command line interface (CLI), or in an interactive HTML report.

TIP:
If you want to analyze an MPI application, you can generate pre-configured command lines, copy them, and run one by one. For details, see Generate Pre-configured Command Lines.
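For example, a pre-configured command line for an MPI application typically wraps an Intel Advisor collection with the MPI launcher. The sketch below is illustrative only; the launcher name, rank count, and collected analysis depend on your MPI implementation and the generated command lines:

    mpirun -n 4 advisor --collect=survey --profile-gpu --project-dir=./advi_results -- ./myApplication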

Method 1. Use Collection Preset

To run the collection preset for the GPU-to-GPU modeling, use the --gpu option with the --collect=offload action. When you run the collection, it sequentially runs the data collection on a GPU and the performance modeling steps. The specific analyses and options depend on the accuracy level you specify for the collection.

Note: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.

For example, to run the GPU-to-GPU modeling with the default (medium) accuracy level:

  • On Linux* OS:
    advisor --collect=offload --gpu --project-dir=./advi_results -- ./myApplication
  • On Windows* OS:
    advisor --collect=offload --gpu --project-dir=.\advi_results -- myApplication.exe

The collection progress and the commands for each executed analysis are printed to a terminal or a command prompt. When the collection finishes, you see the result summary.

You can also specify a different accuracy level to change the analyses to run and their options. Available accuracy levels are low, medium (default), and high.

For example, to run the collection with the high accuracy level:

advisor --collect=offload --accuracy=high --gpu --project-dir=./advi_results -- ./myApplication

If you want to see the commands that are executed at each accuracy level, you can run the collection with the --dry-run and --gpu options. The commands will be printed to a terminal or a command prompt.
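For example, to print the commands for the high accuracy level without executing them:

    advisor --collect=offload --accuracy=high --gpu --dry-run --project-dir=./advi_results -- ./myApplication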

For details about each accuracy level, see Offload Modeling Accuracy Levels in Command Line.

Method 2. Use per-Analysis Collection

You can collect data and model performance for your application by running each Offload Modeling analysis in a separate command for more advanced customization. To enable the GPU-to-GPU modeling, use the --profile-gpu option for each analysis you run.

Consider the following workflow example. Using this example, you can run the Survey, Trip Counts, and FLOP analyses to profile an application running on a GPU and the Performance Modeling analysis to model its performance on Intel® Iris® Xe MAX graphics (the gen12_dg1 configuration).

Note: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.

On Linux O