Cookbook

04/11/2022

Model GPU Application Performance for a Different GPU Device

This recipe illustrates how to estimate application performance on a different Intel® graphics processing unit (GPU) architecture by running the Offload Modeling perspective of Intel® Advisor. Such performance estimates play an important role in determining the next steps for future-generation GPU architectures, and for these cases the GPU-to-GPU modeling is more accurate than the CPU-to-GPU modeling because of inherent differences between CPU and GPU execution flows.
In this recipe, use Intel Advisor to analyze the performance of a DPC++ application with the GPU-to-GPU modeling flow of the Offload Modeling perspective and to estimate the profitability of offloading the application to Intel® Iris® Xe MAX graphics (the gen12_dg1 configuration).

Ingredients

This section lists the hardware and software used to produce the specific results shown in this recipe:
  • Performance analysis tools: Intel Advisor 2021
    Available for download as a standalone installation and as part of the Intel® oneAPI Base Toolkit.
  • Application: DPC++ implementation of the Mandelbrot sample application, which is part of the oneAPI samples
  • Compiler: Intel® oneAPI DPC++/C++ Compiler 2021
    Available for download as part of the Intel® oneAPI Base Toolkit.
  • Operating system: Ubuntu* 20.04
  • Baseline GPU: Intel® Iris® Plus Graphics 655
You can download a precollected Offload Modeling report for the Mandelbrot application to follow this recipe and examine the analysis results.

Prerequisites

  1. Set up environment variables for oneAPI tools:
    source <oneapi-install-dir>/setvars.sh
  2. Configure your system to analyze GPU kernels.
  3. Build the DPC++ version of the Mandelbrot application:
    cd mandelbrot/ && mkdir build && cd build && cmake .. && make
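Configuring a Linux* system for GPU kernel analysis (step 2 above) typically includes allowing non-root access to GPU performance metrics. A minimal sketch, assuming a Linux system with the i915 graphics driver; the exact requirements may differ by driver and kernel version, so check the Intel Advisor setup documentation for your release:

```shell
# Assumption: Linux with the i915 driver. Allow unprivileged access to
# GPU performance streams for the current boot (requires root).
sudo sysctl -w dev.i915.perf_stream_paranoid=0

# Verify that the setting took effect (prints 0 on success).
sysctl -n dev.i915.perf_stream_paranoid
```

The setting is reset on reboot unless you persist it, for example in /etc/sysctl.d/.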

Run GPU-to-GPU Performance Modeling

You can run the GPU-to-GPU modeling using the Intel Advisor command line interface (CLI), Python* scripts, or the Intel Advisor graphical user interface (GUI).
In this section, use the special command line collection preset for the Offload Modeling perspective with the --gpu option to run all perspective analyses for the GPU-to-GPU modeling with a single command:
advisor --collect=offload --project-dir=./mandelbrot-advisor --gpu --config=gen12_dg1 -- ./mandelbrot
You can change the target GPU for modeling by providing a different value to the --config option. See config for details and a full list of options.
This command runs the perspective with the default medium accuracy and runs the following analyses one by one:
  1. Survey analysis to collect baseline performance data
  2. Characterization analysis to collect trip counts and FLOP and to model data transfers
  3. Performance Modeling from the baseline Intel® Iris® Plus Graphics 655 device to the target Intel® Iris® Xe MAX graphics
Important: The command line collection preset does not support MPI applications. You need to run the analyses separately to analyze an MPI application.
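For an MPI application, or when you need finer control over each step, the preset's analyses can be run as separate commands. A hedged sketch of the equivalent step-by-step flow, assuming the analysis and option names of the Intel Advisor 2021 CLI (verify them with advisor --help collect for your version):

```shell
# 1. Survey: collect baseline performance data on the baseline GPU.
advisor --collect=survey --profile-gpu --project-dir=./mandelbrot-advisor -- ./mandelbrot

# 2. Characterization: collect trip counts and FLOP and model data transfers.
advisor --collect=tripcounts --flop --profile-gpu --project-dir=./mandelbrot-advisor -- ./mandelbrot

# 3. Performance Modeling: project from the baseline GPU onto the target GPU.
advisor --collect=projection --profile-gpu --config=gen12_dg1 --project-dir=./mandelbrot-advisor
```

For MPI runs, wrap the collection commands in your MPI launcher (for example, mpirun) as described in the Intel Advisor documentation for analyzing MPI applications.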
Once the analyses are completed, the result summary is printed to the terminal. You can continue to view the results in the Intel Advisor GUI or in an interactive HTML report in your preferred web browser.

Examine Performance Speedup on the Target GPU

In this section, examine the HTML report to understand the GPU-to-GPU modeling results. The HTML report is generated automatically after you run the Offload Modeling perspective from the CLI or with the Python scripts and is saved to ./mandelbrot-advisor/e000/report/advisor-report.html. You can open the report in your preferred web browser.
In this interactive HTML report, you can switch between the Offload Modeling and GPU Roofline Insights perspective results using the drop-down list in the top left.
In the Summary tab, examine the Top Metrics and Program Metrics panes to understand the performance gain.
  • The Top Metrics pane shows an average speedup of 5.311x from offloading one code region of the Mandelbrot application from the baseline Intel® Iris® Plus Graphics 655 GPU device to the target Intel® Iris® Xe MAX graphics GPU device.
  • The Program Metrics pane shows the measured execution time for the current run on the baseline GPU and the estimated time for the run on the target GPU.
    GPU-to-GPU Modeling: Summary report
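The speedup and the two times in these panes are related by simple arithmetic: the estimated speedup is the measured baseline time divided by the estimated target time. A quick sketch with hypothetical numbers (your actual values come from the Program Metrics pane):

```shell
# Hypothetical times for illustration only.
baseline_ms=15.93     # measured execution time on the baseline GPU
estimated_ms=3.0      # estimated execution time on the target GPU

# speedup = measured baseline time / estimated target time
speedup=$(awk -v b="$baseline_ms" -v e="$estimated_ms" 'BEGIN { printf "%.2f", b/e }')
echo "estimated speedup: ${speedup}x"   # prints "estimated speedup: 5.31x"
```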
You can navigate between the Summary, Accelerated Regions, and Source View tabs to see details about the offloaded regions and to examine useful metrics and the potential performance gain.
The Accelerated Regions tab provides detailed information for the offloaded code regions along with the source code in the bottom pane. In this view, you can examine various useful metrics for the offloaded regions of interest. For example, examine the following metrics measured for the kernels running on the baseline GPU: iteration space, thread occupancy, SIMD width, local size, and global size.
Examine the following metrics estimated for the target GPU: performance issues, time, speedup, and data transfer with reuse.
See Accelerator Metrics for a detailed description and interpretation of these metrics.
GPU-to-GPU Modeling: Accelerated Regions report

Alternative Steps

You can run the GPU-to-GPU modeling using the Intel Advisor command line interface (CLI), Python* scripts, or the Intel Advisor GUI.
Run Intel Advisor Python Scripts (Instead of the Offload Modeling Collection Preset)
Use the special Python scripts delivered with Intel Advisor to run the GPU-to-GPU modeling. These scripts use the Intel Advisor Python API to run the analyses.
For example, run the run_oa.py script with the --gpu option to execute the perspective with a single command as follows:
$ advisor-python $APM/run_oa.py ./mandelbrot-advisor --collect=basic --gpu --config=gen12_dg1 -- ./mandelbrot
You can change the target GPU for modeling by providing a different value to the --config option. See config for a full list of options.
The run_oa.py script runs the following analyses one by one:
  1. Survey analysis to collect baseline performance data
  2. Characterization analysis to collect trip counts and FLOP and to model data transfers
  3. Performance Modeling from the baseline Intel® Iris® Plus Graphics 655 device to the target Intel® Iris® Xe MAX graphics
Important: The Python scripts do not support MPI applications. Use the Intel Advisor CLI to analyze an MPI application.
Once the analyses are completed, the result summary is printed to the terminal. You can continue to view the results in the Intel Advisor GUI or in an interactive HTML report in your preferred web browser.
Run the Intel Advisor GUI (Instead of the Offload Modeling Collection Preset)
Prerequisite: Create a project for the Mandelbrot application.
To run the GPU-to-GPU modeling from the Intel Advisor GUI:
  1. From the Perspective Selector window, select the Offload Modeling perspective.
  2. In the Analysis Workflow pane, do the following:
    1. Select GPU from the Baseline Device drop-down list.
    2. Select Xe LP Max from the Target Platform Model drop-down list.
       GPU-to-GPU Modeling: Configuration in the Analysis Workflow
    3. Run the perspective.
Once the perspective completes, the GPU-to-GPU offload modeling result is shown in the pane on the right.

Key Take-Aways

With the GPU-to-GPU modeling, you can get more accurate projections of your application performance on next-generation GPUs even before you have the hardware. The metrics collected by Offload Modeling can help you understand the performance of the kernels running on the baseline GPU. The interactive HTML report provides a GUI-like experience and lets you switch between the Offload Modeling and GPU Roofline Insights perspectives.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.