Cookbook

  • 2021.3
  • 09/23/2021
  • Public Content

Performance Optimization for Intel® Processor Graphics

Content expert: Eugene Krasichkov
Performance Optimization for Intel® Processor Graphics is a series of recipes to help you identify and resolve performance bottlenecks in graphics applications.

Methodology

Use the series of recipes to learn how to use the Intel® GPA Graphics Frame Analyzer on Intel® Processor Graphics to profile your code efficiently and to find bottlenecks in the graphics pipeline.

Ingredients

To optimize performance of graphics applications on Intel® Processor Graphics with Intel® GPA, you need the following:
  • Tool: Intel® Graphics Performance Analyzers
    To download a free copy of the Intel® Graphics Performance Analyzers toolkit, visit the Intel® GPA product page.
  • Operating System: Windows*, macOS*, Ubuntu*
  • GPU: Intel® Processor Graphics Gen6 - Gen11
  • API: DirectX* 9 - 12, Vulkan*, OpenGL*

How to Start Analysis

To get started with your analysis:
  1. Launch the Intel® GPA Graphics Monitor on your target system.
  2. Capture a sample frame or stream (for Vulkan) from your game with the Intel® GPA Heads-Up Display (HUD).
    It is recommended to analyze performance with the latest driver and version of Intel® GPA.
  3. Open the captured frame with the Intel® GPA Graphics Frame Analyzer.
    For Vulkan, open the captured stream in the Multiframe View, and then select a frame to open with Intel® GPA Graphics Frame Analyzer.
  4. Click the button to enable the Hotspot mode, and then select any event or group of events for further analysis.
In the normal mode you can manually select one event or a contiguous range of events. To properly observe graphics architecture, the selected events should meet the following conditions:
  • Total cycle count of all selected events is ≥ 20,000.
    Check the GPU Core Clocks, cycles metric.
  • There are no state changes between the events, such as shader changes, pipeline state changes, and so on. Texture and constant changes are exempt from this rule, unless the texture is a dynamically generated surface.
  • Events share the same render, depth, and stencil surface. This is not an explicit check in Intel® GPA.
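The three conditions can be sketched as a small pre-check. This is only an illustration, not part of Intel® GPA: the `Event` record, its field names, and `selection_problems` are hypothetical, and a single `pipeline_state_id` stands in for "shaders, pipeline state, and so on". Only the 20,000-cycle threshold comes from the text.

```python
from dataclasses import dataclass

MIN_TOTAL_CYCLES = 20_000  # threshold quoted in the conditions above


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one captured event (not an Intel GPA type)."""
    gpu_core_clocks: int    # value of the "GPU Core Clocks, cycles" metric
    pipeline_state_id: int  # assumed stand-in for shaders and pipeline state
    render_targets: tuple   # (render, depth, stencil) surface identifiers


def selection_problems(events):
    """Return the reasons a selection fails the conditions (empty if none)."""
    if not events:
        return ["selection is empty"]
    problems = []
    total = sum(e.gpu_core_clocks for e in events)
    if total < MIN_TOTAL_CYCLES:
        problems.append(f"total cycles {total} < {MIN_TOTAL_CYCLES}")
    if len({e.pipeline_state_id for e in events}) > 1:
        problems.append("state changes between events")
    if len({e.render_targets for e in events}) > 1:
        problems.append("events use different render/depth/stencil surfaces")
    return problems
```

Because the tool does not explicitly check the shared-surface condition, a check like the last one is something you would need to verify yourself when choosing a range.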
If you select a set of events that do not meet the above conditions, these events are treated as filtered events, and the analysis is not conducted. When using metrics analysis techniques like this, avoid any state change within the selection. For example, if you measure two draw calls where one has a depth attachment and the other does not, any potential hotspot associated with depth is averaged out over the two draw calls, effectively diluting the results.
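The dilution effect can be illustrated with a little arithmetic. The numbers below are invented, and the sketch assumes a percentage metric over a selection behaves like a cycle-weighted average of the per-draw values. The text does not specify how Intel® GPA aggregates metrics, so treat this as a rough model only.

```python
def combined_percent(draws):
    """Cycle-weighted average of a per-draw percentage metric.

    draws is a list of (cycles, percent) pairs; the weighting scheme is an
    assumption made for illustration.
    """
    total_cycles = sum(cycles for cycles, _ in draws)
    return sum(cycles * pct for cycles, pct in draws) / total_cycles


# Hypothetical draw calls: (GPU cycles, % of cycles limited by depth work)
with_depth = (10_000, 80.0)     # has a depth attachment
without_depth = (30_000, 0.0)   # no depth attachment

print(combined_percent([with_depth]))                 # 80.0
print(combined_percent([with_depth, without_depth]))  # 20.0
```

Measured alone, the first draw call shows a clear depth hotspot. Measured together with the second, the same hotspot drops to a value that may no longer stand out.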

How Graphics Frame Analyzer Identifies Bottlenecks Using Hardware Metrics

Once the selection is made, Intel® GPA Graphics Frame Analyzer plays back the frame on your GPU, collects performance data, and highlights graphics architectural blocks with bottlenecks.
Green underline means that the bottleneck criteria are not met, and that this part of the pipeline is not the bottleneck. Red means that this part of the GPU pipeline is the primary bottleneck. Yellow means that the node is not a primary bottleneck, but does have performance optimization opportunities.
Each of the metrics blocks in the Intel® GPA Graphics Frame Analyzer Metrics pane is mapped based on the graphics processing unit workflows.
Intel® Processor Graphics performs deeply pipelined parallel execution of the front-end work and the back-end work within a single event. The front-end work includes geometry transformation, rasterization, early depth/stencil, etc. The back-end work includes pixel shading, sampling, color write, blend, and late depth/stencil. Due to the deeply pipelined execution, hotspots from downstream architectural blocks bubble up and stall upstream blocks. This can make it difficult to find the actual hotspot.
To find the primary hotspot using the metrics, Intel® GPA walks the pipeline in reverse order. Intel® GPA follows two separate workflows: one for 3D workloads and one for general-purpose computing on graphics processing units (GPGPU).
Figure: Workflow for 3D workloads.
Figure: Workflow for compute workloads.
Green nodes within the flowcharts represent potential bottlenecks within the GPU. At each node, Intel® GPA asks whether the bottleneck is primary. If yes, the bottleneck for the particular selection is found. If no, Intel® GPA continues to the next node in the flowchart. Blue nodes branch the decision path and grey nodes represent terminal hotspots.
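The reverse walk can be sketched roughly as follows. This is a simplified illustration, not the tool's actual flowchart: the stage list reuses the front-end and back-end work items named earlier in the order they are listed, the function name is hypothetical, and the branch (blue) and terminal (grey) nodes are left out.

```python
# Stages named in the text, assumed to be in front-to-back order.
# The real Intel GPA flowcharts have more nodes and branch points.
PIPELINE_FRONT_TO_BACK = [
    "geometry transformation",
    "rasterization",
    "early depth/stencil",
    "pixel shading",
    "sampling",
    "color write",
    "blend",
    "late depth/stencil",
]


def find_primary_hotspot(meets_criteria):
    """Walk the pipeline from the last stage to the first.

    meets_criteria maps a stage name to True when its (hypothetical)
    bottleneck criteria are met. Downstream stalls bubble up, so the
    first match found while walking backward is reported as primary.
    """
    for stage in reversed(PIPELINE_FRONT_TO_BACK):
        if meets_criteria.get(stage, False):
            return stage
    return None  # no stage met its criteria


# Rasterization looks busy only because sampling stalls it upstream.
flags = {"rasterization": True, "sampling": True}
print(find_primary_hotspot(flags))  # sampling
```

Walking backward means an upstream stage that is merely stalled by a downstream one is not mistaken for the primary hotspot.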
For more information about sampler and shader execution hotspots, read the following sections: Optimize Sampler, Optimize Shader Execution.
For more information about other hotspots, refer to the user guide.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.