Intel® Graphics Performance Analyzers (Intel® GPA) offers a wide range of tools to analyze graphics applications across diverse workloads, platforms, and graphics APIs. Graphics capabilities are continually expanding, allowing for smooth, realistic gameplay on high-end hardware, but what about lower-end hardware? That’s where Intel® GPA can aid optimization efforts, whether you are an artist looking to improve the rendering time of your scene, a game developer looking to maximize the efficiency of your shaders, or a student wanting to learn about graphics optimization to strengthen your portfolio.
This project uses the sample application that is installed by default with Intel® GPA. The sample is used to locate bottlenecks with several Intel® GPA tools, followed by a discussion of how you would resolve these bottlenecks so that your application runs more efficiently on lower-end hardware. To visualize the topics covered in this project, view the GIF associated with each topic header.
To follow along with this project, ensure your system meets the minimum requirements for analyzing graphics applications with Intel® GPA, and make sure you have downloaded and installed Intel® GPA on your host platform.
Where Do I Begin?
To begin optimization, you must first determine where optimization efforts are best spent. Is it the CPU or the GPU that’s limiting the performance of this game? You can determine this by capturing a trace of your game in Graphics Monitor and then opening the trace file in Graphics Trace Analyzer. To launch the GPA sample application for tracing, type in or select the GPA sample executable in the application executable bar, select the Trace option from the drop-down menu next to the Start button, and press the Start button. To capture a trace, press CTRL-SHIFT-T; an image will appear in Graphics Monitor that you can double-click to open the trace file in Graphics Trace Analyzer. To zoom into the timeline of the trace file, hover over the segment you want to inspect further, press the CTRL key, and then either press the W key or use the mouse scroll wheel.
While inspecting the captured trace file, look for gaps in the 3D Queue and the Device Context Queues. If there are no noticeable gaps in either track, the application is likely GPU bound. If there are noticeable gaps in both tracks, your application is likely CPU bound and you should inspect it further using Intel® VTune Profiler. In the case of the GPA sample there are no gaps in either track, so it can be determined that the application is bottlenecked somewhere in the GPU.
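The gap check described above can be expressed as a small calculation. The following is a minimal sketch, not part of Intel® GPA: it assumes you have somehow obtained the busy intervals of a queue track as (start, end) pairs in milliseconds, and the names `idle_fraction`, `likely_bound`, and the 10% threshold are hypothetical choices made for illustration only.

```python
from typing import List, Tuple

# (start_ms, end_ms) of one busy packet on a queue track (hypothetical data shape)
Interval = Tuple[float, float]


def idle_fraction(busy: List[Interval], window: Interval) -> float:
    """Return the fraction of the window during which the queue had no work."""
    start, end = window
    if end <= start:
        raise ValueError("window must have positive length")
    covered = 0.0
    cursor = start  # everything before the cursor has already been counted
    for s, e in sorted(busy):
        # Clip to the window and skip time that an earlier packet already covered.
        s, e = max(s, cursor), min(e, end)
        if e > s:
            covered += e - s
            cursor = e
    return 1.0 - covered / (end - start)


def likely_bound(busy: List[Interval], window: Interval,
                 gap_threshold: float = 0.10) -> str:
    """Guess "CPU" when the queue is idle for more than gap_threshold
    of the window (an assumed cutoff), otherwise "GPU"."""
    return "CPU" if idle_fraction(busy, window) > gap_threshold else "GPU"


if __name__ == "__main__":
    packed = [(0.0, 4.0), (3.0, 8.0), (8.0, 10.0)]  # back-to-back GPU work
    sparse = [(0.0, 2.0), (5.0, 6.0)]               # long idle stretches
    print(likely_bound(packed, (0.0, 10.0)))
    print(likely_bound(sparse, (0.0, 10.0)))
```

A queue that is busy almost continuously suggests the GPU is the limiting factor, while long idle stretches suggest the GPU is waiting on the CPU. The threshold is a judgment call and should be tuned against what you see in the timeline.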
Analyzing GPU Bound Applications
Intel® GPA offers multiple tools to assist in analyzing GPU bound applications. First, open Graphics Monitor and capture a Stream of the GPA sample game, a stream being a continuous range of frames. This stream can then be opened in Graphics Frame Analyzer for multi-frame analysis, where you can detect frames with potential bottlenecks based on each frame’s duration. Select the frame with the longest duration (and therefore the lowest FPS) from the timeline of the tool, then press the Open button to inspect that frame further in the single frame view of Graphics Frame Analyzer.
Once in single frame view, you can set both the X and Y axes to GPU Duration, allowing you to visualize the events that take up the most time on the GPU as the tallest and widest bars. To take this a step further, press the hotspot mode button, which groups events by bottleneck or state. In hotspot mode you can identify exactly which bottleneck or state is affecting the GPU the most by selecting the most expensive group of events and then viewing the 3D Pipeline tab in the right panel of the tool. In this case there is a primary bottleneck, indicated by a red underline, in the Geometry Transformation stage of the 3D pipeline. Selecting the Geometry Transformation stage in the 3D pipeline opens a description of the discovered bottleneck and a link to a documentation page that provides helpful information on how to resolve that bottleneck.
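The relationship between frame duration and FPS used above is simply FPS = 1000 / duration in milliseconds. As a minimal sketch, assuming you have a list of per-frame durations in milliseconds (the function name `slowest_frame` and the sample numbers are hypothetical, not taken from the tool), picking the frame to open could look like this:

```python
from typing import Sequence, Tuple


def slowest_frame(durations_ms: Sequence[float]) -> Tuple[int, float, float]:
    """Return (index, duration_ms, fps) of the frame with the longest duration."""
    if not durations_ms:
        raise ValueError("need at least one frame duration")
    # The frame with the longest duration has the lowest instantaneous FPS.
    index = max(range(len(durations_ms)), key=lambda i: durations_ms[i])
    duration = durations_ms[index]
    return index, duration, 1000.0 / duration


if __name__ == "__main__":
    # Made-up durations for a four-frame stream.
    index, duration, fps = slowest_frame([16.7, 33.3, 20.0, 25.0])
    print(f"frame {index}: {duration} ms ({fps:.1f} FPS)")
```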
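Conceptually, hotspot mode sums the cost of events that share a bottleneck or state and surfaces the most expensive group first. The sketch below illustrates that idea only; it is not how Graphics Frame Analyzer is implemented, and the `rank_groups` name, the group labels, and the durations are all invented for the example.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def rank_groups(events: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Sum GPU duration per group and return groups from most to least expensive.

    Each event is a (group_label, gpu_duration_us) pair (hypothetical shape).
    """
    totals = defaultdict(float)
    for group, gpu_duration_us in events:
        totals[group] += gpu_duration_us
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented events; labels mimic pipeline stages for readability.
    events = [
        ("Geometry Transformation", 120.0),
        ("Rasterization", 40.0),
        ("Geometry Transformation", 80.0),
        ("Pixel Shading", 60.0),
    ]
    for group, total in rank_groups(events):
        print(f"{group}: {total} us")
```

The first entry in the returned list corresponds to the group you would select before opening the 3D Pipeline tab.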
Bottlenecks in the Graphics Pipeline
A primary bottleneck in the Geometry Transformation stage means that geometry transformation is taking a large share of execution time and optimization is required. Several optimizations can help resolve this bottleneck, including shader optimizations, reducing the number of off-screen polygons generated from shading, and reducing unnecessary state changes between draws.
You can use render state experiments to further isolate bottlenecks and determine how much your application stands to gain by optimizing a given stage in the graphics pipeline. Since this bottleneck is in the Geometry Transformation stage, you can apply the One by One Scissor Rect Override experiment, which removes pixel processing from the rendering pipeline. You can view the visual difference this experiment makes on the selected frame by using the visualization mode drop-down list to switch among the current, original, and diff modes. You can also view the performance impact by inspecting the metrics (right) panel and selecting the Current Selection and Full Frame tabs to compare the original data with the current data after the render state experiment has been applied.
In this case, after applying this experiment to the selected calls, the frames per second (FPS) for the full frame went up significantly and the full frame and current selection durations went down significantly. At this point it’s safe to assume that this game could greatly benefit from optimizations in the pixel processing stage of the pipeline, including optimization of the pixel shaders.
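To put a number on "went up significantly", you can compare the original and current full-frame durations reported in the metrics panel. The following is a minimal sketch of that arithmetic; the function name `experiment_gain` and the sample durations are hypothetical, and the values must be read from the tool by hand.

```python
from typing import Dict


def experiment_gain(original_ms: float, current_ms: float) -> Dict[str, float]:
    """Compare frame duration before and after a render state experiment."""
    if original_ms <= 0 or current_ms <= 0:
        raise ValueError("durations must be positive")
    saved_ms = original_ms - current_ms
    return {
        "saved_ms": saved_ms,                      # time removed by the experiment
        "saved_fraction": saved_ms / original_ms,  # upper bound on what optimizing that stage could recover
        "fps_original": 1000.0 / original_ms,
        "fps_current": 1000.0 / current_ms,
    }


if __name__ == "__main__":
    # Made-up durations: 20 ms before the experiment, 12.5 ms with it applied.
    for name, value in experiment_gain(20.0, 12.5).items():
        print(f"{name}: {value}")
```

Because the experiment removes the stage entirely, the saved fraction is best read as a ceiling on the achievable gain rather than a prediction.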
Graphics Frame Analyzer makes it easy to analyze your shader code: select your shader from the resource list and its code will appear in the resource viewer. The GPA sample game uses HLSL code, which you can inspect further in the resource viewer. Select the Pixel option from the drop-down menu above the resource viewer to inspect the pixel shader code. You can modify this code by typing directly into the shader viewer, and the code will recompile as you edit. If your code is free of errors, press the check mark icon to save your changes; metrics will then be recalculated, allowing you to see whether your changes improved the performance of the entire frame.
For enhanced profiling of your shaders, you can use the shader profiler feature by clicking the hotspot button in the shader view. This lets you locate expensive calls in your shader code by selecting either Duration or Execution Count from the drop-down menu. In addition, you can view GEN ISA code with profiling information by pressing the Show source-assembly mapping button.
This project covered an entire life cycle of using Intel® GPA tools. It started with Graphics Trace Analyzer to determine whether the application is bottlenecked on the CPU or the GPU. It then used the multi-frame and single frame views of Graphics Frame Analyzer to locate expensive frames and determine which stages of the graphics pipeline are the bottleneck. Finally, it used render state experiments and the shader profiler feature to determine where optimization efforts are best spent. For more information on Intel® GPA and how you can get started optimizing your own games, see the links below!
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.