Find the Root Cause of GPU Idle Time

This video explores frame-rendering performance when a GPU is underused. See how to use Intel® Graphics Performance Analyzers tools to identify the cause and more efficiently increase application performance.

Hi, and welcome. I'm Pamela Harrison from Intel, here to take you on a guided tour of a use case that looks at frame-rendering performance. In this scenario, we will see that the GPU is underutilized. From there, we will trace backward in time to find the root cause. Once the root cause of the GPU downtime has been identified with the Intel® GPA tools, the game developer can make informed decisions about what to change.

This use case will guide you through a scenario where we find an interesting pattern occurring in the GPU cycles. Graphics Trace Analyzer can help you pinpoint some areas for potential improvement, helping you to discover how to most efficiently increase the performance of your application.

It makes sense to analyze a single frame when you want to determine how to increase the frames-per-second (FPS) rate. Graphics Frame Analyzer helps with this by, for example, letting you run experiments that simplify textures to see whether a particular texture is eating into the frame rate. However, there are times when optimizations based on the analysis of individual frames do not give us the FPS boost we were expecting. In these cases, a single frame may not explain why the FPS is lower than expected, and we need to see a bigger picture. With that bigger picture from Graphics Trace Analyzer, we can see:

  • Frame patterns, particularly frames that take longer to render than the others
  • Gaps in GPU utilization, which show GPU downtime
  • And also synchronization flows between the CPU threads and the GPU

We may find particular frames take a long time to process. We may find gaps in the CPU and/or GPU activity.
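To make the idea of a "gap" concrete, here is a minimal sketch of how idle spans could be found in a list of GPU package timings. It is not part of Intel® GPA; the function name `find_gpu_gaps`, the `min_gap_ms` threshold, and the timestamps are all hypothetical, and it assumes you already have start and end times for each package.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_ms, end_ms) of one GPU execution package


def find_gpu_gaps(packages: List[Interval], min_gap_ms: float = 0.5) -> List[Interval]:
    """Return idle spans between GPU packages that last at least min_gap_ms."""
    gaps: List[Interval] = []
    ordered = sorted(packages)
    if not ordered:
        return gaps
    busy_until = ordered[0][1]  # time at which the GPU has finished everything seen so far
    for start, end in ordered[1:]:
        if start - busy_until >= min_gap_ms:
            gaps.append((busy_until, start))
        busy_until = max(busy_until, end)
    return gaps


# Hypothetical timestamps (ms): three packages back to back, an idle stretch, then two more.
packages = [(0.0, 4.0), (4.0, 8.0), (8.0, 12.0), (18.0, 22.0), (22.0, 26.0)]
gaps = find_gpu_gaps(packages)
```

With these made-up numbers, the only gap reported is the span between the third and fourth package, which is the kind of downtime the trace view makes visible at a glance.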

Let’s take a look. We run our game and check whether the FPS rate is good. If not, or if we think we might be able to do some tuning, we capture a trace so we can start analysis. We might find certain frames take too long, degrading the FPS rate. Or we might find gaps in CPU or GPU usage, leading us to look for how we can better balance the load.

First, we check for bottlenecks, meaning glaring sections of downtime in the GPU or the CPU. So let’s look at the CPU cores and the GPU rendering queue.

Zooming in, we see several gaps in the rendering queue. Let’s click on the first render execution package after a gap. Then we will trace backwards in the call stack to find out what the GPU package is waiting for. Orange arrows show dependencies. Following the orange arrow backwards, we find that our initial render package depends on the render package shown here in the UMD driver queue, where we find the same gap.

So, why was that call delayed?

Let’s look at the preceding package in the UMD driver queue. It’s a signal, which means something in the process is waiting for the GPU to signal that it finished something. Green arrows show synchronization flow. So we follow the green arrow to see what the GPU is signaling. This takes us to a WaitForSingleObject call in the main CPU thread.

What does that tell us? Well, the CPU told the GPU that it would be waiting to hear back when the GPU finished processing something. So did the CPU just hang out waiting? Hmm. Sometimes a job has to wait for some other job to finish before it can continue. But typically, the CPU sends data and instructions to the GPU, and the CPU then goes on with more processing while the GPU is working. But maybe there is a reason for the Wait. Let’s take a deeper look.
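The blocking pattern described here can be mimicked in a few lines. This is only an analogy, assuming a Python thread stands in for the GPU and a `threading.Event` stands in for the object the CPU thread waits on; the function name `blocking_submit` and the log strings are hypothetical, and the real application would use the graphics API and `WaitForSingleObject` instead.

```python
import threading
from typing import List


def blocking_submit() -> List[str]:
    """Submit work, then block until the stand-in 'GPU' signals that it is done."""
    log: List[str] = []
    done = threading.Event()  # stands in for the object the GPU signals

    def gpu_worker() -> None:
        log.append("gpu: render")
        log.append("gpu: signal")
        done.set()  # the GPU reports that it finished

    log.append("cpu: submit")
    gpu = threading.Thread(target=gpu_worker)
    gpu.start()
    done.wait()  # analogous to WaitForSingleObject: the CPU thread sleeps here
    log.append("cpu: resume")
    gpu.join()
    return log


events = blocking_submit()
```

The point of the sketch is the ordering: nothing is logged by the CPU side between "cpu: submit" and "cpu: resume", which corresponds to the inactive stretch on the thread track.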

Zooming out a little:

  • We see that the frames are rendered in triplets: frame 1, frame 2, frame 3, then the GPU waits, then another frame 1, frame 2, and frame 3, and so on.
  • We also see that the main rendering thread on the CPU, the one responsible for scheduling and executing all graphics work, is inactive. You can tell it is inactive during the wait by the absence of the green activity line over the thread track. The activity line is green at the beginning of the wait, while the wait is being initialized; then the thread does nothing until it receives the signal from the GPU, at which point the line is green again.
  • This means that, for some reason, the CPU and GPU are synchronized.
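The effect of this synchronization on the GPU can be illustrated with a toy two-stage model. It is a simplified sketch, not a measurement: the function `gpu_utilization`, its parameters, and the timings (2 ms of CPU work and 4 ms of GPU work per frame) are assumptions chosen only for illustration.

```python
def gpu_utilization(frames: int, cpu_ms: float, gpu_ms: float,
                    batch: int, wait_for_gpu: bool) -> float:
    """Fraction of the total time the GPU is busy in a simple CPU -> GPU pipeline."""
    cpu_free = 0.0  # time at which the CPU thread can start preparing the next frame
    gpu_free = 0.0  # time at which the GPU finishes the frame it is rendering
    busy = 0.0
    for i in range(frames):
        cpu_done = cpu_free + cpu_ms         # CPU prepares frame i
        gpu_start = max(cpu_done, gpu_free)  # GPU needs the prepared frame and a free queue
        gpu_free = gpu_start + gpu_ms
        busy += gpu_ms
        cpu_free = cpu_done
        if wait_for_gpu and (i + 1) % batch == 0:
            # CPU blocks until the whole batch has been rendered.
            cpu_free = max(cpu_free, gpu_free)
    return busy / gpu_free


# Hypothetical timings: six frames in batches of three.
synced = gpu_utilization(frames=6, cpu_ms=2.0, gpu_ms=4.0, batch=3, wait_for_gpu=True)
pipelined = gpu_utilization(frames=6, cpu_ms=2.0, gpu_ms=4.0, batch=3, wait_for_gpu=False)
```

In this model, the synchronized run leaves the GPU idle at the start of every triplet while the CPU prepares the first frame, so its utilization is lower than in the run where the CPU keeps working ahead.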

So let’s review the pattern. The CPU does all the preprocessing for a triplet of frames (frame 1, frame 2, frame 3) and then waits until the GPU has rendered those three frames before it starts processing the next three. Because the CPU was waiting, it is not ready with more work for the GPU. Notice that the frames actually take the same amount of time to process on the CPU. But if you look only at the start and end time of each frame, frame 1 appears to take 2.5x longer to process than frames 2 and 3, because the wait time for frame 1 is counted together with its processing time. So this is a case where looking at a single frame could be misleading, because it doesn’t show the bigger picture.
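A small worked example shows how the 2.5x figure can arise. The numbers are hypothetical, picked only so that the ratio matches; the assumption is that the wait falls inside frame 1's start-to-end interval, as described above.

```python
# Hypothetical frame boundaries (ms) for one triplet: frame 1 starts at 0, frame 2 at 10, etc.
frame_starts_ms = [0.0, 10.0, 14.0, 18.0]
wait_ms = 6.0  # hypothetical time the thread spent blocked inside frame 1

# Apparent duration: simply end time minus start time for each frame.
apparent = [end - start for start, end in zip(frame_starts_ms, frame_starts_ms[1:])]

# Actual CPU processing time: remove the wait that was charged to frame 1.
actual_cpu = [apparent[0] - wait_ms] + apparent[1:]

ratio = apparent[0] / apparent[1]
```

Here frame 1 looks like 10 ms against 4 ms for the others, a ratio of 2.5, even though every frame needs the same 4 ms of CPU work once the wait is subtracted.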

Now is the time to go to the architect and developers to find out if there is a reason for this. If not, is it worth changing? Will it make the game play smoother? Has the game been tested on multiple platforms? The Intel® GPA tools help you see what is going on so that you can make smart decisions.

At first glance, the gaps in the GPU made it appear that the game was CPU bound, that the CPU could not keep up with the GPU. But, as we saw on closer examination, the CPU was idle while it waited for a signal from the GPU; the main rendering thread was actually blocked until the last frame in the series was presented. Graphics Trace Analyzer’s visualization of timings, dependencies, and synchronization makes it easier to see what is happening and what is affecting what.

Graphics Trace Analyzer helps you understand the details behind the performance of your application and find anomalies and issues. Also, keep in mind that even if the FPS is fantastic, knowing that the GPU is underutilized allows you to allocate that reserve (the GPU downtime) to more cool graphics in the game. From that point of view, this kind of analysis is always worth doing as part of your development plan.

If you’re interested in learning more about how to identify and analyze problem areas in your applications to distribute workloads evenly across CPUs and GPUs, watch our other Graphics Trace Analyzer videos. Thank you for watching. Make sure to like this video and subscribe to our YouTube* channel for exciting new Intel GPA updates, and get started optimizing your games today.