# VR Optimization Tips from Underminer Studios

Published: 04/05/2017

Last Updated: 04/05/2017

## Introduction

This article describes ways to get every bit of performance, visual quality, and design functionality out of a virtual reality (VR) project. While this article focuses on specific VR issues, most of them originate in core optimization areas such as poly counts, common performance mistakes, and knowing which efficient but comparable-quality solution to apply. I’ll also share tips, tricks, and expert advice that you won’t find in a textbook, because we are writing the paradigms for VR now.

At its core, optimization is simple when done early and often. Imagine balancing a four-sided seesaw on a train, with pins on each seat and a bowling ball on each pin. Each bowling ball represents the load of the project: Package size, CPU, GPU, and RAM. The train is the rest of the project moving forward. The take-away is that the process is a balancing act, and you must understand all the moving parts in order to have a broad overview of the project.

It’s important to know the story behind each object we optimize. Spending time with the designers, coders, and artists and discovering their methodologies will help you understand the best way to attack the performance optimization issues of each asset. If you understand how something has been assembled, you can disassemble it and know how to make all the parts work together more efficiently. The care you take during optimization will help the unique aspects of each game shine through.

[Figure: before and after optimization comparison]

Start Here

First, we cover what causes slowdown in VR and what you can do about it. For more information about optimization in general, how to identify bottlenecks, and best practices, see the following articles.

https://software.intel.com/content/www/us/en/develop/articles/unity-optimization-guide-for-x86-android-part-1.html

https://software.intel.com/content/www/us/en/develop/articles/performance-analysis-and-optimization-for-pc-based-vr-applications-from-the-cpu-s.html

## Why is VR so expensive computationally?

Currently in VR there are three major issues that affect performance: the VR experience itself, refresh rate, and multiscreen rendering. Some of these issues cause more pain than others. So, let’s dig in and see how this all works.

The VR Experience
Every VR system does some form of tracking. The Vive*, Rift*, and PSVR* have tracking stations that calculate, at a high rate of speed, the player’s location and rotation in 3D space, along with those of any accessories, as well as prediction information that is recalculated every 1/90 of a second or less. For example, if a person is walking forward, the system expects that the person will continue to walk forward. The computer must calculate this, which isn’t free.

Another issue is the tracking speed. Each system tracks at a different rate, and some are not as scalable as others. Any gap between tracking speed and frame rate falls back on prediction information. The Vive base stations update every 4 ms, while the Rift is locked to frame rate, but either way, at a maximum of 1/90 of a second, an object traveling faster than a certain speed will be perceived to lag if it isn’t prediction tracked.

Warping of the images, and compositing things like the chaperone bounds in the case of the Vive, is also an issue. In some systems, there have been reports of entire cores being taken up by the OSVR compositor.

What can developers do?
Right now, not much. A lot of these issues are dependent on the VR hardware and won’t be resolved until updated hardware revisions become available. The long-term goal would be to create headsets that handle the entire experience on their own.

Refresh rate
Now we’ve moved up to “mere flesh wounds” in terms of issue importance. Most AAA games run somewhere between 30 and 60 FPS; VR runs at 90 FPS, half again as fast as a 60 FPS title. Not too bad, right? Unfortunately, this issue is more complicated. Most systems use a reprojection feature to cover the occasional dropped frame, but if you run below 90 FPS for long enough, the system bumps the application down to the next frame rate tier, 45 FPS, until you deliver enough good frames to get bumped back up. If you don’t provide processing headroom (especially in the case of the Vive), that drop will happen often. In practice, this means you ideally want to run at 100–110 FPS so that when a 45-headed bad guy appears on screen there isn’t a massive drop to 45 FPS to match. Gulp: so another third on top of 90, right?
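The tier-drop behavior described above can be sketched with simple arithmetic. This is an illustrative model, not compositor code; the miss window and tolerance are made-up numbers:

```python
# Illustrative sketch of frame-rate tiers: a compositor that locks an app to
# 45 FPS when it misses the 90 FPS budget too often in a recent window.
TARGET_FPS = 90
FALLBACK_FPS = 45  # reprojection tier: every other frame is synthesized

def effective_fps(frame_times_ms, miss_tolerance=3):
    """Return the tier the app lands in, given recent frame times in ms."""
    budget_ms = 1000.0 / TARGET_FPS  # ~11.1 ms per frame at 90 FPS
    misses = sum(1 for t in frame_times_ms if t > budget_ms)
    # Too many missed frames in the window: locked to the fallback tier
    return FALLBACK_FPS if misses > miss_tolerance else TARGET_FPS

# A scene spike (say, the 45-headed monster) pushes frames past budget:
print(effective_fps([10.0, 10.5, 10.8, 10.2]))        # holds 90
print(effective_fps([10.0, 14.0, 15.0, 13.5, 12.9]))  # knocked down to 45
```

This is why headroom matters: frames that average near 11 ms leave no margin for spikes.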

There is another factor on top of this that isn’t often considered: driver timing. Developers know a game is the “shampoo” of the tech world. The CPU sends work to the GPU, which shows the results on the screen. Lather, rinse, and repeat X times a second, and boom, you have a game. Those little hand-offs add up, though, and eventually their cost limits the entire experience. The link below shows a point in one optimization pass where the drivers cost a staggering 38.5 percent of the frame.

https://software.intel.com/content/www/us/en/develop/articles/performance-analysis-and-optimization-for-pc-based-vr-applications-from-the-cpu-s.html

What can developers do?
As far as frame rate goes, developers need to optimize for the platform. Optimizing games, which we will cover later, requires keeping a healthy performance overhead to make successful experiences. With drivers, there are some interesting possibilities, with DX12*, Mantle*, and Vulkan* at the forefront of combating this issue. Hopefully in the future there will be updates to their features and new driver and hardware chipsets. If you want to read more about their optimization practices, read here.

https://www.pcper.com/reviews/Editorial/What-Exactly-Draw-Call-and-What-Can-It-Do

Multiscreen Rendering
Here comes the pain. Inside every VR headset there are two displays, one per eye, with a slight visual offset to create the illusion of depth and convince users that they are indeed on Mount Everest, not in their London flat sitting in their undies. Currently we render the world through two separate cameras, one per eye, to create visual presence. This roughly doubles the number of objects that need to be drawn, along with the cost of effects like normal maps and sprite-based particle systems; that doesn’t mean the overhead is a flat 50 percent, because it also means twice the draw calls to render each camera to its own texture, which is a huge task. Instead of rendering that texture once, you render it twice, at 90+ FPS, all while managing tracking overhead. The real kicker is that the combined output, 2160x1200 (1080x1200 per eye), is about the size of a 1080p HD screen. Anti-aliasing (AA) taxes the system further: since VR displays are strapped to your head, they need higher AA quality than normal games do.

What can developers do?
There are some features already implemented and some on the way. One is performance-based screen size combined with AA quality, which should be implemented in every game. The concept is simple: when a game is lagging, the render size is reduced and the AA quality is lowered. A second option is to flush the GPU multiple times a second, which ensures that the GPU always has work to toil through. A third is instanced stereo rendering, which prepares all of a scene’s object information on the CPU once, then offsets the transformation information to compensate for the second eye and renders both views in a single pass.
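The adaptive quality idea can be sketched as a tiny feedback loop. This is a minimal illustration, not an engine API; the step sizes, budget, and thresholds are assumptions:

```python
# Sketch of performance-based render scaling: shrink the render-target scale
# and MSAA level when GPU frames run long, and recover quality when there is
# headroom. All numbers here are illustrative.
def adapt_quality(gpu_ms, render_scale, msaa, budget_ms=11.1):
    if gpu_ms > budget_ms and render_scale > 0.6:
        render_scale = round(render_scale - 0.1, 2)  # render fewer pixels
        msaa = max(2, msaa // 2)                     # cheaper anti-aliasing
    elif gpu_ms < 0.8 * budget_ms and render_scale < 1.0:
        render_scale = round(render_scale + 0.1, 2)  # recover quality
    return render_scale, msaa

print(adapt_quality(13.0, 1.0, 8))  # over budget: scale and MSAA drop
print(adapt_quality(8.0, 0.9, 4))   # headroom: scale recovers
```

A real implementation would hysteresis-filter over several frames rather than react to a single reading.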

Foveated rendering keeps the parts of the image a person is looking at large and sharp, while making the less significant peripheral image smaller and less clear. This could allow a further reduction of the entire size of the output sent to each screen. Radial density masking renders every other pixel on the screen in a checkerboard, with neighboring pixels providing the information for the missing ones. You can hear more about these topics from Alex Vlachos, a graphics engineer at Valve.

The best project example available is Valve’s Lab Render for Unity* where you can see most of these techniques implemented.

## Future Solutions

More efficient software and hardware
Most developers are working to upgrade the ability of game engines to take advantage of multicore CPUs. Multiple threads have been popular for a while in some software applications, but this is newer technology for most game platforms. The current generation of GPUs has increased VR performance on the PC side, but more optimization is possible from the manufacturers of head mounted displays (HMDs). The currently available headsets are mostly dummy systems importing the images that are provided. The newer generations of headsets could have their own processing capability to handle location information, stereo rendering, and reprojection in the case of low frame rate.

Better cooling systems for the mobile market
Currently with mobile platforms, including laptops and small form factor computers, the major bottleneck is CPU. These devices are underpowered compared to their desktop or even their laptop counterparts and are also throttled when used to their full potential. Cell phone parts to date haven’t been created to run at full steam for any length of time and are optimized for call time and longest battery life with some gameplay involved. This creates abysmal draw call numbers and vert count.

Radial density masking
This technique renders every other pixel and uses the nearby pixels to fill in the missing information. It provides a marginal, but not insignificant, performance increase of up to 10 percent in some cases. Used in conjunction with foveated rendering, it can increase performance without a perceptual loss of quality.

http://arstechnica.com/gaming/2016/03/how-valve-got-passable-vr-running-on-a-four-year-old-graphics-card/
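The checkerboard idea behind radial density masking can be sketched by building the mask itself. A hypothetical, CPU-side illustration (real implementations do this in the rasterizer or a shader):

```python
# Minimal sketch of a radial density mask: inside a central radius, render
# every pixel; outside it, render only a checkerboard and reconstruct the
# holes from neighbors. Grid size and radius are illustrative.
def build_mask(width, height, inner_radius):
    cx, cy = width / 2.0, height / 2.0
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= inner_radius ** 2
            # 1 = shaded pixel, 0 = skipped (filled in from neighbors later)
            row.append(1 if inside or (x + y) % 2 == 0 else 0)
        mask.append(row)
    return mask

mask = build_mask(8, 8, 2)
rendered = sum(sum(row) for row in mask)
print(rendered, "of", 8 * 8, "pixels shaded")
```

Even on this toy grid, roughly half the peripheral pixels are skipped, which is where the savings come from.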

Foveated rendering
With foveated rendering, anything a user isn’t looking at directly is not rendered at full detail. The current version is fixed foveated rendering: the center third is rendered at full scale and the outer two-thirds are scaled down. The more advanced techniques, soon to be released, are perceptually based foveated rendering, which drives object detail based on importance within the center of the scene, and gaze-based foveated rendering, which will use eye tracking to drive the zone of importance. Together these could enable a myriad of other techniques, including view-based depth of field and vision-based level of detail (LOD). Hardware developers pushing these computations onto devices, as in our HMD manufacturer example, would be an ideal use of this feature, avoiding the overhead cost of tracking.
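The savings from the fixed scheme above are easy to estimate with back-of-the-envelope arithmetic. The outer scale factor here is an assumption for illustration:

```python
# Rough pixel-count estimate for fixed foveated rendering: the center third
# of the eye buffer stays at full resolution, the outer two-thirds are shaded
# at a reduced scale (0.5 is an illustrative choice).
def foveated_pixel_count(width, height, outer_scale=0.5):
    center_w, center_h = width // 3, height // 3
    center = center_w * center_h                          # full resolution
    outer = (width * height - center) * outer_scale ** 2  # downscaled shading
    return int(center + outer)

full = 1080 * 1200  # one eye of a 2160x1200 HMD
fov = foveated_pixel_count(1080, 1200)
print("shaded pixels: %d vs %d (%.0f%% of full)" % (fov, full, 100.0 * fov / full))
```

With these numbers, only about a third of the full-resolution pixel work remains, which is why the technique is so attractive.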

Unfortunately, none of this technology will ever overcome the need for developers to work creatively within the limitations of hardware, software, and the design constraints of the platform. Many of these solutions when used alone will create bottlenecks in other locations of the system. For example, instanced stereo rendering can increase GPU workload up to 257 percent.

Other solutions can shift performance issues to areas that might already be bottlenecked. Knowing what the project needs is important. For instance, optimizing GPU when you’re CPU bound is not only a waste of time but also not in the best interest of the application.

How can developers optimize their projects?
Obviously, our current technology is limited, but most of the issues that come up in VR are like those in video games with a slight skew in perspective, so we can use known tools to overcome them. Those familiar with mobile development can apply familiar optimization techniques like atlasing, aggressive shader optimization, and baking most if not all lighting into textures. Even for seasoned developers, a VR project is most likely GPU bound; the exceptions are mobile targets, laptops, and small form factor computers, where thermal throttling is a huge performance issue. With the way current GPUs are designed, unless you pair them with an extremely fast CPU, like a 7th generation Intel® Core™ i7 processor or better, the GPU will almost always be your bottleneck. A higher-end CPU lets you take extra work away from the overtaxed GPU. This is a current and recurring theme in game development: one GPU, one screen, deferred rendering, screen space effects. In VR the GPU is mostly preoccupied with rendering two eyes, so figuring out how to optimize for the GPU becomes much more important. Oculus does not recommend full screen effects because of their performance implications.

It is important to evaluate whether your project is CPU or GPU bound. To assess your project’s status, you can use Intel® Graphics Performance Analyzers (Intel® GPA), which provides performance analysis in a single package. For more information on the tool, including a video by Intel’s Seth Schneider, see the Intel GPA documentation.
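The CPU-versus-GPU-bound question boils down to one comparison. This is an illustrative heuristic with a made-up threshold, not GPA output parsing:

```python
# Heuristic sketch: if the GPU is busy for nearly the whole frame, the frame
# rate is limited by GPU work; if the GPU sits idle waiting on the CPU to
# submit work, the CPU is the limiter. The 0.9 threshold is illustrative.
def diagnose(frame_ms, gpu_busy_ms, threshold=0.9):
    return "GPU bound" if gpu_busy_ms >= threshold * frame_ms else "CPU bound"

print(diagnose(frame_ms=13.0, gpu_busy_ms=12.5))  # GPU nearly saturated
print(diagnose(frame_ms=13.0, gpu_busy_ms=6.0))   # GPU starved by the CPU
```

The point of the diagnosis is to direct effort: optimizing shaders when the GPU is already idle half the frame is wasted work.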

## Poly Count

The problem
Poly count had become a non-issue for games; current systems can push millions upon millions of triangles and not break a sweat. Current-generation PCs can easily handle multiple millions of triangles, yet it is difficult to get even 3 million on screen in a VR project, even if you are very skilled. Poly count optimization comes in two flavors: the LOD version and the raw version.

How to identify
Usually a project isn’t poly limited on its own. Poly reduction is useful because of things like fill rate, real-time lighting, and shader complexity, which all work against the number of polygons you can have on screen. For fill rate, look at the amount of on-screen overdraw; if you have a ton of overdraw, polygon optimization can help. If your project has many real-time lights, each light renders the affected polygons once more under forward rendering, the recommended render path for VR. Shader complexity multiplies the per-polygon cost, so if your project does anything fancy with shaders, the recommendation for VR is to use mobile shaders and get creative in making things look good. Reducing the polygon count will let you get away with just a bit more shader quality.
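The forward-rendering cost described above can be made concrete with a toy model. This is a deliberate simplification (real engines batch, cull, and use per-light passes only for affected geometry):

```python
# Simplified cost model for forward rendering: geometry is processed once
# per real-time light that touches it, so vertex work scales roughly as
# verts x lights. Numbers are illustrative.
def vertex_work(vert_count, realtime_lights):
    passes = max(1, realtime_lights)  # at least one base pass
    return vert_count * passes

# A 100k-vert scene: three real-time lights triple the vertex processing
print(vertex_work(100_000, 1))
print(vertex_work(100_000, 3))
```

This is why cutting either polys or real-time lights pays off twice over in VR, where everything is already rendered per eye.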

What to do
If your team has the people power to do this by hand, then by all means do it by hand. If you have the funding, use something like Simplygon* or Decimator* from Mixamo. Here is an alternative technique that can batch through a lot of assets quickly, with decent quality results depending on your use case. Remember to use some form of version control: blanket optimization will not work for all assets, and you will want to go back to optimize some further and to undo optimization on some hero objects.

import pymel.core as pm

# Select all mesh objects in the scene by geometry type
pm.select(pm.listRelatives(pm.ls(geometry=True), p=True, path=True), r=True)
objectsToReduce = pm.ls(sl=True)

# Reduce every selected object by 35 percent
for objectToReduce in objectsToReduce:
    pm.select(objectToReduce)
    pm.polyReduce(percentage=35, version=1)


## LOD

The problem
LOD presents a couple of issues: the limited vert count mentioned above, and the need to limit the number of draw calls. Using LODs, we can push the limits of scenes by creating lower-detail versions of objects that live in the background. For example, breathing down our necks in the foreground is the 45-headed monster from before, and in the background is a teakettle. Now, this is a hero teakettle, so maybe up close it’s 2k triangles with five texture maps: color, normal map, height map, spec map, and baked ambient occlusion. With the monster in front and the teakettle across the room, where should you spend your detail? On the monster, correct? LOD helps you with this.

Image property of Unity.

How to identify
Obviously LOD doesn’t help every scene and object; sometimes LOD groups will even hinder performance. To find appropriate uses, look for dense objects and objects with many draw calls, such as a large room or open areas with many objects. In a scene with only a few objects, the LOD change is more noticeable, because people tend to focus on individual objects, and it’s a waste of effort to save a few draw calls at the cost of LOD management. If a scene is small and tight, don’t optimize with LODs either; micro-movements will cause constant LOD changes. Each change has a cost, and they all add up, so be smart about the use of LOD.
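An LOD group is essentially a distance-indexed lookup. A minimal sketch, assuming made-up distance thresholds and mesh names (real engines switch on screen-space size, not raw meters):

```python
# Sketch of LOD selection: pick a mesh variant by viewer distance.
# Thresholds and names below are illustrative, not engine values.
LOD_TABLE = [
    (5.0, "teakettle_LOD0"),            # hero mesh, all texture maps
    (15.0, "teakettle_LOD1"),           # ~50% verts, height/normal removed
    (float("inf"), "teakettle_LOD2"),   # mobile shader, lowest vert count
]

def pick_lod(distance):
    for max_dist, mesh in LOD_TABLE:
        if distance <= max_dist:
            return mesh

print(pick_lod(2.0))   # up close: the hero kettle
print(pick_lod(40.0))  # across the room: the cheapest variant
```

The per-switch cost mentioned above comes from rebinding meshes and materials every time this lookup changes its answer.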

What to do
LOD work that is well done involves three things: optimized models, optimized materials, and optimized shaders. As an example, let’s use a close-up with lots of detail, like our high-resolution teakettle. Using a detail map can mean the difference between realism and garbage up close. A nice shader like a PBR will really help sell that the main character’s grandmother loved this teakettle until her dying day. Once it is further away, take out the normal and height maps, and then lower the vert count by 50 percent or so. Change to a mobile shader and lower the vert count once again when it’s across the room.

import pymel.core as pm

# Remember to select an object first
reductionPercentages = [0, 10, 20]
nameOfLODs = []
selectedObject = pm.ls(sl=True)[0]

for x in range(len(reductionPercentages)):
    newName = "%s_LOD%s" % (selectedObject, x)
    lodCopy = pm.duplicate(selectedObject, name=newName)[0]
    pm.parent(lodCopy, selectedObject)
    pm.select(lodCopy)
    pm.polyReduce(percentage=reductionPercentages[x], version=1)
    nameOfLODs.append(newName)


Here is an update to the script that works well for Unity and even makes the LOD groups for you. We changed the script to use a for loop, made the reduction amount a variable, and appended that variable to the name as the suffix _LOD[variable]. Loop through the list of sizes you need and export via FBX. Import into Unity and, voilà, you have your LOD groups.

## Object Count

The problem
Object count is a significant optimization issue, and it isn’t just about reducing the number of objects: atlasing the textures to cut draw calls matters just as much.

Image property of Unity.

How to identify
Let’s again take the teakettle example and add the whole tea set, which creates a lot of different objects to render, each with separate materials and textures. Optimizing these involves different versions of atlasing: combining without atlasing, combining with atlasing, and combining with LOD atlasing on lower levels.

The combining-without-atlasing technique means almost anything can be combined, which provides several performance optimizations. This is helpful if you want to lower the per-object overhead, because everything renders as one object, making the entire tea set a single mesh.

The combining-with-atlasing technique is effective for VR games with assets that can be combined and textures that can be atlased at almost no visual cost; use it when there is little close-up detail on the objects. With textures atlased, the entire tea set becomes one object with one material throughout, which provides a nice increase in performance. A major upside is that it takes only six draw calls; a downside is that per-object detail on the teakettle or tea set is lost, since each original material most likely had different detail maps.

The combining-with-LOD-atlasing-on-lower-levels technique combines the objects only at lower LOD levels, keeping the materials separated up close at an increased cost. This method allows many of the niceties to keep showing through on your assets, and enables a quick switch between the combined version and the hero version of the teakettle. This technique is useful if the teakettle is going to be picked up.
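The draw-call arithmetic behind these choices is worth making explicit. A rough sketch, where the per-material map count of six follows the tea set example (color, normal, height, spec, AO, lightmap; the object count is an assumption):

```python
# Back-of-the-envelope draw-call math for combining strategies. Each mesh
# with its own material costs roughly one set pass per texture map bound.
def draw_calls(num_meshes, maps_per_material=6):
    return num_meshes * maps_per_material

tea_set = 8  # kettle, pot, cups, saucers... (illustrative count)
print(draw_calls(tea_set))  # separate objects, separate materials
print(draw_calls(1))        # combined mesh with one atlased material
```

Going from eight separately-drawn pieces to one atlased mesh cuts the set passes by a factor of the object count, which is exactly the win the atlasing technique is after.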

What to do
For the combine method, simply script this together in Unity with the following from the Unity documentation for Mesh.CombineMeshes.

using UnityEngine;
using System.Collections;

[RequireComponent(typeof(MeshFilter))]
[RequireComponent(typeof(MeshRenderer))]
public class ExampleClass : MonoBehaviour {
    void Start() {
        // Gather every child mesh and record its mesh and world transform
        MeshFilter[] meshFilters = GetComponentsInChildren<MeshFilter>();
        CombineInstance[] combine = new CombineInstance[meshFilters.Length];
        int i = 0;
        while (i < meshFilters.Length) {
            combine[i].mesh = meshFilters[i].sharedMesh;
            combine[i].transform = meshFilters[i].transform.localToWorldMatrix;
            meshFilters[i].gameObject.SetActive(false); // hide the source pieces
            i++;
        }
        // Merge everything into a single mesh on this object
        MeshFilter filter = transform.GetComponent<MeshFilter>();
        filter.mesh = new Mesh();
        filter.mesh.CombineMeshes(combine);
        transform.gameObject.SetActive(true);
    }
}


For the full-time combine method with atlasing, I recommend Mesh Baker*, because it will save your team hours of work. To do LOD with full-time combine and atlasing, consider Mesh Baker LOD.

For the most advanced technique with the highest level of control, we must get more creative. The idea is to export each LOD of each separate model for each object. You’ll need to combine the highest meshes together with Mesh Baker without combining their textures. Then for the lower levels let Mesh Baker optimize the middle resolution after you remove some of the textures for optimization. At the lowest resolution change all the shaders to the lowest shader acceptable and combine. This more advanced technique is more time consuming but can save a large amount of computations.

## Draw Calls

The problem
Draw calls, or more importantly set passes, are the cost of getting something from one side of the pipe to the other. A good way to remember set passes: a set is being passed from the CPU through a driver to the GPU. The fewer and simpler the set passes, the better. The easiest ways to optimize for set passes are to reduce the number of distinct materials, atlas textures, and cut back on secondary textures like height maps, normal maps, and specularity maps. We already briefly mentioned optimization techniques in relation to set passes, but here I will go into more detail.

How to identify
Look for like objects that share the same shader, which is ideal for optimization. In our example, the whole tea set should use the same shader. Next look at secondary maps and identify the ones with lots of extra detail like the teakettle. As a hero object, we can’t always optimize it, but it’s good to keep an eye on it just in case it is breaking the bank. Finally, if objects use smaller textures, combining them together into one large texture can make a huge difference. Using our example, think about each plate having its own texture. Combining the textures into one atlas will lower draw calls.

What to do
Objects that share nearly the same shader with little visual difference can benefit from having their shaders matched so they can be combined. Minute detail at unusable distances is just a waste of resources; when an asset isn’t checked in a headset, its details are often excessive for a VR experience and become muddy or unviewable.

It’s important to spend time viewing the game while wearing the headset. Remember that the assets you identified earlier may be possible hogs. If an asset isn’t noticeable, just axe it; however if it is noticeable, try to combine the textures. Although this can consume more drive space and RAM, often the changeover is worth it even if you don’t fill an entire atlas. Atlasing the textures into one material without combining the meshes provides flexibility in detail.

Another option is to make a custom hybrid LOD system: objects move independently up close, then at a certain distance flip to a combined version that the LOD system manages on its own. Your mileage may vary, and it might be less performant than just leaving them separate, but this trick can help if you are fill bound and have plenty of CPU to spare.

One last well-known technique that people often forget about is texture combining. This is NOT atlasing like before, but literally combining greyscale images into the channels of a single color image. For instance, the teakettle has an AO map, a spec map, and a height map. If you put the AO in the red channel, the spec in the green channel, and the height map in the blue channel, all three can be fetched from one texture. You must make sure your shaders read the right channels, but that’s what low-level optimization is all about.

1. Select the layer.
2. Go to the channels.
3. Remove the green and blue channels to make a purely red channel.
4. Go back to the layers dialog and repeat, but this time make green.
5. Do the same again, but this time make blue.
6. Enjoy!

Image property of the Gimp team.
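The channel-packing trick amounts to zipping three greyscale arrays into one RGB array. A minimal sketch using plain lists in place of image buffers:

```python
# Sketch of channel packing: three greyscale maps (AO, spec, height) merged
# into the R, G, and B channels of a single RGB texture. Flat lists stand in
# for image pixel buffers here.
def pack_channels(ao, spec, height):
    """Each input is a flat list of greyscale values (0-255)."""
    return [(a, s, h) for a, s, h in zip(ao, spec, height)]

ao     = [255, 200, 128]
spec   = [0,   64,  255]
height = [10,  20,  30]
packed = pack_channels(ao, spec, height)
print(packed[0])  # first pixel: AO in red, spec in green, height in blue
```

In the shader, the three maps are then recovered by sampling the one packed texture and reading `.r`, `.g`, and `.b` respectively.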

## Baked Lighting and Ambient Occlusion

The problem
As we’ve seen, anything that can be precalculated in VR should be precalculated. The CPU is spending almost an entire core just keeping the experience running, and the GPU is spending its free time speed-dating two cameras at once. This means bake everything you can. Real-time global illumination is almost impossible to pull off, and real-time lighting can quickly get you in trouble.

How to identify
Almost everything needs lightmaps. Anything that doesn’t should receive real-time shadows, although those don’t necessarily need to be true shadows; see below for more information. Look at Valve’s Lab Render to get an idea of how to visualize the use of lightmaps in a scene.

What to do
We have a couple of things to cover. First, bake almost all of the lighting. Sadly, in the current version of Unity, lightmap baking is a painful if not fruitless process on larger scenes. I recommend exporting all the assets to a digital content creation program, like Maya*, and baking your own lightmaps and ambient occlusion there. Making good lightmaps is outside the scope of this article. Here is the knowledgebase for Turtle* in Maya, my personal favorite, and below that the knowledgebase for ambient occlusion through Substance*.

Image Property of Autodesk.

https://knowledge.autodesk.com/support/maya/learn-explore/caas/CloudHelp/cloudhelp/ENU/123112/files/turtle-html.html

Image property of Allegorithmic.

https://support.allegorithmic.com/documentation/display/SPDOC/Baking

The downside is that you will also have to make shaders that support these custom lightmaps, but a quick cannibalization of the Valve shaders should get you close, and their optimization of those shaders is stellar.

Real-time shadows don’t have to be shadows at all. Simply use the light direction to drive faked shadows on the ground: either an atlased animated texture matched to the character animation, or the even easier old-school blob shadow that shifts with the light angle. Surprisingly few people will notice.
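The blob-shadow-with-angle idea reduces to projecting the light direction onto the ground. A hypothetical sketch with an assumed y-up coordinate convention:

```python
# Sketch of an angled blob shadow: slide the blob along the ground away from
# the light, scaled by the character's height. Assumes y is up and the light
# direction is normalized and points downward (ly < 0).
def blob_shadow_offset(light_dir, character_height):
    lx, ly, lz = light_dir
    # Follow the light ray from head height down to the ground plane
    t = character_height / -ly
    return (lx * t, lz * t)

# A light leaning in from overhead: the blob shifts opposite the lean
print(blob_shadow_offset((-0.5, -0.7, 0.0), 1.8))
```

The result is a two-component ground offset for the blob sprite; no shadow map, no extra render pass.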

## Shaders

The problem
Shaders are the root of all evil when it comes to optimization. Very few people can write them, and most shaders that ship with engines have little optimization for VR applications. The best bet is to use the mobile versions of shaders or get creative with shader creation.

How to identify
If a shader has more than four slots and the object you are building isn’t a hero object, the shader probably isn’t optimized for your application. All that is really needed for non-hero pieces in VR is Color, Normal, Occlusion, and Lightmap.

What to do
You can try to optimize using the mobile version of the shaders, but the best option is to learn shader writing. It’s not that hard to learn and your production will thank you.

## Occlusion Culling

The problem
You might not know that culling is already on by default in your Unity game: frustum culling removes any objects outside the view of the camera’s frustum. The next step is occlusion culling. Back to our tea set example: if the teakettle sits in front of the silverware, then with occlusion culling on, the hidden silverware is not drawn at all, which can be helpful.
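The default frustum culling mentioned above is just a visibility test per object. A minimal 2D sketch (real engines test bounding volumes against six frustum planes; this checks only the horizontal field of view):

```python
# Toy frustum-culling test in 2D: keep an object only if the angle between
# the camera's forward direction and the direction to the object is within
# half the field of view. cam_dir is assumed normalized.
import math

def in_frustum(cam_pos, cam_dir, fov_deg, obj_pos):
    dx, dz = obj_pos[0] - cam_pos[0], obj_pos[1] - cam_pos[1]
    dist = math.hypot(dx, dz)
    if dist == 0:
        return True  # object at the camera: always "visible"
    cos_angle = (dx * cam_dir[0] + dz * cam_dir[1]) / dist
    return cos_angle >= math.cos(math.radians(fov_deg / 2))

# Camera at the origin looking down +z with a 90-degree FOV
print(in_frustum((0, 0), (0, 1), 90, (0, 5)))   # straight ahead: drawn
print(in_frustum((0, 0), (0, 1), 90, (5, -1)))  # behind the camera: culled
```

Occlusion culling adds a second, heavier test on top of this one: of the objects inside the frustum, which are hidden behind others?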

How to identify
This technique is great in enclosed areas. Houses and buildings are perfect for this tool, but performance will be hindered in open areas with little overlap between objects.

What to do
Occlusion culling can be both a boon and a curse in terms of overhead. My opinion is that it’s better not to render extraneous things the camera can’t see at all. For example, imagine a house: you are in the garage looking toward the kitchen, where the tea set sits on a table with a wall between you and it. Typically, a game would render the tea set and then throw the work away because of the wall. With occlusion culling, a precalculated grid knows all the objects within its bounds; the camera tests what it can see against those boundaries and skips objects that are occluded. The recommendation is to try it with the standard settings and see if it helps. The standard setup should bake quickly enough, and if it doesn’t, set it to render before you leave for the day. If you see an improvement, start measuring things and dial in the exact sizes.

One Final New Technique
Hybrid mono rendering is a new technique you can add right now. The idea is that everything up close is rendered in stereo, while everything beyond a set distance is rendered only once and shared between both eyes.

https://developer.oculus.com/blog/hybrid-mono-rendering-in-ue4-and-unity/
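The cost split behind hybrid mono rendering can be sketched with a simple count. The distance threshold is an assumption for illustration:

```python
# Sketch of the hybrid mono split: near objects are drawn once per eye,
# far objects are drawn once and shared. The threshold is illustrative.
MONO_DISTANCE = 10.0

def render_passes(object_distances):
    stereo = sum(1 for d in object_distances if d < MONO_DISTANCE)
    mono = len(object_distances) - stereo
    # each near object costs two draws (one per eye); far objects cost one
    return stereo * 2 + mono

print(render_passes([1.5, 3.0, 25.0, 40.0, 80.0]))  # 2 near + 3 far objects
```

With two near and three far objects, this costs 7 draws instead of the 10 that full stereo rendering would need; scenes dominated by distant geometry benefit the most.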

## Conclusion

I hope this article gave you a lot of new ideas for working on the newest platform, VR.

Further optimization for VR in Unity
https://unity3d.com/learn/tutorials/topics/virtual-reality/optimisation-vr-unity

Further optimization for VR in Unreal*
https://developer.oculus.com/blog/introducing-the-oculus-unreal-renderer/

Many thanks to the Intel Innovator Program for supporting this article so that we can work toward solidifying best practices with the help of their influence.