Contents
- Editor Optimizations
- Compressing Textures
- Mobile Stock Shaders
- Static Batching
- Dynamic Batching
- HDR – High Dynamic Range
- Selecting the Optimal Render Path
- Beware Extra Forward Draw Calls
- Android: Splitting Binary
- Conclusion
- Resources
- About the Authors
Editor Optimizations
Compressing Textures
Unnecessarily high-resolution textures can easily become a bottleneck in mobile games and on slower hardware. It is always worth verifying that the textures you use in your scene are in a compressed format and that you select the Generate Mip Maps checkbox to enable mip mapping. Mip mapping is similar to the LOD system discussed earlier, but for texture resolution. If an object is far from the camera and only covers a few pixels on screen, there is no need to sample a full 1024 x 1024 texture to draw it.
Figure 44. Compressing textures and generating mips on the Inspector tab for selected texture
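If you want to enforce these import settings across a whole project rather than setting them texture by texture, an editor script can apply them automatically on import. The following is a minimal sketch using Unity's AssetPostprocessor; the class name and the choice to force compression and mip maps on every texture are illustrative assumptions, and TextureImporterCompression assumes Unity 5.5 or later, so adapt the rules to your own asset pipeline.

```csharp
// Editor/TextureImportRules.cs
// Applies compression and mip map generation to textures as they are imported.
using UnityEditor;
using UnityEngine;

public class TextureImportRules : AssetPostprocessor
{
    void OnPreprocessTexture()
    {
        TextureImporter importer = (TextureImporter)assetImporter;

        // Generate mip maps so distant objects sample smaller texture levels.
        importer.mipmapEnabled = true;

        // Use the platform's default compressed format (for example, ETC/ETC2 on Android).
        importer.textureCompression = TextureImporterCompression.Compressed;
    }
}
```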
You can verify that your textures are being compressed and mips are being generated by taking a frame capture in GPA and looking at the Texture tab for the draw calls you wish to investigate. Note that generating mip maps does add data (a full mip chain increases a texture's memory footprint by roughly one third) and can actually hurt performance in some cases. As always, verify this option in your app.
Figure 45. Viewing the 4th MIP level and format of the texture in the Frame Analyzer
Mobile Stock Shaders
When using stock shaders in your mobile app, it is usually worth switching to their mobile counterparts provided by Unity to ensure that lower-precision floating point values and mobile-specific optimizations are used. Try them out in your app by selecting your materials and finding the Mobile section in the shader drop-down menu (Figure 46).
Figure 46. Stock shaders in Unity
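As a quick way to compare a stock shader with its mobile counterpart on device, you can also swap a material's shader from script. The sketch below assumes the material currently uses the built-in Diffuse shader and switches it to Mobile/Diffuse; the component name is illustrative.

```csharp
using UnityEngine;

// Swaps the attached renderer's material over to the mobile variant of the
// built-in Diffuse shader so the two can be compared on device.
public class UseMobileShader : MonoBehaviour
{
    void Start()
    {
        Shader mobileDiffuse = Shader.Find("Mobile/Diffuse");
        if (mobileDiffuse != null)
        {
            // sharedMaterial keeps the material shared, which also helps batching.
            GetComponent<Renderer>().sharedMaterial.shader = mobileDiffuse;
        }
    }
}
```

Keep in mind that Shader.Find only works in a build if the shader is actually included, for example by being referenced by a material in the project or listed under Always Included Shaders.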
Static Batching
As of Unity 5.3, the batching system has been overhauled: geometry is built into one buffer on multiple threads, the buffer and render state are bound once, and then a draw call is issued for each range in the buffer. Previously, Unity combined all of the geometry data and issued a single draw call. The new approach does not cut down the number of draw calls, but it does reduce the number of state changes needed between them, which saves time. You can turn on static batching by selecting File > Build Settings > Player Settings and checking the Static Batching option. Recent Unity versions enable these settings by default.
Figure 47. Static and Dynamic Batching checkboxes under Rendering in Other Settings
Check the static checkboxes for all game objects that will not move. This will allow static batching to be used on multiple similar objects that share a material.
Figure 48. Batching checkmarks based on object
- Make sure you use Renderer.sharedMaterial instead of Renderer.material to keep the material shared; accessing Renderer.material creates a per-object copy of the material, which breaks batching.
- When possible, combine multiple textures used by the same material (Diffuse, Bumped Specular, etc.) into a texture atlas to increase the number of batched objects.
Most of the time, static batching will provide a tremendous benefit to your application, but there are situations where it is best avoided. If you need to lower memory usage, static batching can be a disadvantage: even objects that share geometry data must pack a copy of each instance's vertex and index data into the combined vertex and index buffers. Unity gives a dense forest scene as an example of when static batching can go wrong. In that extreme case, every tree sharing the same material is packed into these buffers before the draws are issued, and the extra memory can cause performance issues.
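For objects that are spawned at load time (and therefore cannot be marked static in the editor), Unity also exposes runtime static batching through StaticBatchingUtility.Combine. Below is a minimal sketch; the prefab field, object count, and placement are illustrative assumptions, and the combined objects must not move afterward.

```csharp
using UnityEngine;

// Combines all renderers under a common root into static batches at runtime.
// The objects must not move, rotate, or scale after Combine is called.
public class BatchSpawnedProps : MonoBehaviour
{
    public GameObject propPrefab;
    public int count = 100;

    void Start()
    {
        for (int i = 0; i < count; i++)
        {
            // Parent each instance under this object so they share one batch root.
            GameObject prop = Instantiate(propPrefab, transform);
            prop.transform.localPosition = Random.insideUnitSphere * 50f;
        }

        // Build the static batches for everything under this root.
        StaticBatchingUtility.Combine(gameObject);
    }
}
```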
Dynamic Batching
Dynamic batching is the same concept as static batching, but it instead batches draw calls for dynamic objects (objects that move).
- Dynamic batching is only applied to objects that have fewer than 900 vertex attributes in total (subject to change).
- Objects receiving real-time shadows will not be batched.
HDR – High Dynamic Range
If you have a scene with HDR effects like Bloom and Flare and you are using deferred rendering, you will see a large decrease in draw calls by checking the HDR box in the Camera settings. It's important to note that each camera has its own HDR checkbox. HDR works best on DirectX 10 or better hardware with deferred rendering, and its cost falls mainly on the fragment shader, which must write color values outside the 0–1 range to floating point render targets. If you wish to use HDR effects, follow these guidelines:
- Use deferred rendering
- Check the HDR box on each camera with effects requiring HDR (Figure 49)
Figure 49. HDR option in Camera settings
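Because the checkbox is per camera, it is easy to miss one. A small sketch that enables HDR on every camera in the scene is shown below; note that the property is named allowHDR in Unity 5.6 and later, while older versions call it hdr, so use whichever matches your Unity version.

```csharp
using UnityEngine;

// Enables HDR on every camera in the scene so post effects like Bloom
// receive color values above 1.0.
public class EnableHdrOnAllCameras : MonoBehaviour
{
    void Awake()
    {
        foreach (Camera cam in FindObjectsOfType<Camera>())
        {
            cam.allowHDR = true; // cam.hdr = true on Unity versions before 5.6
        }
    }
}
```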
Selecting the Optimal Render Path
Selecting the optimal render path for your app is highly dependent on what you are trying to do. The following is a brief overview of each of the render paths Unity provides, along with their pros and cons. With this information you should be able to choose a path based on your project's specifics, but as with the rest of these optimizations, testing each option is always the way to go. A great way to see which rendering path is optimal for your game is to write a small script that switches rendering paths at the push of a button (see the sketch below), and then observe the effect of each path in real time with the Unity Profiler and GPA.
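A minimal version of such a switcher is sketched here. It cycles the main camera through the available paths with the Tab key and logs the path the camera actually ends up using, since hardware or project settings can silently force a fallback. The key binding and the list of paths are assumptions; adjust them for your project.

```csharp
using UnityEngine;

// Cycles the main camera through Unity's rendering paths at runtime so each
// one can be profiled live with the Unity Profiler or GPA.
public class RenderPathSwitcher : MonoBehaviour
{
    private readonly RenderingPath[] paths =
    {
        RenderingPath.VertexLit,
        RenderingPath.Forward,
        RenderingPath.DeferredShading
    };

    private int index = 0;

    void Update()
    {
        if (Input.GetKeyDown(KeyCode.Tab))
        {
            index = (index + 1) % paths.Length;
            Camera.main.renderingPath = paths[index];

            // actualRenderingPath reports the path really in use, which may
            // differ if the platform cannot support the requested one.
            Debug.Log("Requested: " + paths[index] +
                      ", actual: " + Camera.main.actualRenderingPath);
        }
    }
}
```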
Vertex-Lit Rendering
Figure 50. Final render target for scene rendered with Vertex-Lit path
- Pros
  - Only does lighting calculations on a per-vertex basis, not per-pixel
    - For a model with 1024 vertices that covers a full-screen window on a 4K (3840 x 2160) monitor, lighting calculations will be done:
      - 1024 times with per-vertex lighting
      - 8,294,400 times with per-pixel lighting
  - Can drastically improve mobile performance
  - Simple to analyze: everything is done in the base pass
- Cons
  - Real-time shadows and other per-pixel effects are not supported
  - Low-quality lighting
Figure 51. Vertex-lit rendering path breakdown (all in the base pass)
Forward Rendering
Figure 52. Final render target for scene rendered with Forward path
- Pros
  - Lighting is done with a combination of per-pixel, per-vertex, and spherical harmonic techniques
  - Real-time shadows and other per-pixel effects are supported
  - Does not incur the memory cost required to build the g-buffer in the deferred path
- Cons
  - Can lead to many draw calls covering the same pixel(s) if care isn't taken
- Pass Breakdown
  - Base pass
    - The first per-pixel light is reserved for the brightest directional light.
    - Next, up to 3 other per-pixel lights marked as Important are drawn. If no lights are marked as Important, the next 3 brightest lights in the scene are chosen. Any Important lights beyond the Pixel Light Count value (under Project Settings > Quality) are handled in additional passes.
    - Next, up to 4 lights are rendered per vertex.
    - Finally, the remaining lights are drawn using spherical harmonic calculations (these values are always calculated, so they are essentially free on the GPU).
  - Per-pixel lighting pass
    - An additional pass is done for each per-pixel light remaining after the base pass.
  - Semi-transparent object pass
    - An additional pass is done for semi-transparent objects.
Figure 53. Forward rendering path breakdown
Deferred Shading
Figure 54. Final render target for scene rendered with deferred path
- Pros
  - Lighting performance is unrelated to scene complexity
  - Trades heavy lighting computation (FLOPS) for more memory usage, which raises the chance of being memory bound
  - Real-time shadows and per-pixel effects are supported
- Cons
  - Semi-transparent rendering is not supported directly; these objects are drawn in an additional forward pass
  - Higher memory usage due to building the g-buffer
  - No support for anti-aliasing
  - No support for the Mesh Renderer's Receive Shadows flag
- Pass Breakdown
  - G-buffer pass
    - Renders all opaque objects to build the g-buffer. The layout is as follows:
      - Render Target 0: ARGB32 – diffuse color in the RGB channels, occlusion data in the alpha channel
      - Render Target 1: ARGB32 – specular color in the RGB channels, roughness in the alpha channel
      - Render Target 2: ARGB2101010 – world-space normal in RGB, alpha channel unused
      - Render Target 3: ARGB32 (non-HDR) or ARGBHalf (HDR) – emission, lighting, lightmaps, and reflection probes buffer
      - Depth and stencil buffer
  - Lighting pass
    - Uses the textures generated by the g-buffer pass to do per-pixel lighting calculations. Unity passes in geometric bounding volumes to Z-test against, making it easy to detect occluded or partially occluded lights.
    - Produces a texture with the RGB channels holding diffuse lighting values and the alpha channel containing the monochrome specular color.
  - Light application pass
    - The final pass draws all objects again, using the texture generated by the lighting pass to apply lighting to each object.
  - Semi-transparent objects pass
    - Requires an additional forward pass.
Figure 55. Deferred shading path breakdown
*More information about the rendering paths available can be found at http://docs.unity3d.com/Manual/Rendering-Tech.html
Beware Extra Forward Draw Calls
As previously mentioned, the forward rendering path will issue additional draw calls for each light affecting your geometry, up to the Pixel Light Count value shown in the Quality settings section of the editor. Lights marked as Important are used for these calls first; if none are marked, Unity chooses the next brightest lights in the scene. In some cases this can lead to a lot of unnecessary overhead, especially when baking your lights is an option. The following screenshots of a GPA capture show the base draw as well as the three additional draws for the colored lights. If you need to avoid this situation, it can be beneficial to bake the lights or to lower the pixel light count in the Quality settings.
Figure 56. GPA capture showing the 4 draw calls required to paint this floor, taking up 55.3% of the scene's GPU duration.
Figure 57. Viewing color buffers in GPA for additional forward passes required for green, red, and blue lights.
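Both of these knobs can also be driven from script, which is handy while profiling: you can cap the per-pixel light budget or demote individual lights and immediately watch the extra forward passes disappear in GPA. The sketch below is illustrative; the specific field names and values are assumptions for experimentation.

```csharp
using UnityEngine;

// Demonstrates the two script-side controls over extra forward lighting passes:
// the global pixel light budget and each light's render mode.
public class ForwardLightTuning : MonoBehaviour
{
    public Light decorativeLight;

    void Start()
    {
        // Cap the number of lights that get their own per-pixel forward pass.
        QualitySettings.pixelLightCount = 2;

        // Force a cosmetic light to be rendered per vertex (never per pixel),
        // so it will not generate an additional forward draw per object.
        if (decorativeLight != null)
        {
            decorativeLight.renderMode = LightRenderMode.ForceVertex;
        }
    }
}
```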
Android: Splitting Binary
The latest Unity releases have the ability to produce a Fat Binary or split the binary into separate ARM and x86 portions. You can use the same process to choose either x86 or ARM to test various aspects of a deployment. Evaluating compression, code, and other specifics allows you to troubleshoot or even benchmark your builds.
Building FAT APKs does not significantly increase the binary size. You can build slim binaries by simply choosing x86 or ARMv7; however, it would be necessary to maintain two separate builds.
In Player settings (File > Build Settings > Player Settings):
- Open/expand other settings
- In Configuration, find Device Filter and choose FAT (ARMv7+x86). See Figure 58.
- Choose Build (on the Build Settings screen) to begin creating the selected binary.
Figure 58. Configuration in Other Settings. Three options shown in Device Filter.
That’s it! You are now supporting x86 in your Unity Android game deployments.
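If you build from the command line or a CI system, the same device filter can be set from an editor script before kicking off the build. The sketch below assumes a Unity 5.x era editor API (PlayerSettings.Android.targetDevice and BuildPipeline.BuildPlayer) plus illustrative scene and output paths; newer Unity versions expose this choice as target architectures instead.

```csharp
// Editor/AndroidBuild.cs
using UnityEditor;

public static class AndroidBuild
{
    [MenuItem("Build/Android FAT APK")]
    public static void BuildFatApk()
    {
        // Select the FAT (ARMv7+x86) device filter before building.
        PlayerSettings.Android.targetDevice = AndroidTargetDevice.FAT;

        string[] scenes = { "Assets/Scenes/Main.unity" }; // illustrative scene path
        BuildPipeline.BuildPlayer(scenes, "Builds/game-fat.apk",
                                  BuildTarget.Android, BuildOptions.None);
    }
}
```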
A special Unity x86 developer page is available at www.intel.com/software/unity for additional support.
Conclusion
Optimization is a job in and of itself when you are pursuing high levels of performance from a graphics-intensive game. Combinations of the techniques above can help you gain significant ground, and the tools discussed here will let you dive deeper and make further adjustments.
Resources
Reducing textures and sizes: http://docs.unity3d.com/Manual/ReducingFilesize.html
Shadows: http://docs.unity3d.com/Manual/Shadows.html
About the Authors
Cristiano Ferreira is a software engineer working in Intel’s Developer Relations Division with a specialization in games and graphics. Cristiano helps game developers give their customers the best experience possible on Intel hardware.
Steve Hughes is a Senior Application Engineer with Intel, focusing on support for game development on x86 platforms from desktops and tablets to phones. Prior to joining Intel, Steve spent 12 years as a game developer, working on all aspects of game development for a number of companies.