VR Content Developer Guide

ID 672065
Updated 11/9/2017
Version Latest
Public

By

Mike Apodaca, Sumit Bhatia, Matthew Goyder, John Gierach, Rama Harihara, Anupreet Kalra, Yaz Khabiri, Pavan Lanka, Katen Shah, Bill Sadler, Prasoon Surti, and Travis Schuler contributed to the developer guidelines provided in this document.

This document provides guidelines for designing and developing a virtual reality (VR) application and obtaining optimal performance. The guide is based on performance characterizations across several VR workloads and describes common bottlenecks and issues. It provides solutions for bandwidth-bound workloads, such as the choice of texture formats and fusing shader passes, and instructions on how to use post-process anti-aliasing techniques to improve the performance of VR application workloads.

Goals

  • Define general design point and budget recommendations for creating VR content using 7th generation Intel® Core™ i7 processors and Intel® HD Graphics 620 & 615 (GT2)
  • Provide guidance and considerations for obtaining optimal graphics performance on 7th generation Intel Core i7 processors
  • Provide suggestions for optimal media performance, particularly for 3D media
  • Provide tips on how to design VR apps for sustained power, especially for mobile devices
  • Identify tools that can help you identify VR issues in computer graphics on VR-ready hardware

Recommended Design Points for Developers

General Guidelines on Design Points and Budgets to ISVs

Category: Guideline

Triangles / Frame: 200‒300 K visible triangles in a given frame.[1] Use aggressive view frustum, back face, and occlusion culling to reduce the number of triangles actually sent to the GPU.

Draws / Frame: 500‒1000.[1] Reduce the number of draw calls to improve performance and power. Batch draws by shader and draw front-to-back with 3D workloads (refer to 3D guide section).

Target Refresh: At least 60 frames per second (fps); 90 fps for the best experience.

Resolution: The render resolution can be scaled down from the head-mounted display (HMD) resolution if needed to hit 60 fps, but it should not go below 80 percent of the HMD resolution.[1] Dynamic scaling of the render target resolution can also be considered to meet frame rate requirements.[1]

Texture Budget: TBD.[1]

Memory: 180‒200 MB per frame (DDR3, 1600 MHz) for 90 fps.[1]

[1] This data is a work in progress and is to be used as a placeholder.
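
The dynamic scaling guideline in the table above could be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names, the square-root heuristic, and the reading of the 80 percent floor as a per-axis scale are all assumptions made for this example.

```python
import math

MIN_SCALE = 0.80  # floor taken from the table: not below 80 percent of HMD resolution
MAX_SCALE = 1.00  # this sketch never renders above the native HMD resolution


def next_render_scale(current_scale, gpu_frame_ms, target_fps=60.0):
    """Return the render-target scale to use for the next frame.

    Assumption: GPU cost grows roughly with pixel count (scale squared),
    so the correction uses the square root of the time ratio.
    """
    budget_ms = 1000.0 / target_fps
    if gpu_frame_ms <= 0.0:
        return current_scale  # no usable timing yet; keep the current scale
    proposed = current_scale * math.sqrt(budget_ms / gpu_frame_ms)
    return max(MIN_SCALE, min(MAX_SCALE, proposed))


def scaled_resolution(native_width, native_height, scale):
    """Apply a per-axis scale and round down to even pixel dimensions."""
    width = int(native_width * scale) // 2 * 2
    height = int(native_height * scale) // 2 * 2
    return width, height
```

A frame that takes 20 ms against a 60 fps budget would lower the scale to roughly 0.91, while a 40 ms frame would clamp at the 0.80 floor. In practice the measured GPU time would likely be smoothed over several frames to avoid visible oscillation.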

Considerations for Optimal Performance on General Hardware

Texture Formats and Filtering Modes

  • Texture formats and filtering modes can have a significant impact on bandwidth.
  • Generally 32-bit and 64-bit image formats are recommended for most filtering modes (bilinear and so on).
  • Trilinear filtering and filtering of volumetric surfaces with sRGB and high dynamic range (HDR) formats are slower than the 32-bit cases.

Uncompressed Texture Formats

Uncompressed formats, including sRGB and HDR formats, consume more bandwidth. Use linear formats if the app becomes heavily bandwidth-bound.

We recommend that you compress the textures applied to geometry using block-compression formats such as DXTx.

For example, in Unity*, textures can be compressed from the texture settings (Project > Models > Textures), as shown in the following screenshot:

[Screenshot: Unity texture settings under Project > Models > Textures]
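
To give a sense of how much compression saves, the sketch below compares the memory footprint of a texture in uncompressed and block-compressed layouts. The block sizes are fixed by the formats (DXT1 corresponds to BC1 and DXT5 to BC3); the helper names and the example texture size are assumptions chosen for illustration.

```python
# Bytes per 4x4 texel block; these sizes are fixed by the formats themselves.
BLOCK_BYTES = {"BC1": 8, "BC2": 16, "BC3": 16, "BC4": 8, "BC5": 16}
# Bytes per texel for two common uncompressed layouts.
TEXEL_BYTES = {"RGBA8": 4, "RGBA16F": 8}


def mip_level_bytes(width, height, fmt):
    """Size in bytes of one mip level in the given format."""
    if fmt in BLOCK_BYTES:
        # Block-compressed data is stored in whole 4x4 blocks.
        blocks_x = (width + 3) // 4
        blocks_y = (height + 3) // 4
        return blocks_x * blocks_y * BLOCK_BYTES[fmt]
    return width * height * TEXEL_BYTES[fmt]


def texture_bytes(width, height, fmt, with_mips=True):
    """Size in bytes of a 2D texture, optionally with its full mip chain."""
    total = 0
    while True:
        total += mip_level_bytes(width, height, fmt)
        if not with_mips or (width == 1 and height == 1):
            return total
        width, height = max(1, width // 2), max(1, height // 2)
```

For a 1024 x 1024 texture without mips, this gives 4 MiB for RGBA8, 1 MiB for BC3, and 0.5 MiB for BC1, so every texel fetched costs correspondingly less bandwidth.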

For non-shared resources, do not set the D3D11_RESOURCE_MISC_SHARED flag. Leaving the flag off allows the resource to be compressed, which saves memory bandwidth.

BlendState

To avoid an unnecessary read/modify/write, we encourage setting the render target write mask (RenderTargetWriteMask in D3D11_RENDER_TARGET_BLEND_DESC) to write all channels (RGBA) rather than individual color components.
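
As a rough illustration, a content pipeline or debug layer could flag blend states whose write mask skips channels. The constants below mirror the values of the D3D11_COLOR_WRITE_ENABLE enumeration; the helper functions are hypothetical and are not part of any Direct3D API.

```python
# Values mirror the D3D11_COLOR_WRITE_ENABLE enumeration.
COLOR_WRITE_ENABLE_RED = 1
COLOR_WRITE_ENABLE_GREEN = 2
COLOR_WRITE_ENABLE_BLUE = 4
COLOR_WRITE_ENABLE_ALPHA = 8
COLOR_WRITE_ENABLE_ALL = (
    COLOR_WRITE_ENABLE_RED
    | COLOR_WRITE_ENABLE_GREEN
    | COLOR_WRITE_ENABLE_BLUE
    | COLOR_WRITE_ENABLE_ALPHA
)


def is_full_write_mask(mask):
    """True when all four channels (RGBA) are written."""
    return (mask & COLOR_WRITE_ENABLE_ALL) == COLOR_WRITE_ENABLE_ALL


def find_partial_write_masks(render_target_masks):
    """Return the indices of render targets whose mask skips a channel."""
    return [
        index
        for index, mask in enumerate(render_target_masks)
        if not is_full_write_mask(mask)
    ]
```

For example, a list of masks such as [15, 7, 15] would report index 1, where alpha writes are disabled.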

Lighting Mode

Lighting mode controls the lighting precomputation and composition. To save on computation during runtime, we recommend using Mixed or Baked mode instead of real time. For more information, refer to the Unity documentation: https://docs.unity3d.com/Manual/LightModes.html

HDR Formats

The use of R10G10B10A2 over R16G16B16A16 and floating point formats is encouraged.
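
The bandwidth difference between these formats can be estimated with simple arithmetic. The sketch below is only an approximation: the bits per pixel follow from the format names, while the 2160 x 1200 target size and the count of four full-screen passes are assumptions picked for illustration.

```python
# Bits per pixel follow directly from the format definitions.
BITS_PER_PIXEL = {
    "R10G10B10A2": 32,
    "R16G16B16A16": 64,
    "R32G32B32A32": 128,
}


def render_target_bytes(width, height, fmt):
    """Size in bytes of a single render target surface."""
    return width * height * BITS_PER_PIXEL[fmt] // 8


def traffic_mb_per_frame(width, height, fmt, full_screen_passes):
    """Approximate traffic in MB (10**6 bytes) when each pass touches
    the whole target once. Caches and compression are ignored."""
    return render_target_bytes(width, height, fmt) * full_screen_passes / 1e6


# Hypothetical 2160 x 1200 target touched by four full-screen passes.
r10_mb = traffic_mb_per_frame(2160, 1200, "R10G10B10A2", 4)
r16_mb = traffic_mb_per_frame(2160, 1200, "R16G16B16A16", 4)
```

Under these assumptions the R16G16B16A16 target accounts for about 83 MB per frame and the R10G10B10A2 target for about 41 MB, which is a noticeable share of a per-frame memory budget in the range of a few hundred megabytes.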

Filtering Modes

Filtering modes, like anisotropic filtering, can significantly impact performance, especially with uncompressed formats and HDR formats.

Anisotropic filtering is a trade-off between performance and quality. Based on our performance and quality studies, we generally recommend anisotropic level two. Mipmapping textures along with anisotropic levels adds overhead to the filtering and hardware pipeline. If you choose anisotropic filtering, we recommend using BC1‒5 formats.

Anti-Aliasing

Temporally stable anti-aliasing is crucial for a good VR experience. Multisample anti-aliasing (MSAA) is bandwidth intensive and consumes a significant portion of the rendering budget. Temporally stable post-process anti-aliasing algorithms can provide equivalent functionality at half the cost and should be considered as alternatives.

For applications that use both media and 3D pipelines, we recommend that you disable MSAA for media workloads, because it has no impact on media quality.

Low-Latency Preemption

Gen hardware supports object-level preemption, which usually translates into preemption on triangle boundaries. For effective scheduling of the compositor, it is important that primitives are able to be preempted in a timely fashion. To enable this, draw calls that take more than 1 ms should usually have more than 64‒128 triangles. Typically, full-screen post-effects should use a grid of at least 64 triangles as opposed to 1 or 2.
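
One way to build such a grid for a full-screen post-effect is sketched below. The 8 x 4 layout, which yields exactly 64 triangles, and the function name are assumptions for illustration; any tessellation with enough triangles would serve the same purpose.

```python
def fullscreen_grid(columns=8, rows=4):
    """Build a full-screen grid in normalized device coordinates.

    Returns (vertices, indices). Vertices are (x, y) pairs in [-1, 1].
    Indices describe two triangles per grid cell, so the default
    8 x 4 grid produces 64 triangles.
    """
    vertices = []
    for j in range(rows + 1):
        for i in range(columns + 1):
            x = -1.0 + 2.0 * i / columns
            y = -1.0 + 2.0 * j / rows
            vertices.append((x, y))

    indices = []
    stride = columns + 1  # vertices per grid row
    for j in range(rows):
        for i in range(columns):
            v0 = j * stride + i  # bottom-left corner of the cell
            v1 = v0 + 1          # bottom-right
            v2 = v0 + stride     # top-left
            v3 = v2 + 1          # top-right
            # Counter-clockwise with y up; flip the order if the
            # pipeline culls this winding.
            indices += [v0, v1, v2, v2, v1, v3]
    return vertices, indices
```

The resulting buffers can be uploaded once and reused for every full-screen pass, so the extra vertices add little cost while giving the hardware many more triangle boundaries at which to preempt.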

App Scheduling

  1. Recommendation: Nothing additional is required.

    In the ideal case for a given frame, the app has ample time to complete its rendering work between the vsync and the submission of the Late Stage Reprojection (LSR) packet. In this case, it is best that the app synchronize on the vsync itself, so that rendering is performed with the newest HMD positional data. This helps to minimize motion sickness.

  2. Recommendation: Start work earlier by syncing on the compositor starting work rather than on the vsync.

    When the frame rendering time no longer fits within this interval, all available GPU time should be reclaimed for rendering the frame before the LSR occurs. If this interval is not met, the compositor can block the app from rendering the next frame by withholding the next available render target in the swap chain. This results in entire frames being skipped until the present workload for that frame has finished, degrading the app's frames per second (fps). The app should synchronize with its compositor so that new rendering work is submitted as soon as the present or LSR workload is submitted. This is typically accomplished via a wait behavior provided by the compositor API.

  3. Recommendation: Present asynchronously.

    In the worst case, when the frame rendering time exceeds the vsync, the app should submit rendering work as quickly as possible to fully saturate the GPU, allowing the compositor to use the newest frame data available, whenever that might occur relative to the LSR. To accomplish this, do not wait on any vsync or compositor events to proceed with rendering, and if possible build your application so that the presentation and rendering threads are decoupled from the rest of the state update.

    For example, on the Holographic API, pass DoNotWaitForFrameToFinish to PresentUsingCurrentPrediction, or in Microsoft DirectX*, pass SyncInterval=0 to Present.

  4. Recommendation: Use GPU analysis tools to identify your rendering profile.

    Use GPU analysis tools, such as GPUView, to see which rendering performance profile you have encountered, and then make the necessary adjustments detailed above.
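
The three scheduling cases above can be summarized as a simple decision, sketched below. This is a minimal illustration: the enum, the function, and its parameters are hypothetical names, and the timing inputs are assumed to come from measurements taken with a tool such as GPUView.

```python
from enum import Enum


class SyncStrategy(Enum):
    WAIT_FOR_VSYNC = 1       # case 1: nothing additional is required
    WAIT_FOR_COMPOSITOR = 2  # case 2: start as soon as the compositor starts
    PRESENT_ASYNC = 3        # case 3: do not wait on vsync or the compositor


def choose_sync_strategy(render_ms, vsync_to_lsr_ms, vsync_interval_ms):
    """Pick a strategy from measured timings, all in milliseconds.

    render_ms          -- GPU time the app needs to render one frame
    vsync_to_lsr_ms    -- time between the vsync and LSR packet submission
    vsync_interval_ms  -- full refresh interval (about 11.1 ms at 90 Hz)
    """
    if render_ms <= vsync_to_lsr_ms:
        return SyncStrategy.WAIT_FOR_VSYNC
    if render_ms <= vsync_interval_ms:
        return SyncStrategy.WAIT_FOR_COMPOSITOR
    return SyncStrategy.PRESENT_ASYNC
```

Because frame times vary with scene content, an app would likely re-evaluate this choice over a window of recent frames rather than switching on a single measurement.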

Other Design Considerations

Half float versus float: For compute-bound workloads, half floats can be used to increase throughput whenever precision is not an issue. Mixing half and full precision results in performance penalties and should be minimized.
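
Whether precision is an issue depends on the range of the data. The sketch below offers one way to check offline whether a set of values survives a round trip through half precision. It relies on the half-precision format code of Python's standard struct module; the helper names and the tolerance are assumptions for illustration.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def safe_for_half(values, tolerance):
    """Return True if every value survives a round trip through half
    precision with a relative error no larger than tolerance."""
    for value in values:
        try:
            rounded = to_half(value)
        except OverflowError:
            # Magnitude exceeds the largest finite half value (65504).
            return False
        if value != 0.0 and abs(rounded - value) / abs(value) > tolerance:
            return False
    return True
```

Normalized colors and direction vectors usually pass such a check, while large world-space positions often do not. For instance, 2049.0 becomes 2048.0 in half precision because integers above 2048 are no longer exactly representable.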

Design Considerations for Immersive Media Applications

Decode/Video Post-Processing (VPP)

  1. Recommendation: Use fixed-function (FF) decode

    Based on the GPUView analysis, we recommend using FF decode to achieve performance and power gains as compared to software decode.

    Programming guideline:

    https://www.x.org/docs/intel/CHV/intel-gfx-prm-osrc-chv-bsw-vol08-media-vdbox.pdf

  2. Recommendation: Color space conversion (CSC)

    Leverage VPP FF for converting decoded output from NV12 to RGB

    Programming guideline:

    https://01.org/sites/default/files/documentation/intel-gfx-bspec-osrc-chv-bsw-vol09-media-vebox.pdf

  3. Recommendation: Decode fps

    Perform CSC in sync with decoded content fps to save on power and performance.
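
The idea of tying CSC to the decoded content rate can be sketched as follows: conversion runs only on display frames where a new decoded frame becomes current. The function names and the integer frame-rate model are assumptions for illustration; a real player would more likely compare presentation timestamps.

```python
def new_video_frame_due(display_frame, display_fps, video_fps):
    """True when this display frame is the first to show a new decoded frame.

    Frame rates are integers; display_frame counts from zero.
    """
    if display_frame == 0:
        return True
    current = display_frame * video_fps // display_fps
    previous = (display_frame - 1) * video_fps // display_fps
    return current != previous


def count_csc_passes(display_frames, display_fps, video_fps):
    """Number of CSC passes needed over a run of display frames."""
    return sum(
        1
        for n in range(display_frames)
        if new_video_frame_due(n, display_fps, video_fps)
    )
```

With 30 fps content on a 90 fps display, one second of playback needs 30 conversions rather than 90. On the remaining display frames the previously converted surface can be reused.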

Tools

The following tools will help you identify issues with VR workloads.

GPUView: GPUView is a tool in the Microsoft Windows* Performance Toolkit that is installed by the Windows software development kit. GPUView provides specifics on identifying issues with scheduling and dropped frames.

https://docs.microsoft.com/en-us/windows-hardware/drivers/display/using-gpuview

Intel® Graphics Performance Analyzers: Gives specifics on analyzing VR workloads and the expected patterns we see, for example, two sets of identical calls for the left and right eyes.

Additional Resources

A Graphics API Developer Guide for 6th Generation Graphics Processors: https://software.intel.com/en-us/articles/6th-gen-graphics-api-dev-guide

A Unity Optimization Guide for Processor Graphics: /content/www/us/en/develop/articles/unity-optimization-guide-for-x86-android-part-1.html

Compute Architecture for 6th Generation Graphics Processors: https://software.intel.com/sites/default/files/managed/c5/9a/The-Compute-Architecture-of-Intel-Processor-Graphics-Gen9-v1d0.pdf

Summary

The biggest challenge for VR workload performance comes from being bandwidth-bound. Choosing appropriate texture formats, fusing shader passes, and using post-process anti-aliasing techniques help reduce the pressure on bandwidth.