VA-API VP9 K-SVC encoding on ChromeOS

Zhaoliang Ma, Vaibhav K Shankar, Jianlin Qiu, Jianhui Dai, Prashant Babu Kodali, Lei Zhai

Chromebooks powered by Intel® SoCs typically provide excellent power and performance. However, Google Meet recently switched its encoding scheme from VP8 Simulcast to VP9 K-SVC (Key-Frame Scalable Video Coding), and as a result it could no longer leverage the hardware video encoding capabilities of Intel® architecture. Our work aims to provide better power and performance by enabling hardware VP9 K-SVC encoding for Google Meet on ChromeOS running on Intel architecture. This article describes how the feature was enabled and the power gains it brings to Chromebooks powered by Intel.

Introduction

In multi-party video conferences, the encoded bitstream may have multiple recipients, each with a different available bandwidth. Simulcast and SVC are two methods of tailoring the encoded bitstream to the bandwidth of each receiver. With Simulcast, the encoder produces multiple independent bitstreams. SVC generally offers higher coding efficiency than Simulcast because its spatial layers use inter-layer prediction. In K-SVC, spatial inter-layer dependencies are used only for key frames, so the encoding and bitrate control algorithms of K-SVC are less complex than those of full SVC. Google Meet chose K-SVC for its implementation.
Because the Chromium video encode accelerator did not support K-SVC encoding, Google Meet could not use the encoding capabilities of Intel hardware. Falling back to software encoding with libvpx increased package power consumption and CPU usage, which degrades the user experience. Enabling hardware K-SVC encoding is expected to improve power efficiency.

Motivation

We are from the Web Media team and focus on power and performance optimization of web media workloads on Chromium, especially real-time video conferencing such as Google Meet. Our goal is the best battery life and user experience for video conferencing workloads on Intel architecture platforms.

Approach

There are multiple types of scalability:

- Temporal scalability: each lower layer has a lower frame rate than the layer(s) above it.
- Spatial scalability: each lower layer has a lower resolution than the layer(s) above it.
- Quality scalability: each lower layer has a lower encoding quality than the layer(s) above it.
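As an illustration of how temporal and spatial scalability combine, the Python sketch below computes the resolution and frame rate of every layer in an LxTy structure. It assumes a 2:1 ratio between adjacent spatial layers and between adjacent temporal layers; `LayerConfig` and `layer_configs` are hypothetical names, not part of any Chromium or WebRTC API.

```python
from dataclasses import dataclass


# Hypothetical helper types; assumes 2:1 spatial and temporal ratios.
@dataclass(frozen=True)
class LayerConfig:
    spatial_index: int
    temporal_index: int
    width: int
    height: int
    fps: float  # frame rate when decoding this temporal layer and all below it


def layer_configs(width, height, fps, num_spatial, num_temporal):
    """Return one LayerConfig per (spatial, temporal) layer, lowest layers first."""
    configs = []
    for s in range(num_spatial):
        # The top spatial layer keeps the input resolution; each layer below
        # it halves width and height.
        scale = 2 ** (num_spatial - 1 - s)
        for t in range(num_temporal):
            # The top temporal layer runs at the input frame rate; each layer
            # below it halves the rate.
            rate = fps / 2 ** (num_temporal - 1 - t)
            configs.append(LayerConfig(s, t, width // scale, height // scale, rate))
    return configs
```

For a 1280x720 input at 30 fps with three spatial and three temporal layers, the spatial layers come out at 320x180, 640x360 and 1280x720, and the temporal layers at 7.5, 15 and 30 fps (each being the frame rate obtained when decoding that layer together with all layers below it).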

WebRTC supports both temporal and spatial scalability for VP9 and AV1, and the WebRTC SVC specification defines many scalability modes. As mentioned above, K-SVC has lower encoding and bitrate control complexity than full SVC, which is why Google Meet uses it. Google Meet always uses three temporal layers, while the number of spatial layers is initialized based on the resolution of the incoming frames. The scalability mode used by Google Meet is therefore LxT3_KEY, where x = 1, 2, or 3. The number of active spatial layers can also change during a call, based on the network bandwidth available in real time.
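The LxT3_KEY naming can be handled mechanically. The sketch below parses a mode name into its spatial layer count, temporal layer count and key-only flag, and picks a mode from the input height. The selection rule (keep adding 2:1-downscaled spatial layers while the lowest layer stays at least 180 pixels tall) is a hypothetical placeholder, since the article does not describe Google Meet's actual rule; `parse_scalability_mode` and `choose_k_svc_mode` are illustrative names.

```python
import re

# Only LxTy and LxTy_KEY names with 1-3 layers are handled here.
_MODE_RE = re.compile(r"L([1-3])T([1-3])(_KEY)?")


def parse_scalability_mode(mode):
    """Split a mode name such as 'L3T3_KEY' into (spatial, temporal, key_only)."""
    match = _MODE_RE.fullmatch(mode)
    if match is None:
        raise ValueError(f"unsupported scalability mode: {mode!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3) is not None


def choose_k_svc_mode(input_height, min_layer_height=180):
    """Hypothetical rule: add 2:1 spatial layers while the lowest stays tall enough."""
    spatial = 1
    while spatial < 3 and input_height // 2 ** spatial >= min_layer_height:
        spatial += 1
    # Three temporal layers are always used, following the LxT3_KEY notation.
    return f"L{spatial}T3_KEY"
```

Under this placeholder rule, a 720p input yields L3T3_KEY, a 360p input L2T3_KEY and a 180p input L1T3_KEY. Other modes from the WebRTC SVC specification (for example the "h" or "S" variants) are deliberately rejected by the parser.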

Take the most commonly used L3T3_KEY scalability mode as an example. The diagram below shows the dependency relationship between different spatial and temporal layer frames.

Here, SLx-TLy-z denotes a frame in the x-th spatial layer and the y-th temporal layer, where z is the picture number. Note that in K-SVC, cross-spatial-layer dependencies occur only at key pictures.
Hardware CBR (HuC-based) does not work with advanced encoding modes such as SVC, because the Intel VA-API driver's CBR implementation does not support separate bitrate control for each spatial and temporal layer. Instead, we use software-based BRC (bitrate control) to calculate the QP for each spatial and temporal layer frame, then pass that QP to the hardware encoder running in CQP (constant QP) mode. The diagram below shows how SW BRC + CQP mode encoding works. (For more information, see Bitrate Control Methods (BRC) in Intel® Media SDK.)
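The dependency structure described above can be expressed as a small function. This sketch assumes 0-based layer and picture indices, the three-layer temporal pattern TL0, TL2, TL1, TL2 commonly used in WebRTC, and that picture 0 is the only key picture; `temporal_layer`, `label` and `references` are hypothetical helpers.

```python
# Hypothetical helpers. Temporal layer of pictures 0, 1, 2, 3; the pattern
# then repeats. Picture 0 is assumed to be the only key picture.
TEMPORAL_PATTERN = [0, 2, 1, 2]


def temporal_layer(picture):
    return TEMPORAL_PATTERN[picture % len(TEMPORAL_PATTERN)]


def label(picture, spatial):
    """Name a frame in the SLx-TLy-z notation (0-based indices)."""
    return f"SL{spatial}-TL{temporal_layer(picture)}-{picture}"


def references(picture, spatial):
    """Return the (picture, spatial) frames this frame predicts from."""
    if picture == 0:
        # Key picture: SL0 is intra-coded; each higher spatial layer predicts
        # only from the layer below it in the same picture.
        return [(0, spatial - 1)] if spatial > 0 else []
    tl = temporal_layer(picture)
    # Non-key picture: predict only within the same spatial layer, from the
    # most recent earlier picture in a lower temporal layer (TL0 uses TL0).
    for prev in range(picture - 1, -1, -1):
        prev_tl = temporal_layer(prev)
        if prev_tl < tl or (tl == 0 and prev_tl == 0):
            return [(prev, spatial)]
    return []
```

For example, `references(3, 2)` returns `[(2, 2)]`, meaning SL2-TL2-3 predicts from SL2-TL1-2, while `references(0, 2)` returns `[(0, 1)]`, the inter-layer reference that exists only on the key picture.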
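To make the SW BRC + CQP split concrete, the sketch below keeps one virtual buffer per (spatial, temporal) layer: software chooses a QP before each frame, the hardware encodes at that fixed QP, and the actual encoded size is fed back to adjust the next QP. This is a deliberately simple, hypothetical controller (fixed QP step, VP9 quantizer index range 0-255); it is not the rate control algorithm used in Chromium or in the Intel driver, and `LayerRateState` and `SimpleSvcBrc` are illustrative names.

```python
from dataclasses import dataclass

MIN_QINDEX, MAX_QINDEX = 0, 255  # VP9 quantizer index range


@dataclass
class LayerRateState:
    target_bps: float         # bitrate budget of this (spatial, temporal) layer
    fps: float                # rate of frames belonging to exactly this layer
    qp: int = 128             # starting quantizer index (arbitrary midpoint)
    buffer_bits: float = 0.0  # > 0 means the layer has overspent its budget


class SimpleSvcBrc:
    """Hypothetical software BRC: one virtual buffer and one QP per layer."""

    def __init__(self, layers, step=4):
        self.layers = layers  # dict: (spatial, temporal) -> LayerRateState
        self.step = step

    def qp_for(self, layer):
        # Called before encoding; the QP is handed to the encoder in CQP mode.
        return self.layers[layer].qp

    def update(self, layer, encoded_bits):
        # Called after encoding, with the real frame size from the hardware.
        state = self.layers[layer]
        per_frame_budget = state.target_bps / state.fps
        state.buffer_bits += encoded_bits - per_frame_budget
        if state.buffer_bits > per_frame_budget:
            state.qp = min(MAX_QINDEX, state.qp + self.step)  # overspent: coarser
        elif state.buffer_bits < -per_frame_budget:
            state.qp = max(MIN_QINDEX, state.qp - self.step)  # underspent: finer
```

Keeping a separate buffer and QP per layer is what allows each spatial and temporal layer to track its own share of the bitrate, which a single CBR loop over the whole stream cannot do.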

VP9SVCLayers manages the state of K-SVC encoding with up to three spatial layers and three temporal layers. It controls the spatial and temporal layer structure (LxT3_KEY, where x = 1, 2, or 3) and fills in the metadata for each spatial and temporal layer frame. VP9SVCLayers also supports activating and deactivating spatial layers, while the number of temporal layers must remain unchanged. The diagram below shows the flow of SVC encoding in a VA-API (Video Acceleration API) video encoder implementation.

Figure 3: SVC encoding in a VA-API implementation
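The layer-state bookkeeping described for VP9SVCLayers can be approximated in a few lines. `SvcLayerState` below is a hypothetical stand-in, not Chromium's VP9SVCLayers class or its API. It keeps the three temporal layers fixed, lets the number of active spatial layers change between 1 and 3, and emits per-frame metadata. It also assumes that activating an extra spatial layer forces a new key picture, because a newly added layer has no earlier frame of its own to predict from and K-SVC only permits inter-layer prediction on key pictures; dropping layers does not.

```python
from dataclasses import dataclass

TEMPORAL_PATTERN = [0, 2, 1, 2]  # fixed three-layer temporal structure


@dataclass(frozen=True)
class FrameMetadata:
    picture: int            # index since the last key picture
    spatial_idx: int
    temporal_idx: int
    key_picture: bool
    inter_layer_pred: bool  # predicts from the spatial layer below


class SvcLayerState:
    """Hypothetical stand-in for VP9SVCLayers, limited to LxT3_KEY (x = 1..3)."""

    MAX_SPATIAL_LAYERS = 3

    def __init__(self, num_spatial):
        self._validate(num_spatial)
        self.num_spatial = num_spatial
        self.picture = 0
        self.force_key = True  # the stream starts with a key picture

    def _validate(self, num_spatial):
        if not 1 <= num_spatial <= self.MAX_SPATIAL_LAYERS:
            raise ValueError("number of spatial layers must be 1, 2 or 3")

    def set_active_spatial_layers(self, num_spatial):
        self._validate(num_spatial)
        # Assumption: adding a layer forces a key picture; removing does not.
        if num_spatial > self.num_spatial:
            self.force_key = True
        self.num_spatial = num_spatial

    def next_picture(self):
        """Return metadata for every active spatial layer of the next picture."""
        if self.force_key:
            self.picture = 0
            self.force_key = False
        key = self.picture == 0
        tl = TEMPORAL_PATTERN[self.picture % len(TEMPORAL_PATTERN)]
        frames = [FrameMetadata(self.picture, s, tl, key, key and s > 0)
                  for s in range(self.num_spatial)]
        self.picture += 1
        return frames
```

Dropping from three spatial layers to two continues the current temporal pattern, while going back to three restarts it with a key picture in this sketch.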

Performance improvement

Power and performance statistics on Tiger Lake

The SoC power data below shows the following advantages of hardware K-SVC encoding over software encoding:

Approximately 1 W lower package power (power can vary depending on the platform used):
- A 5-10% reduction in SoC PC0 residency, due to GPU offload.
- CPU cores can enter deeper CCx states; an increase in CC7 residency was also observed.
- Higher GPU utilization was observed, as encoding was offloaded from the CPU to the GPU.
- At higher encode/decode resolutions (such as 720p), the GPU encodes and decodes more efficiently, with lower power and latency.
- DDR bandwidth is lower by ~500 MB/s with hardware offload, reducing memory controller/PHY power consumption.
- CPUs can run at lower frequencies, approximately 300-400 MHz lower, due to the offload. This also reduces core power consumption.

After about four months of Finch testing, the feature was enabled by default on Tiger Lake devices after the Chrome M97 branch.

Power metrics plots

Taking Google Meet with software encoding as the baseline (100%), the plots below show the reduction in various SoC power metrics:

Conclusion and future work

Our goal is to ensure that popular web media workloads such as Google Meet have the best power and performance behavior on Intel architecture. This article outlines the work carried out to enable hardware VP9 K-SVC encoding for ChromeOS on Intel architecture, which improves Google Meet performance on Intel Chromebooks and extends battery life in video conferencing scenarios. As video coding technology continues to evolve, on future Intel architectures we will need to ensure that the hardware VDBox capabilities are also leveraged for AV1 and AV1 SVC in web media.

References

1. Chromebooks Powered by Intel: https://www.intel.com/content/www/us/en/products/systems-devices/laptops/chromebooks.html
2. Scalable Video Coding (SVC) Extension for WebRTC: https://www.w3.org/TR/webrtc-svc/
3. Google Meet: https://meet.google.com/
4. VA-API (Video Acceleration API): https://01.org/linuxmedia/vaapi
5. Bitrate Control Methods (BRC) in Intel® Media SDK: https://www.intel.com/content/www/us/en/developer/articles/technical/common-bitrate-control-methods-in-intel-media-sdk.html
