Increase the Performance of VM Workloads by Enabling Transparent Huge Pages

ID 749114
Updated 9/30/2022

Introduction

A virtual machine (VM) is a compute resource that uses software instead of a physical computer to run programs and deploy apps. One or more virtual “guest” machines run on a physical “host” machine. Chromebooks support Android OS, Windows OS, and Linux OS as virtual machines in ChromeOS. These deployments are sandboxed and hence provide additional security for applications running in ChromeOS. Crosvm is the ChromeOS virtual machine manager, written from the ground up to run lightweight, secure, and performant VMs.
In a virtualized environment, workload performance is lower than native performance. In the VM environment described above, the average performance gap was about 5% to 20% on various workloads compared to native. One of the areas we looked at to improve performance and narrow the gap to native is reducing the number of VM exits, which carry significant CPU overhead. In the Android and Crostini (Linux) VMs, the number of VM exits caused by extended page table (EPT) violations is significantly high. In the case of an EPT violation, in addition to the VM exit overhead itself, context switching and instruction translation look-aside buffer (iTLB) overhead contribute to the performance degradation.
In this work, we enabled support for transparent huge pages (THP) [1] to reduce the VM exits due to EPT violations and, in turn, improve performance by reducing iTLB overhead.
Our assessment results show that our implementation achieved approximately a 3% performance improvement, approximately a 70% reduction in EPT violation VM exits for Android and Crostini workloads, and an average 12% reduction in iTLB overhead for ARCVM (Android OS in VM) workloads using THP. There was no impact on the Chrome browser or native workloads, and no degradation in power was observed.

Motivation

Background – VM exits [2]

Processor support for virtualization is provided by a form of processor operation called VMX operation. There are two kinds of VMX operation: VMX root operation, in which the VMM generally runs, and VMX non-root operation, in which guest software generally runs. Transitions between the two are called VMX transitions, and there are two kinds of VMX transitions. Transitions into VMX non-root operation are called VM entries. Transitions from VMX non-root operation to VMX root operation are called VM exits. Processor behavior in VMX root operation is similar to behavior outside VMX operation. Processor behavior in VMX non-root operation is restricted and modified to facilitate virtualization. Instead of their ordinary operation, certain instructions (including the new VMCALL instruction) and events cause VM exits to the VMM. Because these VM exits replace ordinary behavior, the functionality of software in VMX non-root operation is limited.

Figure 1: VM Root and Non-Root Operation

 

VM exits in response to certain instructions and events (such as page fault) are a key source of performance degradation in a virtualized system. A VM exit marks the point at which a transition is made between the VM currently running and the VMM (hypervisor) that must exercise system control for a particular reason. In general, the processor must save a snapshot of the VM's state as it was running at the time of the exit. Refer to Figure 2 below.

Figure 2: VM exit and VM entry flow

 


VM exits – EPT violations [3]

EPT is hardware support for mapping guest physical addresses (GPA) to host physical addresses (HPA). Before EPT support was introduced, hypervisors had to manually maintain a shadow copy of the guest page table (GPT) mapping entries.
The page table entries in the actual guest page table would have lowered access permissions; for example, an entry whose actual permission was write would be lowered to read. A guest write to such a page results in a page fault, which is intercepted by the hypervisor. The hypervisor in turn updates the corresponding shadow page table entries. This entire process is very resource intensive. EPT was introduced so that GPA to HPA translation is done by the hardware itself, which is much faster.
The guest virtual address (GVA) is translated by the hardware in the usual way, by traversing the page tables of the guest OS, just as it would be for an OS running on native hardware. This translation yields the guest physical address (GPA), and this is where EPT comes into the picture: the hardware translates the GPA to an HPA, because HPAs are the addresses the real CPU knows about.
An EPT violation VM exit happens when the EPT has no existing mapping from a GPA to an HPA. This results in a VM exit to the VMM, which then creates a new mapping. An EPT violation is analogous to a page fault in a normal OS, the only difference being the type of mapping that is created. Refer to Figure 3.

Figure 3: EPT Violation: VM Exit and VM Entry
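The effect of mapping granularity on EPT violations can be illustrated with a toy model. The following Python sketch is not Crosvm or KVM code; the `ToyEpt` class, the `count_violations` helper, and the 64 MiB region size are hypothetical names and values chosen for illustration. It treats the EPT as a simple table that is filled lazily, counts one simulated EPT violation VM exit per missing mapping, and compares 4 KB and 2 MB mapping sizes.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024


class ToyEpt:
    """Toy GPA -> HPA map that counts simulated EPT violations (not real VMX code)."""

    def __init__(self, page_size):
        self.page_size = page_size
        self.mappings = {}          # guest frame number -> host frame number
        self.violations = 0         # simulated EPT-violation VM exits
        self._next_host_frame = 0   # trivial host frame allocator

    def translate(self, gpa):
        frame, offset = divmod(gpa, self.page_size)
        if frame not in self.mappings:
            # No GPA -> HPA mapping yet: a real VMM would take a VM exit,
            # install the mapping, and re-enter the guest.
            self.violations += 1
            self.mappings[frame] = self._next_host_frame
            self._next_host_frame += 1
        return self.mappings[frame] * self.page_size + offset


def count_violations(region_bytes, page_size, stride=PAGE_4K):
    """Touch a guest-physical region once per `stride` bytes and count exits."""
    ept = ToyEpt(page_size)
    for gpa in range(0, region_bytes, stride):
        ept.translate(gpa)
    return ept.violations


region = 64 * 1024 * 1024  # hypothetical 64 MiB guest region
small = count_violations(region, PAGE_4K)
huge = count_violations(region, PAGE_2M)
print(small, huge)
```

In this idealized model, the first touch of the region costs 16384 simulated exits with 4 KB mappings and 32 with 2 MB mappings. Real workloads will not see this ratio, since the achievable reduction depends on how much guest memory is actually backed by huge pages.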



iTLB overhead [4]

In x86 architectures, mappings between virtual and physical memory are facilitated by a page table, which is kept in memory. To minimize references to this table, recently used portions of the page table are cached in a hierarchy of translation look-aside buffers, or TLBs, which are consulted on every virtual address translation. As with data caches, the farther a request must go to be satisfied, the greater the performance impact. The iTLB overhead metric estimates the performance penalty of page walks induced by instruction translation look-aside buffer (iTLB) misses. When the page table is invalidated, that is, on a VM exit to the VMM, the VMM creates new mappings and iTLB overhead increases, as shown in Figure 4.

Figure 4: iTLB Overhead
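The relationship between page size and iTLB misses can be sketched with a small simulation. The Python model below is an illustration under stated assumptions, not a description of any real processor: the 64-entry fully associative LRU TLB, the 1 MiB code footprint, and the function names `tlb_reach` and `count_itlb_misses` are all hypothetical. It shows how the address range covered by the TLB (its reach) grows with page size and how that changes the miss count for a looping instruction stream.

```python
from collections import OrderedDict

KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Bytes of address space covered when every TLB entry is in use."""
    return entries * page_size


def count_itlb_misses(fetch_addresses, page_size, entries):
    """Count misses in a fully associative LRU TLB model."""
    tlb = OrderedDict()  # page number -> True, ordered oldest to newest
    misses = 0
    for addr in fetch_addresses:
        page = addr // page_size
        if page in tlb:
            tlb.move_to_end(page)        # refresh LRU position on a hit
        else:
            misses += 1                  # a miss would trigger a page walk
            tlb[page] = True
            if len(tlb) > entries:
                tlb.popitem(last=False)  # evict the least recently used page
    return misses


ENTRIES = 64         # hypothetical entry count, not taken from a real CPU
FOOTPRINT = 1 * MIB  # hypothetical hot-code footprint
# Ten passes over the footprint, one instruction fetch per 4 KiB.
fetches = [a for _ in range(10) for a in range(0, FOOTPRINT, 4 * KIB)]

misses_4k = count_itlb_misses(fetches, 4 * KIB, ENTRIES)
misses_2m = count_itlb_misses(fetches, 2 * MIB, ENTRIES)
print(tlb_reach(ENTRIES, 4 * KIB) // KIB, "KiB reach,", misses_4k, "misses")
print(tlb_reach(ENTRIES, 2 * MIB) // MIB, "MiB reach,", misses_2m, "misses")
```

With 4 KB pages the footprint exceeds the modeled reach, so the loop keeps evicting entries it is about to need. With 2 MB pages the whole footprint fits in a single entry. Real iTLBs are set associative and typically hold different numbers of entries per page size, so the absolute numbers here are only illustrative.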


Results

Our evaluation platform is an Asus Chromebook with an 11th Gen Intel® Core™ i7-1165G7 processor with 4 cores and 8 threads. The processor base frequency is 2.8 GHz and can reach up to 4.7 GHz in Turbo mode. The device has 16 GB of memory and runs ChromeOS version R93 with Android R. Before collecting data, we ran an internet speed test to confirm that the internet bandwidth was the same across test runs. The apps were sideloaded onto the system and the tests were then run. In our first-level analysis, we performed a microarchitecture analysis and observed significantly higher iTLB overhead in ARCVM than in ARC++.

Table 1: iTLB Overhead Native vs ARCVM (Android OS in VM)
SL No. | App Name | Native iTLB Overhead | ARCVM iTLB Overhead
1 | Wild Hunt: Sport Hunting Game (ARM) – compile time | 7.20% | 12.70%
2 | Wild Hunt: Sport Hunting Game (ARM) – processing time | 7.00% | 12.90%
3 | Riptide (ARM) – processing time | 9.10% | 19.80%
4 | Asphalt 8 (x86) – compile time | 7.10% | 13.40%
5 | Asphalt 8 (x86) – processing time | 12.20% | 15.10%
6 | Zoom (x86) – processing time | 8.10% | 16.50%


We verified that the system uses 4KB pages by default. We enabled THP in the kernel, which uses 2MB huge pages, and then made changes to Crosvm to utilize THP. With these changes, we obtained significant performance gains, as shown in Graph 1 below:

Graph 1: Performance gain with THP optimization 4K vs THP

 

Graph 2: Reduction of VM Exits due to EPT Violations data 4K vs THP

 

Graph 3: iTLB Miss Data 4K vs THP
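Whether THP can take effect depends on the kernel's THP mode, which the kernel documentation [1] describes as exposed through `/sys/kernel/mm/transparent_hugepage/enabled` with the values `always`, `madvise`, and `never`, the active one shown in brackets. The Python sketch below only reads and parses that setting. It does not reproduce the Crosvm changes, which this article does not detail, and the helper names `active_thp_mode` and `read_thp_mode` are hypothetical.

```python
from pathlib import Path

# Sysfs file documented in the kernel's transparent hugepage documentation.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) mode from e.g. 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_thp_mode(path=THP_ENABLED):
    """Return this system's active THP mode, or None if it cannot be read."""
    try:
        return active_thp_mode(path.read_text())
    except OSError:
        # File missing (non-Linux system or THP not built in) or unreadable.
        return None


print(read_thp_mode())
```

In `madvise` mode, the kernel considers huge pages only for memory regions that a process has marked with `madvise(MADV_HUGEPAGE)`. In `always` mode, it considers them system-wide.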

 

As we increase the page size from 4KB to 2MB, memory pressure may be created when we try to run multiple workloads on the host and guest side. So, we verified the concurrent behavior of workloads to check if there is any performance difference.
The concurrent workload consisted of 80 browser tabs (20 tabs each of YouTube (streaming), Amazon (ecommerce), Yahoo (web), and Flipkart (ecommerce)) + WebXPRT 3 (browser benchmark) + PCMark Work 2.0 (ARCVM: CPU and GPU) + Geekbench 5.3.1 (Crostini: CPU).

Graph 4: Concurrent Workloads Results

 

Summary

In this paper, we outlined the VM performance gap compared to native and the overheads due to VM exits. In addition, we reported the performance and microarchitecture impact of enabling THP. Our implementation improved performance by approximately 3% and reduced EPT violation VM exits for Android and Crostini workloads by approximately 70%. The implementation also reduced iTLB overhead for ARCVM (Android OS in VM) workloads using THP by an average of 12%.
There was no impact on the Chrome browser or native workloads, and no degradation in power was observed. These optimizations were accepted by Google and merged into ChromeOS. This technique can also be used in other VM environments on current and future generations of platforms to gain performance and reduce iTLB overhead.

About the authors

This article was written by Jaishankar Rajendran, Prashant Kodali and Biboshan Banerjee.
Jaishankar Rajendran and Prashant Kodali are members of the CCG CPE Chrome and Linux Architecture Department, and Biboshan Banerjee is a member of the Android Ecosystem Engineering Department.
Thanks to Shyjumon N, Sajal K Das, Erin Park, Mahendra K Reddy and Vaibhav Shankar for their support and guidance in reviewing this article.

Notices and disclaimers

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation.
© 2021 Intel Corporation. Intel, the Intel logo, Intel Core, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Test configuration

Software: Android 11.0, OpenGL ES 3.1 Mesa Support
Hardware: Asus Chromebook, Intel® Core™ i7-1165G7 processor, 4 cores, 8 threads, 16 GB RAM
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com.

References

[1]     Transparent huge page: https://www.kernel.org/doc/Documentation/vm/transhuge.txt
[2]     Life cycle of VMM software: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf
[3]     EPT violation: https://stackoverflow.com/questions/29651740/intel-ept-table-is-4-level-page-table#:~:text=Ept%20violation%20VMExit%20happens%20when,then%20create%20a%20new%20mapping