The Intel® LVMM architecture was designed with several goals in mind: maximize performance, keep complexity low,
preserve the user experience, and provide an isolated execution environment for management applications that are always
accessible and active.
High performance
The Intel LVMM architecture was designed to maximize performance. As a result, the VMM itself virtualizes only the
minimal set of devices required for allowing two distinct execution environments to execute concurrently, e.g.,
interrupt controllers and system timers. The LVMM allows the user partition direct access to most of the devices and
therefore does not intercept I/O accesses made to those devices. This minimizes the overhead incurred by the LVMM on the
user partition.
The network traffic of the user partition is handled by the services partition. As the architecture depicted in Figure 4
shows, network traffic flows through a physical NIC driver in the services partition and a bridge driver that
routes packets between the services partition network stack and the user partition network stack. In the user
partition, a virtual NIC driver is responsible for sending all outgoing packets from the user partition to the bridge
driver. The bridge driver forwards them to the physical NIC driver, which in turn sends them on the wire. Incoming
packets are forwarded by the physical NIC driver to the bridge driver. The bridge driver forwards the incoming packets
to the virtual NIC driver, which in turn passes them up the user partition network stack. This networking architecture
provides a higher level of virtualization abstraction. It performs better than a virtualization scheme that exposes a
NIC device model to the user partition, because in such a scheme every user partition access to the NIC device must be
intercepted and emulated.
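The packet path described above can be illustrated with a small model. This is a minimal sketch, not the LVMM
implementation: the class and method names (PhysicalNicDriver, BridgeDriver, VirtualNicDriver, from_user_partition,
from_wire) are hypothetical, and the routing to the services partition's own network stack is left out for brevity.

```python
class PhysicalNicDriver:
    """Hypothetical stand-in for the physical NIC driver in the services partition."""

    def __init__(self):
        self.wire = []  # packets that have been sent on the wire

    def transmit(self, packet):
        self.wire.append(packet)


class BridgeDriver:
    """Hypothetical bridge relaying packets between the virtual and physical NIC drivers."""

    def __init__(self, physical_nic):
        self.physical_nic = physical_nic
        self.virtual_nic = None

    def attach(self, virtual_nic):
        self.virtual_nic = virtual_nic

    def from_user_partition(self, packet):
        # Outgoing: virtual NIC driver -> bridge -> physical NIC driver -> wire.
        self.physical_nic.transmit(packet)

    def from_wire(self, packet):
        # Incoming: physical NIC driver -> bridge -> virtual NIC driver.
        self.virtual_nic.receive(packet)


class VirtualNicDriver:
    """Hypothetical stand-in for the virtual NIC driver in the user partition."""

    def __init__(self, bridge):
        self.bridge = bridge
        self.delivered = []  # packets passed up the user partition network stack
        bridge.attach(self)

    def send(self, packet):
        # Whole packets are handed over; no NIC register access is emulated.
        self.bridge.from_user_partition(packet)

    def receive(self, packet):
        self.delivered.append(packet)


nic = PhysicalNicDriver()
bridge = BridgeDriver(nic)
vnic = VirtualNicDriver(bridge)
vnic.send(b"outgoing")
bridge.from_wire(b"incoming")
```

The point of the sketch is that the user partition exchanges whole packets with the bridge, so there are no individual
device register accesses for the LVMM to intercept.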

Figure 4: Client VMM architecture
The services partition has been modified to be aware of virtualization (paravirtualized). For example, the services
partition performs interrupt handling using a generic interrupt controller that is greatly simplified compared to a
standard interrupt controller (e.g., the 8259 Programmable Interrupt Controller). The paravirtualization of the
services partition simplifies its interaction with the LVMM and avoids unnecessary context transitions between them.
Unmodified user experience
An important goal for the LVMM was to preserve the same user experience as that of a platform without the LVMM. The
following are design decisions that were made in order to achieve this goal.
The design of the networking architecture guarantees that it is transparent to the end user and that the NICs are
controlled by the services partition. All the functionality of the NICs controlled by the services partition is
exposed to the user partition. The virtual NIC driver in the user partition acts as a proxy for the physical NIC driver
executing in the services partition. The bridge driver in the services partition provides the relay for packet data and
control information between the virtual NIC driver and the physical NIC driver.
The ACPI policy decisions for configuration and power management operations of the platform are owned by the user
partition. Any sleep state transition requested by the user partition is honored by the LVMM and services partition. For
example, if the end user wants to transition the platform into "stand by" mode (sleep state S3) to preserve
battery life, then the LVMM will eventually forward the request to the underlying platform. The user experience is
preserved with respect to system power management, system thermal management, battery life, and sleep state usage
models.
Isolated execution environment
To keep the partitions isolated from each other, each partition's physical memory must be protected from tampering by
another partition. Memory accesses are performed both by the CPU (e.g., any move-to-memory instruction) and by devices
through Direct Memory Access (DMA) operations. DMA allows a device with appropriate hardware to
directly access system memory for data transfer without the intervention of the CPU.
The LVMM and the services partition have to be protected from memory accesses performed by user partition code. The LVMM
needs to retain control over the physical memory, and thus over the processor's address-translation mechanism. We employ
VT-x to prevent intentional or unintentional memory accesses from the user partition that may compromise the services
partition or the LVMM.
The LVMM maintains an alternative page-table hierarchy that effectively caches translations derived from the hierarchy
maintained by the OSs running in the user and services partitions. VT-x provides the necessary hooks for the LVMM to
keep the alternative page-table hierarchy consistent with the OSs' original page-table hierarchy. One such hook is the
trap on CR3 change. CR3 points to the base of the page-table hierarchy. Each time the OS switches to a different
page-table hierarchy (i.e., changes the CR3 value), the LVMM is notified and switches to an alternative page-table
hierarchy that matches the new OS page-table hierarchy. Since the LVMM controls the actual page tables, it can prevent
a situation in which one partition has access to another partition's or the LVMM's physical memory. The LVMM prevents
the existence of virtual-to-physical translations that map physical pages that do not belong to the partition.
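The CR3 trap and the filtering of translations can be sketched as a toy model. The ShadowPager class, its method names,
and the flat dictionary page tables are assumptions made for illustration; a real LVMM walks multi-level page tables and
fills the alternative hierarchy as translations are needed, whereas this sketch rebuilds it in full on every simulated
CR3 write.

```python
class ShadowPager:
    """Toy model of an LVMM keeping one alternative page table per guest CR3 value."""

    def __init__(self, owned_frames):
        self.owned_frames = set(owned_frames)  # physical frames belonging to this partition
        self.shadows = {}                      # guest CR3 value -> shadow table
        self.active = None                     # shadow table currently in use

    def on_cr3_write(self, guest_cr3, guest_table):
        """Simulated trap on a CR3 change; guest_table maps virtual page -> physical frame."""
        # Keep only translations that point at frames the partition owns.
        shadow = {
            vpage: frame
            for vpage, frame in guest_table.items()
            if frame in self.owned_frames
        }
        self.shadows[guest_cr3] = shadow
        self.active = shadow
        return shadow

    def translate(self, vpage):
        """Translate through the active shadow table, refusing unmapped pages."""
        if self.active is None or vpage not in self.active:
            raise PermissionError(f"no translation for virtual page {vpage:#x}")
        return self.active[vpage]


pager = ShadowPager(owned_frames={0x10, 0x11})
# The guest maps page 0x1 to frame 0x99, which it does not own.
pager.on_cr3_write(0x1000, {0x0: 0x10, 0x1: 0x99})
```

In this example, page 0x0 translates to frame 0x10, while the mapping to frame 0x99 never reaches the shadow table and
so cannot be used to reach another partition's memory.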
The LVMM and the services partition must also be protected from bus-mastering DMA devices mapped to the user
partition. These DMA-capable devices can access the entire system memory and can intentionally or unintentionally access
(read/write) memory pages hosting the LVMM and services partition code and data structures. Such accesses could
compromise IT secrets or render the platform useless by memory corruption. We employ Intel® VT for Directed I/O (VT-d) to
prevent such DMA-based attacks.
VT-d allows two views of the system memory: Guest Physical Address (GPA) and Host Physical Address (HPA). The LVMM keeps
the HPA view, which is the same as the system physical address space. The user and services partitions are provided
their respective GPA views. The LVMM maintains shadow page tables to translate GPA to HPA for accesses from the CPU.
Similarly, using VT-d DMA remapping engines and corresponding translation tables, the LVMM maintains the GPA-to-HPA
mapping for all DMA-capable I/O devices. Figure 5 illustrates this usage model.

Figure 5: VT-d usage model in the client VMM
The mapping is performed as follows:
- All services partition memory pages are added to one domain such that only DMA devices mapped to the services
  partition (NICs) can access these pages.
- All remaining pages (except LVMM and BIOS reserved) are added to the user partition domain, and all devices except
  those mapped to the services partition can access these pages (e.g., iGFX, PCI/PCIe add-on cards, etc.).
- The LVMM and BIOS reserved regions are protected from DMA accesses by virtue of being absent from the VT-d
  translation page tables.
The aforementioned device-to-domain mapping has the following benefits:
- I/O devices mapped to one domain can't access the memory of another domain. For example, PCI/PCIe add-on cards in
  the user partition can't access the LVMM or the services partition.
- Device drivers in the services and user partitions run without any changes to account for GPA-to-HPA mapping. This
  translation is transparently performed by VT-d hardware when the device issues an I/O request using GPA.
If a device misbehaves by trying to access an address outside of the mapped domain, the VT-d hardware generates a fault.
This fault is captured by the LVMM and reported to the services partition. An optional management application in the
services partition can process these faults by taking appropriate actions such as displaying an error message or
initiating a platform reboot, depending on the severity of the fault.
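The device-to-domain mapping and the fault path can be modeled in a few lines. This is a minimal sketch under stated
assumptions: the DmaRemapper class, the device names, and the page numbers are hypothetical, and page-granular
dictionaries stand in for the VT-d translation tables.

```python
class DmaRemapper:
    """Toy model of per-domain GPA-to-HPA translation for DMA requests."""

    def __init__(self):
        self.device_domain = {}  # device name -> domain name
        self.domain_pages = {}   # domain name -> {GPA page: HPA page}
        self.faults = []         # (device, GPA page) pairs reported to the LVMM

    def map_page(self, domain, gpa_page, hpa_page):
        self.domain_pages.setdefault(domain, {})[gpa_page] = hpa_page

    def assign(self, device, domain):
        self.device_domain[device] = domain

    def dma(self, device, gpa_page):
        """Translate a DMA request; record a fault if the page is not mapped for the device."""
        domain = self.device_domain.get(device)
        table = self.domain_pages.get(domain, {})
        if gpa_page not in table:
            self.faults.append((device, gpa_page))
            return None
        return table[gpa_page]


remapper = DmaRemapper()
remapper.map_page("services", 0x0, 0x100)  # services partition memory
remapper.map_page("user", 0x0, 0x200)      # user partition memory
# LVMM and BIOS reserved pages are simply never mapped into any domain.
remapper.assign("nic", "services")
remapper.assign("igfx", "user")

nic_target = remapper.dma("nic", 0x0)     # resolves inside the services domain
gfx_target = remapper.dma("igfx", 0x0)    # same GPA, different domain and HPA
bad_target = remapper.dma("igfx", 0x300)  # unmapped page: fault recorded
```

Both devices use the same GPA yet reach different host pages, which is why their drivers need no changes. The request
for an unmapped page returns nothing and leaves an entry in the fault list for a management application to act on.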
Always accessible and active
Management applications in the services partition are guaranteed connectivity with the external network, allowing the
platform to be managed even when the user partition has been isolated. The NICs are controlled by the services
partition, and any action the user partition attempts that could compromise the connectivity of the
management applications is blocked. For example, if an action on the user partition disables the NIC, the user partition
receives an indication that the NIC is disabled, although the real NIC remains enabled for use by the management applications.
This allows the services partition to be always accessible and reachable by a remote management console, so that
management actions can be initiated.
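The NIC-disable example can be sketched as a small state model. The NicState class and its method names are
hypothetical, and the choice to drop user partition traffic while its view of the NIC is disabled is an assumption; the
text only states that the user partition is told the NIC is disabled while the real NIC stays enabled.

```python
class NicState:
    """Toy model separating the user partition's view of the NIC from the real NIC."""

    def __init__(self):
        self.physical_enabled = True   # real NIC, controlled by the services partition
        self.user_view_enabled = True  # state reported to the user partition

    def user_disable(self):
        # Only the virtual view changes; the physical NIC is left untouched.
        self.user_view_enabled = False

    def user_can_send(self):
        # Assumption: user traffic is dropped while the user's view is disabled.
        return self.user_view_enabled and self.physical_enabled

    def management_can_send(self):
        return self.physical_enabled


state = NicState()
state.user_disable()
# The user partition sees a disabled NIC; management connectivity is unaffected.
```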
Moreover, VT-x allows the services partition to run in parallel to the user partition. This means that the services
partition is always active, and any diagnostics it runs can always be made available.