|
In this section, we discuss some of the details of Intel® VT architecture. We first describe the VT-x support
for IA-32 processor virtualization [6], and then we describe the VT-i support for Itanium® processor
virtualization [7].
VT-x Architecture overview
VT-x augments IA-32 with two new forms of CPU operation: VMX root operation and VMX non-root operation. VMX
root operation is intended for use by a VMM, and its behavior is very similar to that of IA-32 without VT-x.
VMX non-root operation provides an alternative IA-32 environment controlled by a VMM and designed to support
a VM. Both forms of operation support all four privilege levels, allowing guest software to run at its
intended privilege level, and providing a VMM with the flexibility to use multiple privilege levels.
VT-x defines two new transitions: a transition from VMX root operation to VMX non-root operation is called a
VM entry, and a transition from VMX non-root operation to VMX root operation is called a VM exit. VM entries
and VM exits are managed by a new data structure called the virtual-machine control structure (VMCS). The
VMCS includes a guest-state area and a host-state area, each of which contains fields corresponding to
different components of processor state. VM entries load processor state from the guest-state area. VM exits
save processor state to the guest-state area and then load processor state from the host-state area.
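The ordering of saves and loads described above can be illustrated with a small model. This is a minimal sketch, not real VT-x code: the `Vmcs` and `Cpu` classes and the dictionary-based processor state are assumptions introduced only to show which area each transition reads and writes.

```python
from dataclasses import dataclass, field


@dataclass
class Vmcs:
    """Toy VMCS holding only the two state areas described in the text."""
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)


@dataclass
class Cpu:
    state: dict = field(default_factory=dict)  # current processor state
    non_root: bool = False                     # True in VMX non-root operation

    def vm_entry(self, vmcs: Vmcs) -> None:
        # VM entry: load processor state from the guest-state area.
        self.state = dict(vmcs.guest_state)
        self.non_root = True

    def vm_exit(self, vmcs: Vmcs) -> None:
        # VM exit: save processor state to the guest-state area first,
        # then load processor state from the host-state area.
        vmcs.guest_state = dict(self.state)
        self.state = dict(vmcs.host_state)
        self.non_root = False


# Hypothetical register values, chosen only for illustration.
vmcs = Vmcs(guest_state={"cr3": 0x1000, "rip": 0x400000},
            host_state={"cr3": 0x9000, "rip": 0xFFFF0000})
cpu = Cpu(state=dict(vmcs.host_state))

cpu.vm_entry(vmcs)
cpu.state["rip"] += 2   # the guest makes some progress
cpu.vm_exit(vmcs)
```

After the exit, the guest-state area holds the guest's updated state and the CPU runs with the host state again, so a later VM entry resumes the guest where it stopped.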
Processor operation is changed substantially in VMX non-root operation. The most important change is that
many instructions and events cause VM exits. Some instructions (e.g., INVD) cause VM exits unconditionally
and thus can never be executed in VMX non-root operation. Other instructions (e.g., INVLPG) and all events
can be configured to do so conditionally using VM-execution control fields in the VMCS.
Guest-state area
The guest-state area of the VMCS contains elements of the state of the virtual CPU associated with that
VMCS.
For proper VMM operation, certain registers must be loaded by every VM exit. These include those IA-32
registers that manage operation of the processor, such as the segment registers (to map from logical to
linear addresses), CR3 (to map from linear to physical addresses), IDTR (for event delivery), and many
others. The guest-state area contains fields for these registers so that their values can be saved as part of
each VM exit.
In addition, the guest-state area contains fields corresponding to elements of processor state that are not
held in any software-accessible register. One of these elements is the processor's interruptibility state,
which indicates whether external interrupts are temporarily masked (e.g., due to execution of the MOV-SS
instruction) and whether non-maskable interrupts (NMIs) are masked because software is handling an earlier
NMI.
The guest-state area does not contain fields corresponding to registers that can be saved and loaded by the
VMM itself (e.g., the general-purpose registers). Exclusion of such registers improves the performance of VM
entries and VM exits. Software can manage these additional registers more efficiently as it knows better than
the CPU when they need to be saved and loaded.
VM-Execution control fields
The VMCS contains a number of fields that control VMX non-root operation by specifying the instructions and
events that cause VM exits. In this section, we present some of these controls.
The VMCS includes controls that support interrupt virtualization:
- External-interrupt exiting. When this control is set, all external interrupts cause VM exits; in
addition, the guest is not able to mask these interrupts (e.g., interrupts are not masked if EFLAGS.IF=0).
- Interrupt-window exiting. When this control is set, a VM exit occurs whenever guest software is ready
to receive interrupts (e.g., when EFLAGS.IF=1).
- Use TPR shadow. When this control is set, accesses to the APIC's TPR through control register CR8
(available only in 64-bit mode) are handled in a special way: executions of MOV CR8 access a TPR shadow
referenced by a pointer in the VMCS. The VMCS also includes a TPR threshold; a VM exit occurs after any
instruction that reduces the TPR shadow below the TPR threshold.
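The three interrupt-virtualization controls can be read as simple predicates. The sketch below is an illustrative model under stated assumptions: the class, the function names, and the integer TPR values are hypothetical, and only the case in which "use TPR shadow" is set is modeled.

```python
from dataclasses import dataclass


@dataclass
class InterruptControls:
    external_interrupt_exiting: bool = False
    interrupt_window_exiting: bool = False
    use_tpr_shadow: bool = False
    tpr_threshold: int = 0


def exit_on_external_interrupt(ctl: InterruptControls, eflags_if: int) -> bool:
    # With the control set, the interrupt causes a VM exit even if
    # EFLAGS.IF=0, so the guest's IF value does not enter the decision.
    return ctl.external_interrupt_exiting


def exit_on_interrupt_window(ctl: InterruptControls, eflags_if: int) -> bool:
    # A VM exit occurs once the guest is ready to receive interrupts.
    return ctl.interrupt_window_exiting and eflags_if == 1


def mov_to_cr8(ctl: InterruptControls, tpr_shadow: int, new_value: int):
    """Model a guest MOV to CR8; returns (new_tpr_shadow, vm_exit)."""
    if not ctl.use_tpr_shadow:
        raise NotImplementedError("only the TPR-shadow case is modeled")
    # Exit only when the write lowers the shadow below the threshold.
    vm_exit = new_value < tpr_shadow and new_value < ctl.tpr_threshold
    return new_value, vm_exit
```

One plausible use, offered as an assumption rather than something the text states: a VMM holding a virtual interrupt that the guest's TPR currently blocks could set the threshold so that it regains control only when the guest lowers its TPR far enough for delivery.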
There are also VM-execution control fields that support efficient virtualization of the IA-32 control
registers CR0 and CR4. These registers each comprise a set of bits controlling processor operation. A VMM may
wish to retain control of some of these bits (e.g., those that manage paging) but not others (e.g., those
that control floating-point instructions). The VMCS includes, for each of these registers, a guest/host mask
that a VMM can use to indicate which bits it wants to protect. Guest writes can freely modify the unmasked
bits, but an attempt to modify a masked bit causes a VM exit. The VMCS also includes, for each of these
registers, a read shadow whose value is returned to guest reads of the register.
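A minimal sketch of the guest/host mask and read shadow for CR0 follows. Two details are assumptions beyond the text: guest reads combine shadow bits (for masked positions) with live bits (for unmasked positions), and a write is treated as "modifying a masked bit" when it differs from the read shadow in a masked position. The function names are hypothetical.

```python
CR0_PE = 1 << 0    # protection enable
CR0_TS = 1 << 3    # task switched (affects floating-point instructions)
CR0_PG = 1 << 31   # paging


def guest_read(real: int, mask: int, shadow: int) -> int:
    # Masked bits come from the read shadow, unmasked bits from the
    # real register (assumption of this sketch).
    return (real & ~mask) | (shadow & mask)


def guest_write(real: int, mask: int, shadow: int, value: int):
    """Model a guest write; returns (new_real_value, vm_exit)."""
    if (value ^ shadow) & mask:
        # The guest tried to change a bit the VMM protects.
        return real, True
    # Only unmasked bits are updated; masked bits keep the VMM's values.
    return (real & mask) | (value & ~mask), False


# The VMM keeps paging on in hardware while the guest believes it is off.
mask = CR0_PG | CR0_PE
real = CR0_PG | CR0_PE | CR0_TS
shadow = 0

seen_by_guest = guest_read(real, mask, shadow)             # only TS visible
cleared_ts = guest_write(real, mask, shadow, 0)            # clears TS, no exit
enable_paging = guest_write(real, mask, shadow, CR0_PG)    # touches masked bit
```

The point of the design is visible in the last two calls: clearing TS proceeds without VMM involvement, while the attempt to enable paging hands control to the VMM.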
To support VMM flexibility, the VMCS includes bitmaps that let a VMM select which events cause certain VM
exits. The following items detail three of these:
- Exception bitmap: This field contains 32 entries for the IA-32 exceptions. It allows a VMM to specify
which exceptions should cause VM exits and which should not. For page faults, further selectivity is
supported based on a fault's error code.
- I/O bitmaps: These bitmaps contain one entry for each port in the 16-bit I/O space. An I/O
instruction (e.g., IN) causes a VM exit if it attempts to access a port whose entry is set in the I/O
bitmaps.
- MSR bitmaps: These bitmaps contain two entries (one for read, one for write) for each model-specific
register (MSR) currently in use. An execution of RDMSR (or WRMSR) causes a VM exit if it attempts to read (or
write) an MSR whose read bit (or write bit) is set in the MSR bitmaps.
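The bitmap lookups above amount to single-bit tests. The sketch below shows the exception and I/O cases only; the function names are hypothetical, the I/O bitmap is modeled as one contiguous 8 KB buffer (one bit per port), and the handling of an access that runs past port 0xFFFF is an assumption of this sketch.

```python
def exception_causes_exit(exception_bitmap: int, vector: int) -> bool:
    # One bit per IA-32 exception vector (0..31).
    if not 0 <= vector < 32:
        raise ValueError("exception vector out of range")
    return bool((exception_bitmap >> vector) & 1)


def io_causes_exit(io_bitmap: bytes, port: int, size: int = 1) -> bool:
    # A multi-byte access touches ports port .. port+size-1, so each of
    # those ports must be checked.
    for p in range(port, port + size):
        if p > 0xFFFF:
            return True  # assumption: wrapping accesses always exit
        if io_bitmap[p >> 3] & (1 << (p & 7)):
            return True
    return False


# Intercept page faults (vector 14) and port 0x3F8, nothing else.
PAGE_FAULT = 14
exception_bitmap = 1 << PAGE_FAULT

io_bitmap = bytearray(65536 // 8)
io_bitmap[0x3F8 >> 3] |= 1 << (0x3F8 & 7)
```

With this configuration, a 4-byte access starting at port 0x3F6 also exits, because it covers port 0x3F8.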
In addition to the controls mentioned above, there are VM-execution controls that support flexible VM exiting
for a number of privileged instructions.
VMCS Details
Like the IA-32 page tables, each VMCS is referenced with a physical (not linear) address. This eliminates the
need to locate the VMCS in the guest's linear-address space (which, as noted below, may be different from
that of the VMM). The format and layout of the VMCS in memory is not architecturally defined, allowing
implementation-specific optimizations to improve performance in VMX non-root operation and to reduce the
latency of VM entries and VM exits. VT-x defines a set of new instructions that allows software to access the
VMCS in an implementation-independent manner.
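The separation between an undefined in-memory layout and a defined access interface can be sketched as follows. The accessor names mirror the VT-x VMREAD and VMWRITE instructions, but everything else is hypothetical: the field identifiers are made-up values, and the private dictionary merely stands in for whatever layout an implementation chooses.

```python
class OpaqueVmcs:
    """Stand-in for a VMCS region whose in-memory layout is hidden."""

    def __init__(self, physical_address: int):
        # A VMCS is referenced by physical address, like the page tables.
        self.physical_address = physical_address
        self._storage = {}  # implementation-specific; callers never touch it


# Hypothetical field identifiers (not real VT-x encodings).
FIELD_GUEST_CR3 = 1
FIELD_HOST_CR3 = 2
KNOWN_FIELDS = {FIELD_GUEST_CR3, FIELD_HOST_CR3}


def vmwrite(vmcs: OpaqueVmcs, field_id: int, value: int) -> None:
    if field_id not in KNOWN_FIELDS:
        raise ValueError("unsupported VMCS field")
    vmcs._storage[field_id] = value


def vmread(vmcs: OpaqueVmcs, field_id: int) -> int:
    if field_id not in KNOWN_FIELDS:
        raise ValueError("unsupported VMCS field")
    return vmcs._storage.get(field_id, 0)


vmcs = OpaqueVmcs(physical_address=0x7F000)
vmwrite(vmcs, FIELD_GUEST_CR3, 0x1000)
guest_cr3 = vmread(vmcs, FIELD_GUEST_CR3)
```

Because callers name fields rather than offsets, the backing representation could change between implementations without affecting VMM code.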
Details of VM entries and VM exits
As noted earlier, VM entries load processor state from the guest-state area of the VMCS. (Note that, because
the state loaded includes CR3, the guest may run in a different linear-address space than the VMM.) In
addition to loading guest state, VM entry can be optionally configured for event injection. The CPU effects
this injection using the guest IDT to deliver an event (exception or interrupt) specified by the VMM, just as
if it had actually occurred immediately after VM entry. This feature removes the need for a VMM to emulate
delivery of these events.
As noted above, VM exits save processor state into the guest-state area and then load processor state from
the host-state area. (Again, because the state loaded includes CR3, the VMM may run in a different linear-address
space than the guest.) This implies that all VM exits use a common entry point in the VMM. To
simplify the design of a VMM, VT-x specifies that each VM exit save into the VMCS detailed information on the
cause of the VM exit. Every VM exit records an exit reason (specifying, for example, which instruction caused
the VM exit); many also record an exit qualification, which provides further details. For example, if a VM
exit is caused by the MOV CR instruction, the exit reason would indicate "control-register access" and the
exit qualification would identify the following: (1) the specific control register (e.g., CR0); (2) whether
the MOV was to or from the register; and (3) which other register was the source or destination of the
instruction.
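A VMM's common entry point typically dispatches on the exit reason and then decodes the exit qualification. The sketch below illustrates this for the MOV CR example. The numeric exit reason and the bit positions within the qualification are assumptions made for illustration; the text does not specify them, and the processor documentation is the authority.

```python
from typing import NamedTuple

# Hypothetical numeric value for the "control-register access" exit reason.
EXIT_REASON_CR_ACCESS = 28


class CrAccess(NamedTuple):
    cr: int        # which control register (e.g., 0 for CR0)
    to_cr: bool    # True for MOV to CR, False for MOV from CR
    gpr: int       # index of the general-purpose register operand


def decode_cr_access(qualification: int) -> CrAccess:
    # Assumed layout: bits 3:0 = control register, bits 5:4 = access type
    # (0 = MOV to CR, 1 = MOV from CR), bits 11:8 = general-purpose register.
    access_type = (qualification >> 4) & 0x3
    if access_type > 1:
        raise ValueError("access type not modeled in this sketch")
    return CrAccess(cr=qualification & 0xF,
                    to_cr=(access_type == 0),
                    gpr=(qualification >> 8) & 0xF)


def handle_vm_exit(exit_reason: int, qualification: int) -> str:
    """Common entry point: dispatch on the recorded exit reason."""
    if exit_reason == EXIT_REASON_CR_ACCESS:
        access = decode_cr_access(qualification)
        direction = "to" if access.to_cr else "from"
        return f"MOV {direction} CR{access.cr} using GPR {access.gpr}"
    return f"unhandled exit reason {exit_reason}"
```

Because the processor records this information, the VMM does not have to fetch and decode the guest instruction to learn what caused the exit.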
Each VM exit due to an IA-32 exception saves, in addition to information about the exception, information
about any event (e.g., an external interrupt) that was being delivered at the time the exception occurred.
This allows a VMM to virtualize nested exceptions properly.
VT-i Architecture overview
VT-i expands the Itanium architecture with extensions to the processor hardware and the Processor Abstraction
Layer (PAL) firmware.
VT-i adds a new PSR bit (PSR.vm) that allows guest OSs to be run at the privilege level for which they were
designed and creates interceptions to a VMM necessary for the creation of a complete VM. The VMM runs with
this bit equal to zero and runs guest software with this bit equal to one.
The PSR.vm bit modifies the behavior of all privileged instructions as well as that of some non-privileged
instructions that access state that a VMM may want to control (including the thash, ttag, and mov cpuid
instructions). When a guest OS executes one of these instructions, the processor raises a virtualization
intercept, which transfers control to the VMM with the PSR.vm bit set to zero.
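The effect of PSR.vm on instruction execution can be modeled as a single gate. This is a simplified sketch: the instruction sets below are small illustrative subsets, the string names are hypothetical labels rather than exact mnemonics, and privilege-level checks are deliberately left out.

```python
# Illustrative subsets only; the real lists are longer.
PRIVILEGED = {"rfi", "itc", "ptc", "mov_to_cr"}
SENSITIVE_NON_PRIVILEGED = {"thash", "ttag", "mov_from_cpuid"}
INTERCEPTED_WHEN_VM = PRIVILEGED | SENSITIVE_NON_PRIVILEGED


def step(instruction: str, psr_vm: int):
    """Execute one instruction; returns (outcome, new_psr_vm)."""
    if psr_vm == 1 and instruction in INTERCEPTED_WHEN_VM:
        # Control transfers to the VMM, which runs with PSR.vm = 0.
        return "virtualization intercept", 0
    return "executed", psr_vm


guest_thash = step("thash", psr_vm=1)   # guest: intercepted
vmm_thash = step("thash", psr_vm=0)     # VMM: runs normally
guest_add = step("add", psr_vm=1)       # unaffected instruction
```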
PSR.vm is orthogonal to the privilege level. This fact allows guest software to run at its designated
privilege level; if desired, a VMM can span multiple privilege levels.
PSR.vm also controls the number of virtual-address bits available to software. When a VMM is running (PSR.vm
= 0), all implemented virtual-address bits are available. When a guest is running (PSR.vm = 1), the uppermost
implemented virtual-address bit is not available, and use of this bit raises an unimplemented data/instruction
address fault or an unimplemented instruction address trap. This provides a VMM a dedicated
address space that guest software cannot access.
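The address-space reservation can be sketched as a check on the upper address bits. The model below rests on several assumptions not stated in the text: a 61-bit offset within each region, a hypothetical most-significant implemented bit of 50, and the rule that all offset bits at or above the most-significant usable bit must be equal. Under PSR.vm = 1 the usable range is treated as one bit narrower.

```python
IMPL_VA_MSB = 50    # hypothetical; the real value is implementation-specific
OFFSET_BITS = 61    # assumed width of the per-region address offset


def unimplemented_address(va: int, psr_vm: int) -> bool:
    """Return True if the access would raise an unimplemented-address fault."""
    # A guest (PSR.vm = 1) loses the uppermost implemented bit.
    msb = IMPL_VA_MSB - 1 if psr_vm == 1 else IMPL_VA_MSB
    offset = va & ((1 << OFFSET_BITS) - 1)
    upper = offset >> msb                         # bits 60 .. msb
    all_ones = (1 << (OFFSET_BITS - msb)) - 1
    # Legal addresses have these bits either all zero or all one.
    return upper not in (0, all_ones)


# Bits 60..50 set, bit 49 clear: legal for the VMM, illegal for a guest.
VMM_ONLY_ADDRESS = (1 << 61) - (1 << 50)
LOW_ADDRESS = 0x1000
```

In this model, any address whose bit 49 differs from bit 50 is reachable only with PSR.vm = 0, which is one way to picture the dedicated VMM address space.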
VT-i also includes a number of additions to the PAL firmware layer. These additions provide a consistent
programming interface to a VMM even if the hardware is not implemented identically across processor
generations. These PAL extensions include a set of new procedures; the addition of PAL services for high-frequency
VMM operations; and a virtual processor descriptor (VPD) table.
The PAL procedures are used for setting up and tearing down a VM environment; for setting global VMM
configuration options; for initializing and terminating virtual processors; and for saving and restoring a
subset of state of a virtual processor. These procedures follow the same calling convention as existing PAL
procedures. In addition, a new PAL interface called a PAL service has been introduced for virtualization. PAL
services reduce overhead through use of a new calling convention specifically targeted for use by a VMM. PAL
services provide functionality to synchronize guest hardware registers and the VPD; to save and restore a
subset of the state of a virtual processor; to resume execution of the guest software after a virtualization
intercept; to calculate guest VHPT hashes and tags; and to set up pending interrupts for the guest.
The VPD table is located in memory selected by the VMM. It is usually located in the VMM's virtual-address
space and is accessed by both the PAL firmware and the VMM. The VPD contains configuration settings for the
virtual processor and a subset of the virtual processor's state that influences its execution
characteristics. For example, the virtual processor's control-register values are located in the VPD but not
its general registers. The VPD is architected to be 64 KB in size and includes reserved space
for future usage.
The VPD contains two configuration fields that allow the VMM to customize the virtualization environment:
- Virtualization-acceleration field. This field allows the VMM to customize the virtualization of a
particular resource or instruction, leading to a reduction in the number of virtualization intercepts that
the VMM has to handle. It provides accelerations for external-interrupt handling as well as intercept
control for reads and writes to interruption control registers (cr16-cr25), reads of the PSR, reads of CPUID,
the cover instruction, and the bank-switch instruction (bsw).
For example, a VMM could enable the bank-switch optimization. Guest execution of bsw would use values that
the VMM had set up in the VPD for the guest OS and would never cause a virtualization intercept to the VMM.
- Virtualization-disable field. This field allows the VMM to disable virtualization of a particular
resource or instruction, leading to a reduction in the number of virtualization intercepts the VMM handles.
This field provides disables for virtualization of the external interrupt control registers (cr65-cr71), the
performance monitoring registers, the debug registers, the PSR.i bit, and the interval timer match register.
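The two VPD configuration fields can be pictured as flag words that suppress intercepts for particular operations. The sketch below is illustrative only: the flag names, bit values, and operation labels are hypothetical and do not reflect the architected encoding.

```python
from enum import IntFlag


class Accel(IntFlag):       # virtualization-acceleration field (hypothetical bits)
    EXT_INT = 1
    INT_CR = 2
    PSR_READ = 4
    CPUID_READ = 8
    COVER = 16
    BSW = 32


class Disable(IntFlag):     # virtualization-disable field (hypothetical bits)
    EXT_INT_CR = 1
    PMC = 2
    DEBUG = 4
    PSR_I = 8
    ITM = 16


# Which flag, if set, removes the intercept for a given guest operation.
SUPPRESSED_BY = {
    "bsw": Accel.BSW,
    "cover": Accel.COVER,
    "mov_from_psr": Accel.PSR_READ,
    "mov_from_cpuid": Accel.CPUID_READ,
    "mov_to_itm": Disable.ITM,
    "mov_to_dbr": Disable.DEBUG,
}


def causes_intercept(op: str, accel: Accel, disable: Disable) -> bool:
    flag = SUPPRESSED_BY.get(op)
    if flag is None:
        return True  # no optimization modeled for this operation
    configured = accel if isinstance(flag, Accel) else disable
    return not (configured & flag)
```

For instance, with `Accel.BSW` set, a guest `bsw` no longer reaches the VMM, matching the bank-switch example in the text.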
To provide efficient handling of virtualization intercepts for a VMM, the architecture has added two new
vectors into the IVT:
- Virtualization vector. This vector is used for all virtualization-related intercepts. To reduce
decoding complexity, a VMM can configure the processor to provide the cause of the virtualization intercept
(a bitmap field of intercepting instructions) as well as the faulting opcode in two of the processor's banked
registers. Through a PAL interface, a VMM can also relocate this handler to a memory location outside the
IVT.
- Virtual external interrupt vector. The processor uses this vector when the guest unmasks a pending
external interrupt. It would be used when the VMM has a virtual interrupt for the guest that it cannot
deliver due to guest masking. When the guest performs an operation to unmask the highest pending interrupt,
the guest state is updated and control is transferred to this new vector. This streamlines delivery of guest
external interrupts for the VMM.
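The role of the virtual external interrupt vector can be sketched with a toy virtual processor. This is a simplified model under stated assumptions: masking is reduced to a single PSR.i-style enable bit, and the class, method, and return strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualCpu:
    psr_i: int = 0                         # guest interrupt-enable bit
    pending_vector: Optional[int] = None   # virtual interrupt queued by the VMM

    def guest_writes_psr_i(self, value: int) -> str:
        # The guest state is updated first, without VMM involvement.
        self.psr_i = value
        if value == 1 and self.pending_vector is not None:
            # The guest just unmasked a pending virtual interrupt, so
            # control transfers to the VMM through the new vector.
            return "virtual external interrupt vector"
        return "continue guest"


vcpu = VirtualCpu(psr_i=0, pending_vector=0x30)
while_masked = vcpu.guest_writes_psr_i(0)   # still masked: guest keeps running
on_unmask = vcpu.guest_writes_psr_i(1)      # unmask: VMM gets control
```

The VMM is involved only at the moment delivery becomes possible, rather than on every guest change to its interrupt mask.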
VT-i also provides global configuration options that a VMM can set and that apply to all virtual processors
activated by the VMM. These global configuration options determine whether the cause of a virtualization
intercept is provided, whether the opcode of the instruction causing the virtualization intercept is provided,
whether the performance counters are frozen for all virtualization intercepts, and the byte order (or
endianness) of the data located in the VPD.
VT-i also includes the vmsw instruction. This instruction transitions the PSR.vm bit with minimum overhead.
This can reduce transition overhead between guest software and a VMM in cooperative virtualization
environments.
|