|
A question that may come to mind is "Why can't existing performance characterization methods be exploited?"
Users already have many accepted methods and tools to characterize servers. Some of these include load
generators (e.g., LoadRunner*) and a myriad of industry standard (e.g., SPEC*, TPC*) and proprietary
workloads (e.g., SAP-SD 2 Tier*, MMB3*, R6iNotes*). However, virtualization performance characterization
presents several challenges, including consolidation, virtualization, and implementation considerations,
that limit the use of these existing methods.
Consolidation Characterization Challenges
We need to differentiate between consolidation and virtualization challenges, as both introduce complexity
into performance measurement and tuning. Virtualization facilitates creating multiple VMs on one physical
machine. Consolidation relates to running multiple workloads on the system at the same time.
One challenge with consolidation characterization is the mixture of different workloads. If you consolidate a
set of heterogeneous workload environments, each will have a different set of requirements and metrics, and
the relative priority of each will vary across users, time, and other dimensions, depending upon the users'
specific requirements.
Another consolidation challenge relates to resource profiles. The non-steady-state resource profile of the
individual servers will look quite different from that of the consolidated system [4]. It is simplest to
measure performance when all measurements are conducted in a time window after all workloads have reached a
steady state. While this may be convenient for a benchmark, it fails to represent many real-world usage
models. Consider the following examples:
-
Most e-mail servers have distinct periods where the demands upon them vary a great deal. For example, the
system may be idle until a wave of people arrive at work and log in, download their e-mail, and make other
demands on the server. Conversely, the demands on the server would decrease as people finish up for the day.
-
A Web store that supports a worldwide customer base could be busy 24x7 and reach a steady state, unlike a
service that is provided to people in only one locale.
-
Some workloads have seasonal variations, end-of-month closings, holiday duty cycles, and other
modifications that may differ greatly from normal operations.
Consider two examples of a consolidated e-mail server, Web store server, and customer relationship
management (CRM) server. In the first scenario, Figure 1, we see that none of these ever runs in a steady
state. If these are the actual profiles of the consolidated server, it would be prudent to examine the peak
resource requirements that result when the profiles are superimposed at a given instant of time to determine
how well the overall system is performing.

Figure 1: System resource profile for workloads that are not operating in a steady state
In the second example, Figure 2, we see three server utilization profiles that all reach a steady state. If
we were to examine the performance of the consolidated system at some point after 15 hours of running, we
would see a much simpler profile of the workloads in a steady state. While the second scenario is easier to
test and tune, it may not reflect the actual end-user resource profile.

Figure 2: System profile for workloads that are operating in a steady state
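The superposition described for the two scenarios above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the figures: the hourly utilization values, the workload names, and the helper functions `consolidated_profile` and `peak` are all hypothetical, and the sketch assumes that utilization adds linearly when workloads share a platform, which ignores virtualization overhead and interference.

```python
# Hypothetical hourly CPU utilization per workload, in percent of one
# reference server. Illustrative values only; not read from Figure 1 or 2.
profiles = {
    "email":     [5, 5, 40, 60, 30, 20, 10, 5],
    "web_store": [20, 25, 30, 35, 50, 45, 30, 20],
    "crm":       [0, 10, 20, 30, 30, 40, 10, 0],
}


def consolidated_profile(profiles):
    """Superimpose the profiles by summing them sample by sample.

    Assumes linear addition of demand (no overhead, no interference).
    """
    lengths = {len(samples) for samples in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("all profiles must cover the same number of samples")
    return [sum(samples) for samples in zip(*profiles.values())]


def peak(profile):
    """Return (sample_index, value) of the highest demand in a profile."""
    index = max(range(len(profile)), key=profile.__getitem__)
    return index, profile[index]


total = consolidated_profile(profiles)
peak_hour, peak_value = peak(total)
sum_of_individual_peaks = sum(max(samples) for samples in profiles.values())

print("consolidated profile:", total)
print("consolidated peak:", peak_value, "at sample", peak_hour)
print("sum of individual peaks:", sum_of_individual_peaks)
```

With these made-up numbers the consolidated peak is lower than the sum of the individual peaks because the workloads do not peak at the same time. For profiles like those in Figure 2, the same calculation applied to a late measurement window would simply return the sum of the steady-state levels.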
Virtualization Characterization Challenges
Whatever performance tools, methods, and processes are used for the characterization, tuning, and simulation
of server-based workloads, they are likely to remain relevant in a virtualized environment. As much as we
would like to have a single benchmark (or a small set of benchmarks) to describe server performance, there
is nothing as good as the actual end-user workload (what users do today and how that will change over time)
to employ in developing a performance and projection discipline. This is also true for virtualization
performance, since no single workload will characterize all user requirements. Consider some different user
requirements, which may include the following:
-
A threshold minimum throughput must be maintained over time.
-
Some margin must be available for peak workload requirements or for future expansion.
-
The server provides some service and the response time to any specific request or set of requests cannot
exceed some specified quality-of-service threshold.
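As a rough sketch of how the three kinds of requirement listed above might be checked against measurements, consider the following Python fragment. The `Requirements` class, the `check` function, the field names, and all numeric values are assumptions introduced here for illustration; real thresholds come from the user's own service-level needs.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical per-workload requirements (names are illustrative)."""
    min_throughput: float     # operations/s to be sustained in every interval
    headroom_fraction: float  # share of capacity kept free, e.g. 0.2 for 20%
    max_response_ms: float    # quality-of-service limit on response time


def check(req, throughput, utilization, response_ms):
    """Return requirement name -> True if satisfied over all samples.

    throughput:  operations/s per measurement interval
    utilization: fraction of capacity used per interval (0.0 to 1.0)
    response_ms: observed response times in milliseconds
    """
    return {
        "throughput": min(throughput) >= req.min_throughput,
        "headroom": max(utilization) <= 1.0 - req.headroom_fraction,
        "response_time": max(response_ms) <= req.max_response_ms,
    }


req = Requirements(min_throughput=100.0, headroom_fraction=0.2,
                   max_response_ms=250.0)
result = check(
    req,
    throughput=[120.0, 110.0, 105.0],
    utilization=[0.55, 0.70, 0.85],
    response_ms=[80.0, 120.0, 240.0],
)
print(result)
```

In this made-up case the throughput and response-time requirements hold, but the headroom requirement fails because utilization reaches 85% while only 80% is allowed. A workload can therefore look healthy on one metric and still violate another, which is why no single number characterizes all user requirements.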
We can better understand some of the new challenges that virtualization introduces when we consider the
requirements for a system and how it will be used. For our example, server environments are associated with
the consolidation of existing (often legacy) systems and the virtual partitioning of an existing platform
for new server deployments. The diversity of what is being consolidated means that no one workload or
environment can be used as a general proxy for (most) others. Consider the diversity of essential
components (and how poorly one workload would serve as a proxy for another):
-
One OS as a proxy for all others (e.g., Windows* to represent Linux*)
-
One usage model or vertical to represent another (e.g., Linpack to represent MMB3)
Listed below are some of the other new challenges virtualization adds:
-
There are many different options for how a platform will be partitioned and how resources can be
allocated, each dramatically affecting the performance of the individual partitions and how they interact
with each other. These are further compounded by the goals being pursued: for example, absolute
performance, minimum performance thresholds, power consumption, TCO, or other optimization criteria.
-
There are different measures that can be used to evaluate a system, including response time,
throughput, percentage utilization, and others. Several of these may be in use simultaneously for separate
workloads running in different VMs within a single performance discipline.
Implementation Challenges
As different software stacks are combined inside a set of VMs, there are considerations that may affect
precision and repeatability of the results. Often these are tied to the specific implementation of the
virtualization abstraction layer and underlying platform. Though by no means an exhaustive list, such issues
could include the following:
-
VM clock accuracy/precision: Since several VMs run on a single platform, there are various approaches to
mapping the virtual clock to the physical platform's clock, and any of these can cause clock skew. Since
most benchmarks compute a performance metric based upon the system clock, which is always assumed to be
correct, any change in clock behavior could lead to errors in computing the delivered performance.
Such issues, as well as ways to minimize this possibility, are further explored in VMware [5].
-
While an extensive set of system performance monitors is available under most native operating systems
(OSs), most virtualization monitors provide only the most basic performance monitoring capabilities. This is
sure to improve over time, but the growing complexity of the environment, from both consolidation and
virtualization, and the nascent state of performance monitoring conspire to make the system harder to
understand and to tune productively.
-
All virtualization implementations introduce an additional level of abstraction and, not unexpectedly,
additional overhead. This makes appropriate system configuration even more important than it is for
unvirtualized environments, since resource limitations usually drive up context switching rates, perhaps
at multiple levels of abstraction. Being more generous with memory and I/O capacity when setting up the
initial system configuration in a virtualized environment can offer an even larger return in performance and
price/performance than in non-virtualized environments. As a simple example, a reduction in page fault
activity after adding some RAM in a virtualized environment is likely to pay an even larger dividend than in
the pre-virtualized environment.
-
Many unvirtualized server benchmarks will have a range of observed performance. When multiple workloads
are consolidated on a platform and hosted in VMs, this likely adds more variation, particularly if any of the
constituent workloads can impact each other or are tested before they are running in a steady state. Readers
are encouraged to run their experiments as many times as is necessary to understand the performance profile
and variation from run to run.
-
Obtaining consistent and predictable performance results assumes that scheduling across VMs is equitable
and consistent. It is possible in a virtualization benchmark that the scheduler is not providing what appears
to be an equitable distribution of compute and I/O resources across the VMs. For example, if you had N
identical copies of a particular workload with the same virtualization monitor configurations, you would
expect each to get 1/Nth of the resources available on the system. It is suggested that performance analysts
inspect the system during benchmarking to ensure that expected resource profiles are observed.
-
Some virtualization monitors will give you various options to map physical CPUs to virtual CPUs and to
create affinity between certain sets or to allow a more general pool of resources to be shared amongst all
VMs. Virtualization monitors may also permit assigning a weight or a CPU percentage to each workload. The
higher the workload's weight, the more it will be scheduled to use CPU resources. How to set these depends
upon user requirements. For example, is it desirable to ensure that some CPUs are dedicated to certain
workloads, or do you want the flexibility for the virtualization monitor to allocate CPUs based upon dynamic
workload changes in real time? Is one of the workloads more important than the others, and should it
therefore be assigned a bigger weight?
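The scheduling checks suggested in the list above (an expected 1/N share for N identical workloads, or a larger share for a workload with a higher weight) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the VM names, the CPU-second figures, and the tolerance are hypothetical, and the sketch assumes a proportional-share scheduler in which the expected share under full contention is the VM's weight divided by the sum of all weights. Actual virtualization monitors may define weights differently.

```python
def expected_shares(weights):
    """Expected CPU share per VM, assuming share = weight / sum(weights)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {vm: w / total for vm, w in weights.items()}


def share_deviation(observed_cpu_seconds, weights):
    """Observed share minus expected share, per VM."""
    total = sum(observed_cpu_seconds.values())
    if total <= 0:
        raise ValueError("no CPU time was observed")
    expected = expected_shares(weights)
    return {vm: observed_cpu_seconds[vm] / total - expected[vm]
            for vm in weights}


def unfair_vms(observed_cpu_seconds, weights, tolerance=0.05):
    """VMs whose share differs from expectation by more than `tolerance`."""
    deviation = share_deviation(observed_cpu_seconds, weights)
    return [vm for vm, d in deviation.items() if abs(d) > tolerance]


# Four identical workloads with equal weights: each should get about 1/4.
weights = {"vm0": 1, "vm1": 1, "vm2": 1, "vm3": 1}
observed = {"vm0": 350.0, "vm1": 250.0, "vm2": 250.0, "vm3": 150.0}
flagged = unfair_vms(observed, weights)
print("VMs outside tolerance:", flagged)

# Giving vm0 twice the weight of the others changes the expectation.
weighted = expected_shares({"vm0": 2, "vm1": 1, "vm2": 1})
print("expected shares with weights 2:1:1:", weighted)
```

The CPU seconds would come from whatever per-VM accounting the virtualization monitor exposes; the sketch deliberately leaves that collection step out because it is specific to each implementation.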
When consolidating multiple workloads on a single physical platform, a number of physical devices need to be
shared between VMs. Some platforms and virtualization monitors provide different options for mapping
physical devices to virtualized devices. Some physical devices can be assigned solely to a specific workload,
while others are shared between a set of VMs. How to set these options depends on customer requirements. For
example, a customer can decide to assign one NIC exclusively to a Web-bound workload and have all other, more
compute-bound workloads share another NIC.
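Whichever CPU and device mappings are chosen, the earlier advice to repeat experiments until the run-to-run variation is understood applies to each configuration. One simple way to summarize repeated runs is sketched below, using the mean, the sample standard deviation, and the coefficient of variation. The function name `summarize_runs` and the scores are hypothetical; the choice of these particular statistics is an assumption made here, not something the characterization methods above prescribe.

```python
import statistics


def summarize_runs(scores):
    """Summarize repeated benchmark scores from one configuration."""
    if len(scores) < 2:
        raise ValueError("at least two runs are needed to estimate variation")
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)  # sample standard deviation
    return {
        "runs": len(scores),
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation (relative spread)
    }


# Hypothetical scores from five runs of the same consolidated configuration.
summary = summarize_runs([100.0, 104.0, 96.0, 102.0, 98.0])
print(summary)
```

If two configurations differ by less than the spread observed within either one, more runs (or a longer measurement window) are probably needed before concluding that one mapping is better than the other.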
|