Intel® Technology Journal
Intel® Virtualization Technology
Volume 10, Issue 03, Published August 10, 2006
ISSN 1535-864X, DOI: 10.1535/itj.1003.07

  Section 4 of 11  
Redefining Server Performance Characterization for Virtualization Benchmarking
VIRTUALIZATION PERFORMANCE CHARACTERIZATION CHALLENGES

A question that may come to mind is "Why can't existing performance characterization methods be exploited?" Users already have many accepted methods and tools to characterize servers. These include load generators (e.g., LoadRunner*) and a myriad of industry-standard (e.g., SPEC*, TPC*) and proprietary workloads (e.g., SAP-SD 2 Tier*, MMB3*, R6iNotes*). Virtualization performance characterization, however, presents several challenges, which fall into consolidation, virtualization, and implementation considerations, and these limit the use of existing methods.

Consolidation Characterization Challenges

We need to differentiate between consolidation and virtualization challenges, as both introduce complexity into performance measurement and tuning. Virtualization facilitates creating multiple VMs on one physical machine. Consolidation relates to running multiple workloads on the system at the same time.

A challenge with consolidation characterization is the mixture of different workloads. If you consolidate a set of heterogeneous workload environments, each will have a different set of requirements and metrics, and the relative priority of each will vary across users, over time, and along other dimensions, depending upon the users' specific requirements.

Another consolidation challenge relates to resource profiles. The non-steady-state resource profile of the individual servers will look quite different from that of the consolidated system [4]. It is simplest to measure performance when all measurements are conducted in a time window after all workloads have reached a steady state. While this may be convenient for a benchmark, it fails to represent many real-world usage models. Consider the following examples:

  • Most e-mail servers have distinct periods where the demands upon them vary a great deal. For example, the system may be idle until a wave of people arrive at work and log in, download their e-mail, and make other demands on the server. Conversely, the demands on the server would decrease as people finish up for the day.
  • A Web store that supports a worldwide customer base could be busy 24x7 and reach a steady state, as opposed to a service that is provided to people in one locale.
  • Some workloads have seasonal variations, end-of-month closings, holiday duty cycles, and other modifications that may differ greatly from normal operations.

Consider two examples of a consolidated server that hosts an e-mail server, a Web store server, and a customer relationship management (CRM) server. In the first scenario, Figure 1, we see that none of these is ever run in a steady state. If these are the actual profiles of the consolidated server, it would be prudent to examine the peak resource requirements when the profiles are superimposed at one instant of time to determine how well the overall system is performing.



Figure 1: System resource profile for workloads that are not operating in a steady state
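
As a minimal sketch of superimposing the profiles, the Python below adds hourly utilization samples for three workloads and reports the hour at which the combined demand peaks. The sample values are made up for illustration and are not read from Figure 1; the names `profiles` and `combined_peak` are hypothetical.

```python
# Hypothetical hourly CPU utilization (% of the consolidated server) per workload.
# Values are illustrative only; they are not taken from Figure 1.
profiles = {
    "email":     [5, 10, 40, 35, 20, 10],
    "web_store": [15, 20, 25, 30, 35, 20],
    "crm":       [10, 30, 20, 25, 15, 5],
}


def combined_peak(profiles):
    """Sum the per-workload samples hour by hour; return (hour, total) at the peak."""
    lengths = {len(samples) for samples in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("all profiles must cover the same number of samples")
    totals = [sum(hour) for hour in zip(*profiles.values())]
    peak_hour = max(range(len(totals)), key=totals.__getitem__)
    return peak_hour, totals[peak_hour]


peak_hour, peak_total = combined_peak(profiles)
# Adding up each workload's own peak overstates demand when the peaks do not coincide.
sum_of_peaks = sum(max(samples) for samples in profiles.values())
print(f"combined peak: {peak_total}% at hour {peak_hour}; "
      f"sum of individual peaks: {sum_of_peaks}%")
```

With these sample values the superimposed peak is lower than the sum of the individual peaks, which is why the profiles need to be examined together rather than one server at a time.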

In the second example, Figure 2, we see three server utilization profiles that all reach a steady state. If we were to examine the performance of the consolidated system and did our study at some point after 15 hours of running, we would see a much simpler profile of the workloads in a steady state. While the second scenario is easier to test and tune, it may not reflect the actual end-user resource profile.



Figure 2: System profile for workloads that are operating in a steady state

Virtualization Characterization Challenges

Whatever performance tools, methods, and processes are used for the characterization, tuning, and simulation of server-based workloads, they are likely to remain relevant in a virtualized environment. As much as we would like to have a single benchmark (or a small set of benchmarks) to describe server performance, there is nothing as good as the actual end-user workload (what users do today and how that will change over time) to employ in developing a performance and projection discipline. This will also be true for virtualization performance, since no single workload will characterize all user requirements. Consider some different user requirements, which may include the following:

  • A threshold minimum throughput must be maintained over time.
  • Some margin must be available for peak workload requirements or for future expansion.
  • The server provides some service and the response time to any specific request or set of requests cannot exceed some specified quality-of-service threshold.
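
The three kinds of requirement listed above can be sketched as a simple pass/fail check over measured data. This is only an illustration: the `Requirements` fields, the `check` function, the use of the 95th percentile for response time, and all sample values are assumptions, not part of any benchmark described here.

```python
import math
from dataclasses import dataclass


@dataclass
class Requirements:
    min_throughput: float     # requests/s that every interval must sustain
    min_headroom: float       # fraction of capacity to keep free (0.2 = 20%)
    max_response_time: float  # seconds; QoS ceiling at the chosen percentile


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def check(req, throughput, utilization, response_times, pct=95):
    """Return one pass/fail flag per requirement."""
    return {
        "throughput": min(throughput) >= req.min_throughput,
        "headroom": 1.0 - max(utilization) >= req.min_headroom,
        "response_time": percentile(response_times, pct) <= req.max_response_time,
    }


# Illustrative measurements for one VM (made-up numbers).
req = Requirements(min_throughput=500, min_headroom=0.2, max_response_time=0.5)
result = check(
    req,
    throughput=[520, 505, 498, 530],            # requests/s per interval
    utilization=[0.55, 0.70, 0.78, 0.60],       # fraction of capacity per interval
    response_times=[0.12, 0.20, 0.15, 0.40, 0.18],  # seconds per request
)
print(result)
```

In this sample the headroom and response-time requirements pass, but one interval drops below the throughput floor, so the same measurements can satisfy one user's requirements and fail another's.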

We can better understand some of the new challenges that virtualization introduces when we consider what is required of a system and how it will be used. For our example, server environments are associated with the consolidation of existing (often legacy) systems and the virtual partitioning of an existing platform for new server deployments. The diversity of what is being consolidated means that no one workload or environment can be used as a general proxy for (most) others. Consider the diversity of essential components (and how poorly one workload would serve as a proxy for another):

  • One OS as a proxy for all others (e.g., Windows* to represent Linux*)
  • One usage model or vertical to represent another (Linpack to represent MMB3)

Listed below are some of the other new challenges virtualization adds:

  • There are many different options for how a platform will be partitioned and how resources can be allocated, each dramatically affecting the performance of the individual partitions and how they interact with each other. These are further compounded depending upon what the goals are: for example, absolute performance, minimum performance thresholds, power consumption, TCO, or other optimization criteria.
  • There are different strategies that can be used to evaluate a system, including response time, throughput, percentage utilization, and others. These may be exploited simultaneously across separate workloads running across different VMs in a single performance discipline.

Implementation Challenges

As different software stacks are combined inside a set of VMs, there are considerations that may affect precision and repeatability of the results. Often these are tied to the specific implementation of the virtualization abstraction layer and underlying platform. Though by no means an exhaustive list, such issues could include the following:

  • VM clock accuracy/precision: Since there are several VMs running on a single platform, there is a variety of approaches to how the virtual clock is mapped to the physical platform's clock, and any of these can cause clock skew. Since most benchmarks compute a performance metric based upon the system clock, which is always assumed to be correct, any change in clock behavior could lead to errors in computing the delivered performance. Such issues, as well as ways to minimize this possibility, are further explored in VMware [5].
  • While an extensive set of system performance monitors is available under most native operating systems (OSs), most virtualization monitors provide only the most basic performance monitoring capabilities. This is sure to improve over time, but the growing complexity of the environment, from both consolidation and virtualization, and the nascent state of performance monitoring combine to make the system harder to understand and to tune productively.
  • All virtualization implementations introduce an additional level of abstraction and, not unexpectedly, additional overhead. This makes appropriate system configuration even more important than it is for unvirtualized environments, since resource limitations usually drive up the context switching rates, perhaps at multiple levels of abstraction. Being more generous with memory and I/O capacity when setting up the initial system configuration in a virtualized environment can offer an even larger return in performance and price/performance than in non-virtualized environments. As a simple example, a reduction in page fault activity after adding some RAM in a virtualized environment is likely to pay an even larger dividend than in the pre-virtualized environment.
  • Many unvirtualized server benchmarks will have a range of observed performance. When multiple workloads are consolidated on a platform and hosted in VMs, this likely adds more variation, particularly if any of the constituent workloads can impact each other or are tested before they are running in a steady state. Readers are encouraged to run their experiments as many times as is necessary to understand the performance profile and variation from run to run.
  • Obtaining consistent and predictable performance results assumes that scheduling across VMs is equitable and consistent. It is possible in a virtualization benchmark that the scheduler is not providing what appears to be an equitable distribution of compute and I/O resources across the VMs. For example, if you had N identical copies of a particular workload with the same virtualization monitor configurations, you would expect each to get 1/Nth of the resources available on the system. It is suggested that performance analysts inspect the system during benchmarking to ensure that expected resource profiles are observed.
  • Some virtualization monitors give you various options to map physical CPUs to virtual CPUs and to create affinity between certain sets or to allow a more general pool of resources to be shared amongst all VMs. Virtualization monitors may also permit the setting of a weight or CPU percentage for each workload. The higher the workload's weight, the more it will be scheduled to use CPU resources. How to set these depends upon user requirements. For example, is it desirable to ensure that some CPUs are dedicated to certain workloads, or do you want the flexibility for the VM to allocate CPUs based upon dynamic workload changes in real-time? Is one of the workloads more important than others and therefore should a bigger weight be assigned to it?
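
Two of the checks suggested in the list above, inspecting whether each VM receives its expected share of resources and quantifying run-to-run variation, can be sketched in a few lines of Python. The sketch assumes a simple proportional-share model in which a VM's expected CPU share is its weight divided by the sum of all weights (so N equally weighted VMs each expect 1/N); real virtualization monitors may interpret weights differently. The function names, the 5% tolerance, and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev


def expected_shares(weights):
    """Proportional share per VM; equal weights give each of N VMs 1/N."""
    total = sum(weights.values())
    return {vm: w / total for vm, w in weights.items()}


def unfair_vms(cpu_seconds, weights, tolerance=0.05):
    """Return {vm: deviation} for VMs whose observed CPU share differs from
    the expected share by more than `tolerance` (absolute fraction)."""
    total = sum(cpu_seconds.values())
    expected = expected_shares(weights)
    return {
        vm: round(cpu_seconds[vm] / total - expected[vm], 3)
        for vm in weights
        if abs(cpu_seconds[vm] / total - expected[vm]) > tolerance
    }


def coefficient_of_variation(results):
    """Sample standard deviation divided by the mean across repeated runs."""
    return stdev(results) / mean(results)


# Four identical workloads with equal weights (made-up measurements).
weights = {"vm1": 1, "vm2": 1, "vm3": 1, "vm4": 1}
cpu_seconds = {"vm1": 250, "vm2": 245, "vm3": 255, "vm4": 150}
flagged = unfair_vms(cpu_seconds, weights)
print("VMs outside tolerance:", flagged)

# Throughput scores from five repeated runs of one benchmark (made-up).
runs = [1000, 1020, 980, 1010, 990]
cov = coefficient_of_variation(runs)
print(f"run-to-run coefficient of variation: {cov:.3f}")
```

Here `vm4` receives noticeably less than the quarter of CPU time it would be expected to get, which is the kind of discrepancy worth investigating before trusting a benchmark result.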

When consolidating multiple workloads on a single physical platform, a number of physical devices need to be shared between VMs. Some platforms and virtualization monitors provide different options for mapping the physical devices to virtualized devices. A physical device can be assigned solely to a specific workload or shared between a set of VMs. How to set these options depends on customer requirements. For example, customers can decide to assign a NIC exclusively to a Web-bound workload, while all other, more compute-bound workloads share another NIC.

