Here is a brief history leading up to the discussion and decision to implement server virtualization for a manufacturing
support group at Intel. This organization's server population grew 65% over the last three years with 2006 projections
meeting or exceeding this trend. As this organization grew and acquired servers, many of these acquisitions were
waterfalled servers released by other Intel business units. The low initial cost made this type of acquisition
financially attractive, but four years on, most of these servers have reached their end of life. Another
factor is that the primary datacenter for this group is projected to reach complete build-out in 12-18 months, with no plans
to expand. The challenge for the organization was to continue supporting the server growth and replace aging
hardware with limited datacenter space while maintaining the same high level of customer support.
This group partnered with a forward-thinking IT group to evaluate, plan, and implement a virtual server environment. In
this case study, we walk you through the steps, lessons, hurdles, and successes of this effort. The covered topics
include software evaluation, candidate evaluation, hardware design, host hardware setup, virtual server setup, server
testing procedures, and initial results.
There are multiple factors to consider when evaluating and selecting server virtualization software. Our team carefully
reviewed leading technology products and evaluated different system design options. The two most popular virtualization
architectures were host-based virtualization (Microsoft Virtual Server 2005*; VMware GSX 3.1*; Microsoft Virtual PC
2004*; VMware Workstation 5.0*) and full virtualization (VMware ESX 2.5*).
Host-based virtualization requires installing a base OS first and then a virtual machine monitor (VMM) that is responsible for the
execution of all VMs. In addition to the VMM application, the OS can execute other applications (e.g., antivirus,
backup). The downsides of this type of architecture are a heavy performance penalty, high system resource utilization
by host system management, additional work to support host maintenance and management, and the upkeep of host security.
The full virtualization design starts off with the installation of a mini kernel (a hypervisor optimized for
virtualization) on the physical server. This kernel uses minimal system resources since it focuses only on tasks
required for virtualization, and it does not run unnecessary processes or applications. The hypervisor provides full
hardware virtualization and distributes the necessary system resources to all VMs. Each VM contains its own OS and
cannot tell that it is running on virtualized hardware. This architecture is ideal for consolidating high-end
datacenter solutions.
The decision process to determine the proper virtualization architecture is a critical and time-consuming task. Our team
researched benchmarking results of multiple virtualization products and analyzed the cost and supportability options. We
prioritized our list of requirements and rated the various software options. We evaluated four products against our
requirements and scored their performance. Our requirements included performance, manageability, supportability,
stability, security, and a wide range of capabilities. Table 1 shows an example of how we did our comparison (using
fictitious data).

Table 1: Product evaluation scorecard example
After evaluating the scores, we selected a full virtualization software solution for our virtual server environment.
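A scorecard comparison of this kind can be sketched in a few lines of Python. This is only an illustration, not the team's actual method: the priority weights, the 1-5 rating scale, the product names, and every number below are hypothetical, and the use of a weighted average is an assumption about how prioritized requirements might be combined into one score.

```python
from typing import Dict, List, Tuple

# Hypothetical priorities (higher = more important) for the requirements named in the text.
WEIGHTS: Dict[str, float] = {
    "performance": 5, "manageability": 4, "supportability": 4,
    "stability": 5, "security": 4, "capabilities": 3,
}

# Fictitious ratings on an assumed 1-5 scale; product names are placeholders.
PRODUCTS: Dict[str, Dict[str, float]] = {
    "Product A": {"performance": 3, "manageability": 4, "supportability": 4,
                  "stability": 3, "security": 3, "capabilities": 3},
    "Product B": {"performance": 3, "manageability": 3, "supportability": 3,
                  "stability": 4, "security": 3, "capabilities": 4},
    "Product C": {"performance": 2, "manageability": 3, "supportability": 3,
                  "stability": 3, "security": 2, "capabilities": 2},
    "Product D": {"performance": 5, "manageability": 4, "supportability": 4,
                  "stability": 5, "security": 4, "capabilities": 4},
}


def weighted_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of the per-requirement ratings."""
    total_weight = sum(weights.values())
    return sum(weights[req] * ratings[req] for req in weights) / total_weight


def rank_products(products: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (product, score) pairs sorted from highest to lowest score."""
    scores = {name: weighted_score(r, weights) for name, r in products.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_products(PRODUCTS, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

Keeping the weights separate from the ratings makes it easy to see how sensitive the ranking is to a change in priorities.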
When virtualizing a datacenter, the project's success is directly dependent on choosing the appropriate candidates. We
approached this step by defining our virtualization strategy for this business unit. First, we divided their server
environment into four categories:
- Ideal candidates
- Candidates
- Potential candidates
- Not a candidate
To categorize each server, we started by collecting data on performance, system utilization, end-of-service timelines,
business area, and application specifics. We then mapped each server against these selection criteria to determine
which virtualization category it belonged to. Once the servers were categorized, our team
focused on 75 candidates and worked with the business unit to evaluate application specifics and analyze machine load.
From our performance evaluations and customer input, we assembled the server requirements:
- CPU consumption
- Required memory
- Disk I/O intensity
- Network requirements
- OS configuration
We used these data when evaluating different hardware platforms for our virtualization environment.
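The mapping of servers onto the four categories could be sketched as a simple rule-based function. The text does not state the actual selection rules, so the field names and every threshold below are hypothetical placeholders chosen only to show the shape of such a classifier.

```python
from dataclasses import dataclass


@dataclass
class ServerProfile:
    """Collected data for one physical server (all fields are illustrative)."""
    name: str
    avg_cpu_pct: float             # average CPU utilization, percent
    required_memory_gb: float      # memory the workload needs
    disk_io_heavy: bool            # True if the workload is disk I/O intensive
    months_to_end_of_service: int  # time left before the hardware leaves support


def categorize(server: ServerProfile) -> str:
    """Map a server to one of the four categories using assumed thresholds."""
    # Hypothetical rule: heavy disk I/O combined with high CPU load rules a server out.
    if server.disk_io_heavy and server.avg_cpu_pct > 60:
        return "Not a candidate"
    # Hypothetical rule: demanding workloads need a closer look before virtualizing.
    if server.avg_cpu_pct > 40 or server.required_memory_gb > 4 or server.disk_io_heavy:
        return "Potential candidate"
    # Hypothetical rule: lightly loaded servers close to end of service go first.
    if server.avg_cpu_pct <= 15 and server.months_to_end_of_service <= 12:
        return "Ideal candidate"
    return "Candidate"


if __name__ == "__main__":
    fleet = [
        ServerProfile("app-01", 5, 0.5, False, 6),
        ServerProfile("app-02", 20, 1.0, False, 30),
        ServerProfile("db-01", 50, 2.0, False, 6),
        ServerProfile("db-02", 80, 8.0, True, 6),
    ]
    for s in fleet:
        print(f"{s.name}: {categorize(s)}")
```

In practice the thresholds would come from the collected performance data and from discussions with the application owners rather than from fixed constants.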
To maximize return on investment (ROI), the number of virtual systems, and performance, the team's final choice for the
virtual host servers was a 4-way Intel® Xeon® processor 7040Φ system (3.0 GHz, 2 MB L2 cache) with 16 GB of RAM,
two 2 Gb, 64-bit/133 MHz PCI-X-to-Fibre Channel host bus adapters, and three dual-port PCI-X 1000T Gigabit server
adapters.
The hardware selected for this virtual environment is based on an Intel® IT standardized platform. The team focused on
designing a robust virtual infrastructure without introducing single points of failure. This design would address our
customer's primary concern with consolidation of multiple applications to a single physical machine.
The team agreed on an environment that would be resilient to hardware failures and power interruptions while retaining the
ability to load-balance. The consolidated applications would reside on host servers containing dual power supplies,
mirrored hard drives, and teamed network interface cards. The centralized storage solution selected is a multi-terabyte
storage area network (SAN) with full fault-tolerant capabilities. The host servers connect to it through two 2 Gb Fibre
Channel switches configured with redundant paths. This design enables load-balancing, as all VM files reside in a
central location that each host can access. Figure 6 shows the details of this design.

Figure 6: Layout of virtual environment detailing the built-in redundancy
Using a feature of the virtualization software, we can migrate VMs to another physical host. The migration is performed
while the VM is running and causes no server downtime; applications continue to operate uninterrupted, and end users are
unaware of the move. We use this tool to help manage downtime, load-balancing, and other resource alignment needs.
After reviewing multiple virtualization case studies, the team agreed on a 20:1 consolidation ratio limit of VMs to a
single physical system. Our initial design consists of 4 physical machines with 15 virtual guests configured on each.
This will incorporate 60 ideal candidates targeted for consolidation while reserving resources for potential migrations.
In case of physical server failure, the VMs on the failed host would migrate to the 3 remaining hosts, as seen in Figure
7. Each remaining host would take on 5 additional VMs, staying within the 20:1 consolidation limit and maintaining 100%
availability.

Figure 7: Demonstration of failover when a host system fails
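The failover arithmetic above (4 hosts, 15 VMs each, a 20:1 limit) can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes that the VMs from a failed host are spread as evenly as possible across the surviving hosts.

```python
import math


def vms_per_host_after_failure(hosts: int, vms_per_host: int, failed_hosts: int = 1) -> int:
    """Worst-case VM count on a surviving host after an even redistribution."""
    surviving = hosts - failed_hosts
    if surviving <= 0:
        raise ValueError("at least one host must survive")
    total_vms = hosts * vms_per_host
    # Round up: if the VMs do not divide evenly, some host carries one extra.
    return math.ceil(total_vms / surviving)


def survives_failure(hosts: int, vms_per_host: int, ratio_limit: int,
                     failed_hosts: int = 1) -> bool:
    """True if every surviving host stays at or under the consolidation limit."""
    return vms_per_host_after_failure(hosts, vms_per_host, failed_hosts) <= ratio_limit


if __name__ == "__main__":
    # Design from the text: 4 hosts x 15 VMs, 20:1 limit, one host fails.
    print(vms_per_host_after_failure(hosts=4, vms_per_host=15))          # 20
    print(survives_failure(hosts=4, vms_per_host=15, ratio_limit=20))    # True
```

With 15 VMs per host, the design sits exactly at the limit after a single host failure; adding even one more VM per host would push a surviving host past 20:1.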
It was easy to justify this project because we were up against several looming obstacles. First, the hardware in use was
aged and being purged from our current supportability model. Replacement of this hardware on a one-for-one basis was
very costly. Second, the datacenter is constrained by space and power. We needed a solution that would free up physical
space in the computing environment. By replacing outdated servers with virtual servers, we not only saved ~40% on
hardware upgrade costs but, more importantly, extended the capacity of our datacenter. This basic ROI did not account for
costs associated with power, network, AC, etc. Figure 8 shows our first-year ROI.

Figure 8: Our first-year ROI calculation (software costs are approximations)
While the approval, purchasing, and installation of the actual virtualization island were in progress, the team used a
validation environment to begin building server configurations and testing potential candidate servers. To do this, we
established an overall test, validation, and implementation plan for our "Ideal Candidate" servers. We notified the
owners of these machines of the timeline for testing and identified our criteria for a successful test.
The technical team defined and created a "gold build" server definition (based on the data collected during server
classification).
As the testing timeline progressed, server owners were notified three weeks prior to their servers being created. This
notification included a detailed timeline for the next five weeks and the requirements for completing a virtualization
test. Two weeks before testing began, the server owners met with the virtualization team to discuss special requests
and variations from the gold build configuration, and to approve VM resource allocation. After this meeting, the technical
team provisioned the new servers and kept them in a "power off" status. The server owners then had to prepare their test
plan, success criteria, and migration strategy during these two weeks. The test plan had to include a regression test
for any application installed on the server to ensure it executed properly, along with the server functions. Two days
prior to the start of virtual server testing, test plans, success criteria, and migration plans were reviewed and
approved. Once all requirements were met, the servers were released to the testing team to build their applications,
copy data, and configure the server with all required software and application information. The test team did all OS and
application testing in a two-week period and met with the virtualization team at the end of the two weeks. When all
success criteria had been met, the server was shut down. Once the final hardware landed in the datacenter, the server
configurations were moved to the production hardware, restarted, and each test group did a quick validation to ensure
the server was in the state it was shut down in.
At that point, the virtualization team turned over the "keys" to the server owners, who executed their
migration plan and moved the physical machine contents to the VM. When the final migration was complete, the physical
machines were powered off and removed from the datacenter within 30 days.
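The test timeline described above can be laid out with simple date arithmetic. The sketch below is one reading of the text, not the team's actual schedule: it assumes the three-week notice is counted back from the start of testing, so that notification through the end of the two-week test period spans the five weeks mentioned. The function name, milestone labels, and example date are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict


def build_milestones(test_start: date) -> Dict[str, date]:
    """Return the milestone dates for one server's virtualization test."""
    return {
        # Assumption: notice is given three weeks before testing starts.
        "owner notification": test_start - timedelta(weeks=3),
        # Owners meet the virtualization team; servers are then provisioned powered off.
        "owner meeting and provisioning": test_start - timedelta(weeks=2),
        # Test plans, success criteria, and migration plans are reviewed and approved.
        "plan review and approval": test_start - timedelta(days=2),
        "testing begins": test_start,
        # OS and application testing runs for two weeks, followed by a closing review.
        "testing ends": test_start + timedelta(weeks=2),
    }


if __name__ == "__main__":
    # Example start date is arbitrary.
    for label, day in build_milestones(date(2006, 3, 6)).items():
        print(f"{day.isoformat()}  {label}")
```

Computing every date from a single test start makes it straightforward to stagger many servers: each server gets its own start date and inherits the same offsets.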