1. Introduction
The Virtual Desktop Infrastructure (VDI) workload has become more significant with the growth of remote work and distributed workforces.
The centralization of resources through VDI provides an opportunity to reduce the total cost of ownership (TCO) for data centers that host VDI sessions. Balancing the uneven distribution of system resources used during VDI sessions can improve the TCO. The issue is that some nodes and cores allocated to VDI sessions need more system resources than are available, while other nodes serving virtual machines (VMs) need fewer system resources and have a surplus.
To a large extent, Intel® Speed Select Technology solves this problem by allocating system resources to VMs that need them. This is done without compromising the performance of the applications running in the VM.
This guide describes how VDI as a workload can take advantage of Intel Speed Select Technology, including optimizing for VM capacity without adding to the bill of materials, to improve the total cost of ownership.
1.1 Terminology
ABBREVIATION | DESCRIPTION |
---|---|
CCF | Calculated Compute Frequency |
Intel® SST-BF | Intel® Speed Select Technology - Base Frequency |
Intel® SST-CP | Intel® Speed Select Technology - Core Power |
Intel® SST-PP | Intel® Speed Select Technology - Performance Profile |
Intel® SST-TF | Intel® Speed Select Technology - Turbo Frequency |
SLA | Service level agreement |
TCO | Total cost of ownership |
VDI | Virtual Desktop Infrastructure |
VM | Virtual Machine |
Table 1. Terminology
1.2 Reference Documentation
Table 2. Reference Documentation
2. Overview
The number of people working remotely has increased because of changes in the way we work, such as hybrid working arrangements, greater flexibility in the workforce, and work from anywhere. The VDI workload enables greater flexibility and freedom so that workers can be productive from almost anywhere.

VDI is a form of desktop virtualization. The desktop runs inside a virtual machine (VM) that is hosted on a central server, and users are given access to the VM through VDI sessions. The size of the VM depends on the applications being run inside it, so different types of users have different VM parameter settings based on their user profile. For example:

- A Task worker is someone who uses a single application and needs a small VM.
- A Knowledge worker is someone who uses word processors, presentation software, spreadsheets, and other similar tools. They need a larger VM.
- A Power user is mostly a content creator who may be making videos or other types of content that require more computing resources, memory, and other system resources, and who would benefit from an even larger VM.

Currently, a typical enterprise IT environment may have a server cluster that serves different types of users. However, because all of the servers in the cluster have been configured with the same processing capability, memory, and storage, every VM ends up being treated equally. As a result, some VMs may have a surplus of resources while others are lacking.
3. Why VDI Is Different from Other Virtual Workloads
The key difference between VDI and other virtualized workloads is the number of VMs that are spawned from one node. A typical VDI node hosts hundreds of virtual machines at once. System resources are shared among all of these virtual machines and the applications running inside them. VDI sessions also have specific service level agreement (SLA) commitments that must be met. These may be compromised by a subset of VDI sessions that we call “rogue” sessions. Rogue sessions consume too many resources and starve other VDI sessions until they can no longer meet their SLA commitments. This happens when rogue sessions perform computationally intensive tasks that consume more of the CPU power budget, so that performance suffers in VDI sessions running on other cores.
System resources such as computing power, memory, storage, and networking must be carefully prioritized and distributed. The allocation of system resources is very important to help ensure that the user’s experience using their virtual desktop is the same as using their laptop, desktop, or other fully-powered, end-user device.
Most hypervisor vendors size VMs by the number of CPU cycles available. This calculation is based on the base frequency of the platform. During capacity planning, a typical compute selection derives the Calculated Compute Frequency (CCF) of a given platform. CCF is the total CPU cycles of the system, calculated as follows:
CCF = Base Frequency of the CPU x Core Count per Socket x No. of Sockets x Hyper-Threading-Factor
For example, when using the 2nd Generation Intel® Xeon® Scalable Processor 6230, CCF will be calculated as:
CCF = 2100 MHz (base frequency) x 20 (number of cores per socket) x 2 (number of sockets) x 1.4 (HT factor) = 117600 MHz
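The CCF calculation can be sketched in a few lines of Python. This is only an illustration of the formula above; the function name `ccf_mhz` and its parameter names are hypothetical and do not come from any hypervisor sizing tool.

```python
def ccf_mhz(base_mhz: float, cores_per_socket: int, sockets: int, ht_factor: float) -> float:
    """Calculated Compute Frequency in MHz, following the CCF formula in the text."""
    return base_mhz * cores_per_socket * sockets * ht_factor


# Input values taken from the 6230 example in the text.
example = ccf_mhz(base_mhz=2100, cores_per_socket=20, sockets=2, ht_factor=1.4)
print(f"CCF = {example:.0f} MHz")  # prints "CCF = 117600 MHz"
```

Keeping the four factors as separate parameters makes it easy to see that, for a fixed platform, the core count is the only input a VM size can vary.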
On the same node, if the VMs are sized for different user profiles, the number of cores is currently the only sizing factor that gives a VM a different number of CPU cycles, which results in uneven utilization. VMs hosting applications for some user profiles have too many allocated cycles and others do not have enough. System resources could be better utilized if VMs running Worker applications could share their extra cycles with VMs running Power user applications that need the extra power.
4. Intel® Speed Select Technology
Intel Speed Select Technology (Intel SST) is a collection of power and performance features that gives more granular control over CPU performance. It can be tuned for advanced prioritization of cores and to help obtain more performance out of the same server. The Intel SST features supported by the 3rd Generation Intel® Xeon® Scalable processor aim to allow more control over processor performance in order to support an optimized TCO:

- Intel® Speed Select Technology - Base Frequency (Intel® SST-BF) and Intel® Speed Select Technology - Core Power (Intel® SST-CP) can maintain a higher base frequency on a subset of processor cores and a lower base frequency on the remaining processor cores.
- Intel® Speed Select Technology - Turbo Frequency (Intel® SST-TF) can provide a higher turbo frequency on a subset of processor cores and a lower turbo frequency on the rest of the processor cores.

Intel SST can be used by VDI users to help boost performance for certain cores either by disabling other cores or by lowering their frequency. Various features within Intel SST can be used to optimize VDI workloads. Two usage models are described in this guide: Intel® Speed Select Technology - Performance Profile (Intel SST-PP) and Intel SST-BF.
4.1 System Level Uniform VDI Instances
Intel SST-PP allows multiple optimized performance profiles per SKU via a static boot-time configuration or a dynamic runtime configuration. The figure below shows a CPU with 8 cores that have a lower performance characteristic as the base Intel SST-PP configuration. When the CPU is switched to a different configuration with only 6 cores available, per-core performance improves, so the VMs running on those cores get a performance boost. In this usage model, the entire CPU and server are reconfigured to use a specific Intel SST-PP profile. Having only one type of VDI VM instance deployed takes full advantage of the optimized CPU performance (see footnote 1).
Figure 1. VM to Core Mapping with Intel® SST PP
1 See backup for workloads and configurations. Results may vary.
4.2 Optimized Multi VM Type Deployment
The Intel SST-BF feature permits a two-tier asymmetric system of core frequency deployment within the same multi-core processor. While operating in Intel SST-BF enabled mode, the base frequency of a specific set of CPUs (the high priority CPUs) is increased at the expense of a lower base frequency on the remaining, low priority CPUs. Using this feature, multiple VDI VM instance types can be deployed on the same server. This optimizes the TCO of a VDI cluster. The ability to switch between the Intel SST-BF enabled mode and the Intel SST-BF disabled mode (which provides uniform performance for all cores) allows server resources to be used optimally as business demands change.
Figure 2 shows VM0 realizing high performance by being scheduled on the Intel SST-BF enabled high priority cores. Because it is configured as a normal performance VM type, VM1 gets just enough performance without wasting precious computing resources.
Figure 2. VM to Core Mapping with Intel® SST BF
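To make the two-tier idea concrete, the CCF formula from Section 3 can be applied separately to each tier. The sketch below uses the Appendix A entry for the 6338 (32 cores at 2 GHz in the base configuration; 12 high priority cores at 2.2 GHz and 20 low priority cores at 1.8 GHz with Intel SST-BF). The two-socket node and the HT factor of 1.4 are assumptions borrowed from the earlier CCF example, and `tier_ccf_mhz` is a hypothetical helper name.

```python
HT_FACTOR = 1.4  # assumption: same Hyper-Threading factor as the CCF example
SOCKETS = 2      # assumption: two-socket node


def tier_ccf_mhz(base_mhz: int, cores_per_socket: int) -> float:
    """CCF in MHz for one group of cores that share the same base frequency."""
    return base_mhz * cores_per_socket * SOCKETS * HT_FACTOR


# Appendix A values for the 6338.
uniform = tier_ccf_mhz(2000, 32)        # Intel SST-BF disabled: all cores at 2 GHz
high_priority = tier_ccf_mhz(2200, 12)  # Intel SST-BF enabled: high priority tier
low_priority = tier_ccf_mhz(1800, 20)   # Intel SST-BF enabled: low priority tier

print(f"uniform:       {uniform:.0f} MHz")
print(f"high priority: {high_priority:.0f} MHz")
print(f"low priority:  {low_priority:.0f} MHz")
```

Under these assumptions, sizing can treat the high priority and low priority tiers as two separate cycle pools, so base frequency becomes a second sizing factor alongside core count.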
5. Proof of Concept
This proof of concept was designed to show how Intel Speed Select Technology can improve TCO by solving the problem of uneven system resource distribution. Two different user profiles were tested: the knowledge worker profile for office workers and the power user profile for content creators. Test results show that system resources were allocated to VMs as needed. The following system configuration was used.
Host Configuration | |
---|---|
SKU | 3rd Generation Intel Xeon Scalable processor QWN3 32 Core, 2-Socket Processor, Base Frequency 2.0 GHz |
PMEM | BPS (8+8), 128 GB DIMMs |
Socket count | 1 |
Memory Channels | 8 |
Turbo/HT | ON |
DDR4 | 16 x 16 GB = 256 GB |
L1D Cache | 48 KB |
Mid-Level Cache | 1.25 MB |
DDR Speed | 3200 MT/s |
NVMe | 4x5 + 1X1 = 21 TB |
TDP | 205 W |
BKC Version | WW06 Whitley BKC |
BIOS Version | WLYDCRB1.SYS.0020.P57.2101290525 |
ESXi Version | 7 U1 with P02 |
Benchmark | View Planner 4.5 |
VM Configuration | 4 vCPU, 8 GB RAM, 30 GB disk |
Guest OS | Windows 10 Pro |
This diagram shows the VDI Resource Server and VDI Infrastructure Server Test Setup.
Figure 3. VDI Resource Server and VDI Infrastructure Server Test Setup
The following steps were followed to configure the VDI sessions with Intel SST-BF:
1. Enable Intel SST-BF in the BIOS advanced configuration:
EDKII menu > Socket configuration > Advanced Power Management configuration > CPU P state control
2. Boot to OS
3. Install the MSR tool if it is not installed by default.
4. Read MSR 0x774 to view the frequency of the different cores. This script can be used in the ESXi environment.

```sh
#!/bin/sh
# Read MSR 0x774 on each physical CPU (pCPU) through vsish.
# The upper bound of 160 is the pCPU count used in this setup's script.
i=0
while [ $i -lt 160 ]
do
    vsish -e get /hardware/msr/pcpu/$i/addr/0x774
    i=`expr $i + 1`
done
```

The sample output of this command is a list of cores with a frequency value:

```
0x8000ff01
0x8000ff01
0x8000ff01
0x8000ff16
0x8000ff16
```
The following distribution of CPU frequency was reported:
- 24 cores are high priority cores, as indicated by MSR bits 7:0 showing a value higher than 0x01
  - In this specific CPU, the value was 0x16
- 40 cores are normal priority cores, as indicated by MSR bits 7:0 showing a value of 0x01
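The classification rule above (bits 7:0 greater than 0x01 means high priority) can be sketched as a small post-processing step on the MSR readings. The helper names `low_byte`, `classify`, and `summarize` are hypothetical, and the input list is just the five sample values shown earlier, not a full dump of the test system.

```python
# Sample MSR 0x774 readings copied from the sample output in this section.
SAMPLE_MSR_VALUES = ["0x8000ff01", "0x8000ff01", "0x8000ff01", "0x8000ff16", "0x8000ff16"]

NORMAL_PRIORITY_FIELD = 0x01  # bits 7:0 value reported for normal priority cores


def low_byte(msr_value: str) -> int:
    """Return bits 7:0 of a hexadecimal MSR reading."""
    return int(msr_value, 16) & 0xFF


def classify(msr_value: str) -> str:
    """Label a core by comparing bits 7:0 against the normal priority value."""
    return "high" if low_byte(msr_value) > NORMAL_PRIORITY_FIELD else "normal"


def summarize(values):
    """Count readings per priority class."""
    counts = {"high": 0, "normal": 0}
    for value in values:
        counts[classify(value)] += 1
    return counts


for value in SAMPLE_MSR_VALUES:
    print(value, hex(low_byte(value)), classify(value))
print(summarize(SAMPLE_MSR_VALUES))
```

As a plausibility check, 0x16 is 22 in decimal. If the field is read as a multiple of 100 MHz (an assumption, not stated in this guide), it corresponds to 2.2 GHz, which is consistent with the maximum core frequency reported for the media VMs when Intel SST-BF is enabled.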
5.1 Test Plan
View Planner was used to measure VM density and the latency of applications running in the VMs. Two View Planner test plans were run simultaneously. The first test plan was run on VMs with “office apps” pinned to normal priority cores, depicting a typical Knowledge worker or Office worker profile. The second was run on VMs with “media apps” pinned to high priority cores, depicting a typical Power user or Content creator profile. The number of VMs in the first test was held constant at 100, while the number of VMs in the second test varied from an initial 400 when Intel SST-BF was disabled to 500 when Intel SST-BF was enabled. The latency of each operation of the applications running inside these VMs was studied along with the system parameters.
VM Type | VM Density | Core Type | ESXi Core numbers |
---|---|---|---|
VMs running office applications emulating worker profiles that need fewer system resources | Fixed at 100 VMs | Normal Priority Core | ESXi Core numbers: 0-7,12-17,20-33,36-39,52-53,56-57,60-71,76-81,84-97,100-103,116-117,120-121,124-127 |
VMs running media applications emulating power user profiles that need more system resources | Varied by increasing from 400 to 500 VMs with SST-BF enabled | High Priority Cores | ESXi Core numbers: 8-11,18-19,34-35,40-51,54-55,58-59,72-75,82-83,98-99,104-115,118-119,122-123 |
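Before pinning VMs, it can be useful to sanity-check core lists like the ones in the table. The sketch below expands the two range strings and confirms that they do not overlap. The helper name `expand_ranges` is hypothetical; the range strings are copied from the table.

```python
def expand_ranges(spec: str) -> set:
    """Expand a string such as "0-7,12-17" into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


# Core lists copied from the test plan table.
NORMAL = ("0-7,12-17,20-33,36-39,52-53,56-57,60-71,76-81,84-97,"
          "100-103,116-117,120-121,124-127")
HIGH = ("8-11,18-19,34-35,40-51,54-55,58-59,72-75,82-83,98-99,"
        "104-115,118-119,122-123")

normal_cpus = expand_ranges(NORMAL)
high_cpus = expand_ranges(HIGH)

print("normal priority CPUs:", len(normal_cpus))
print("high priority CPUs:  ", len(high_cpus))
print("overlap:             ", sorted(normal_cpus & high_cpus))
```

The lists expand to 80 normal priority and 48 high priority CPU numbers with no overlap, covering 0 through 127. Assuming two logical CPUs per core with Hyper-Threading on, that matches the 40 normal priority and 24 high priority cores reported from the MSR readings.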
5.2 Test Results
When the Intel SST-BF was not enabled, the second test gave a “Resource unavailable” error when increasing the VM density above 400. When Intel SST-BF was enabled, this error disappeared, and the density could be increased to 500 without any resource shortage.
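The density gain implied by these numbers is simple arithmetic, shown here as a short sketch. The function name `density_gain_percent` is hypothetical; the VM counts are the ones reported in this section and in the test plan.

```python
def density_gain_percent(before: int, after: int) -> float:
    """Percentage increase in VM count from `before` to `after`."""
    return (after - before) * 100 / before


OFFICE_VMS = 100                          # fixed in both runs
MEDIA_VMS_BF_OFF, MEDIA_VMS_BF_ON = 400, 500

high_priority_gain = density_gain_percent(MEDIA_VMS_BF_OFF, MEDIA_VMS_BF_ON)
node_gain = density_gain_percent(MEDIA_VMS_BF_OFF + OFFICE_VMS,
                                 MEDIA_VMS_BF_ON + OFFICE_VMS)

print(f"high priority VM density gain: {high_priority_gain:.0f}%")
print(f"whole-node VM density gain:    {node_gain:.0f}%")
```

The first figure is the gain for VMs on high priority cores only; the second includes the 100 office VMs, whose count did not change.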
5.2.1 Observation from the Experiment
The following observations were made by running the proof of concept experiments:
- Systems with Intel SST-BF enabled and VMs prioritized appropriately could accommodate 100 more high priority VMs with no or minimal impact on application performance
- Application latency remains the same with the increase in VM density on high priority cores
- Application Execution Ratio remains the same with the increase in VM density on high priority cores
- CPU usage and application performance in Intel SST-BF disabled mode show that resources are wasted
The following table contains actual test results.
Parameters | Intel SST-BF=Enable, Media | Intel SST-BF=Enable, Office | Intel SST-BF=Disable, Media | Intel SST-BF=Disable, Office | Intel SST-BF=Disable, Media | Intel SST-BF=Disable, Office |
---|---|---|---|---|---|---|
Number of VMs Tested | 500 | 100 (limited by memory and storage) | 400 (>400 VM runs failed with a “Short of System Resources Error”) | 100 (limited by memory and storage) | 400 (>400 VM runs failed with a “Short of System Resources Error”) | 100 |
CPU Utilization in % | 63 | 41 | 36 | 31 | | |
Max Core Freq in GHz | 2.2 | 1.2 | 2 | 2 | 2.4 | 2.4 |
Latency in ms, Group A (Threshold < 1) | 0.2691 | 0.8601 | 0.2689 | 0.8282 | 0.2691 | 0.7961 |
Latency in ms, Group B (Storage Sensitive, Threshold < 6) | 2.1297 | 5.6230 | 2.1091 | 4.3273 | 2.1074 | 3.8388 |
Ratio of Actual to Expected Operations (0.0 to 1.0) | 1.0000 | 0.9900 | 1 | 1.0000 | 1.0000 | 1.0000 |
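The latency rows can be checked against their thresholds with a short sketch. The values below are copied from the results table in column order; the helper names `within_threshold` and `tightest_margin` are hypothetical and are not part of View Planner.

```python
GROUP_A_THRESHOLD_MS = 1.0  # Group A threshold from the table
GROUP_B_THRESHOLD_MS = 6.0  # Group B (storage sensitive) threshold from the table

# Latency values in ms, copied from the results table in column order.
group_a = [0.2691, 0.8601, 0.2689, 0.8282, 0.2691, 0.7961]
group_b = [2.1297, 5.6230, 2.1091, 4.3273, 2.1074, 3.8388]


def within_threshold(latencies, threshold):
    """True if every latency is strictly below the threshold."""
    return all(value < threshold for value in latencies)


def tightest_margin(latencies, threshold):
    """Smallest remaining headroom (threshold minus latency) across all runs."""
    return min(threshold - value for value in latencies)


print("Group A within threshold:", within_threshold(group_a, GROUP_A_THRESHOLD_MS))
print("Group B within threshold:", within_threshold(group_b, GROUP_B_THRESHOLD_MS))
print(f"Group A tightest margin: {tightest_margin(group_a, GROUP_A_THRESHOLD_MS):.4f} ms")
print(f"Group B tightest margin: {tightest_margin(group_b, GROUP_B_THRESHOLD_MS):.4f} ms")
```

Every reported latency stays below its threshold. The smallest margins come from the office VMs with Intel SST-BF enabled (0.8601 ms for Group A and 5.6230 ms for Group B).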
6. Summary
Enterprises need to maximize and optimize VDI deployments to lower the total cost of ownership and running costs. VDI deployments can take advantage of Intel Speed Select Technology and boost VM density by ~25% for VMs running on high priority cores, without compromising the performance as measured by workload latency. Customers deploying different VDI configurations from the same node or cluster can use Intel SST to get additional performance upside without any extra investment.
VDI as a workload can benefit from Intel Speed Select Technology. Hypervisor vendors who add native support of Intel SST to their OS and orchestrator will enable easier configuration for end users. OEMs can enable or disable the feature depending on the kind of VDI sessions they want to deploy from the same node.
Appendix A 3rd Generation Intel® Xeon® Scalable Processors that support Intel® Speed Select Technology
Processor Model | Intel® SST-PP Base Configuration 0 | Intel® SST-PP Configuration 3 | Intel® SST-PP Configuration 4 | Intel® SST-CP | Intel® SST-BF High Priority | Intel® SST-BF Low Priority | Intel® SST-TF |
---|---|---|---|---|---|---|---|
8380 | 40 Cores / 270 W / 2.3 GHz | N/A | N/A | Yes | 16 Cores / 2.4 GHz | 24 Cores / 2.2 GHz | Yes |
8368Q | 38 Cores / 270 W / 2.6 GHz | N/A | N/A | Yes | N/A | N/A | Yes |
8368 | 38 Cores / 270 W / 2.4 GHz | N/A | N/A | Yes | 16 Cores / 2.5 GHz | 22 Cores / 2.3 GHz | Yes |
8362 | 32 Cores / 265 W / 2.8 GHz | N/A | N/A | Yes | N/A | N/A | Yes |
8360Y | 36 Cores / 250 W / 2.4 GHz | 32 Cores / 250 W / 2.5 GHz | 24 Cores / 220 W / 2.6 GHz | Yes | 12 Cores / 2.7 GHz | 24 Cores / 2.1 GHz | Yes |
8358P | 32 Cores / 240 W / 2.6 GHz | N/A | N/A | Yes | N/A | N/A | Yes |
8358 | 32 Cores / 250 W / 2.6 GHz | N/A | N/A | Yes | 12 Cores / 2.8 GHz | 20 Cores / 2.4 GHz | Yes |
8352Y | 32 Cores / 205 W / 2.2 GHz | 24 Cores / 185 W / 2.3 GHz | 16 Cores / 185 W / 2.6 GHz | Yes | 12 Cores / 2.4 GHz | 20 Cores / 2 GHz | Yes |
8352V | 36 Cores / 195 W / 2.1 GHz | 32 Cores / 180 W / 2 GHz | 24 Cores / 155 W / 2 GHz | Yes | N/A | N/A | Yes |
8352S | 32 Cores / 205 W / 2.2 GHz | 24 Cores / 185 W / 2.3 GHz | 16 Cores / 185 W / 2.6 GHz | Yes | 12 Cores / 2.4 GHz | 20 Cores / 2 GHz | Yes |
8352M | 32 Cores / 185 W / 2.3 GHz | 28 Cores / 185 W / 2.4 GHz | 24 Cores / 185 W / 2.6 GHz | Yes | N/A | N/A | Yes |
8351N | 36 Cores / 225 W / 2.4 GHz | N/A | N/A | Yes | 18 Cores / 2.6 GHz | 18 Cores / 2.2 GHz | Yes |
6354 | 18 Cores / 205 W / 3 GHz | N/A | N/A | Yes | 8 Cores / 3.1 GHz | 10 Cores / 2.8 GHz | Yes |
6348 | 28 Cores / 235 W / 2.6 GHz | N/A | N/A | Yes | 12 Cores / 2.8 GHz | 16 Cores / 2.5 GHz | Yes |
6346 | 16 Cores / 205 W / 3.1 GHz | N/A | N/A | Yes | 4 Cores / 3.2 GHz | 12 Cores / 3 GHz | Yes |
6342 | 24 Cores / 230 W / 2.8 GHz | N/A | N/A | Yes | 12 Cores / 2.9 GHz | 12 Cores / 2.6 GHz | Yes |
6338T | 24 Cores / 165 W / 2.1 GHz | N/A | N/A | Yes | 8 Cores / 2.3 GHz | 16 Cores / 1.9 GHz | Yes |
6338N | 32 Cores / 185 W / 2.2 GHz | N/A | N/A | Yes | 18 Cores / 2.4 GHz | 14 Cores / 1.9 GHz | Yes |
6338 | 32 Cores / 205 W / 2 GHz | N/A | N/A | Yes | 12 Cores / 2.2 GHz | 20 Cores / 1.8 GHz | Yes |
6336Y | 24 Cores / 185 W / 2.4 GHz | 12 Cores / 150 W / 2.9 GHz | 8 Cores / 140 W / 3.1 GHz | Yes | 8 Cores / 2.5 GHz | 16 Cores / 2.2 GHz | Yes |
6334 | 8 Cores / 165 W / 3.6 GHz | N/A | N/A | Yes | 4 Cores / 3.7 GHz | 4 Cores / 3.4 GHz | Yes |
6330N | 28 Cores / 165 W / 2.2 GHz | N/A | N/A | Yes | 18 Cores / 2.3 GHz | 10 Cores / 1.9 GHz | Yes |
6330 | 28 Cores / 205 W / 2 GHz | N/A | N/A | Yes | 12 Cores / 2.1 GHz | 16 Cores / 1.8 GHz | Yes |
6326 | 16 Cores / 185 W / 2.9 GHz | N/A | N/A | Yes | 4 Cores / 3 GHz | 12 Cores / 2.6 GHz | Yes |
6314U | 32 Cores / 205 W / 2.3 GHz | N/A | N/A | Yes | 12 Cores / 2.5 GHz | 20 Cores / 2.1 GHz | Yes |
6312U | 24 Cores / 185 W / 2.4 GHz | N/A | N/A | Yes | 8 Cores / 2.6 GHz | 16 Cores / 2.3 GHz | Yes |
5320T | 20 Cores / 150 W / 2.3 GHz | N/A | N/A | Yes | 6 Cores / 2.6 GHz | 14 Cores / 2.1 GHz | Yes |
5320 | 26 Cores / 185 W / 2.2 GHz | N/A | N/A | Yes | 12 Cores / 2.5 GHz | 14 Cores / 2 GHz | Yes |
5318Y | 24 Cores / 165 W / 2.1 GHz | 24 Cores / 150 W / 1.9 GHz | 22 Cores / 150 W / 2 GHz | Yes | 8 Cores / 2.3 GHz | 16 Cores / 1.8 GHz | Yes |
5318S | 24 Cores / 165 W / 2.1 GHz | 24 Cores / 150 W / 1.9 GHz | 22 Cores / 150 W / 2 GHz | Yes | 8 Cores / 2.3 GHz | 16 Cores / 1.8 GHz | Yes |
5318N | 24 Cores / 150 W / 2.1 GHz | 20 Cores / 135 W / 2 GHz | N/A | Yes | 8 Cores / 2.3 GHz | 16 Cores / 2 GHz | Yes |
5317 | 12 Cores / 150 W / 3 GHz | N/A | N/A | Yes | 4 Cores / 3.2 GHz | 8 Cores / 2.8 GHz | Yes |
5315Y | 8 Cores / 140 W / 3.2 GHz | 6 Cores / 125 W / 3.2 GHz | 4 Cores / 115 W / 3.4 GHz | Yes | 2 Cores / 3.3 GHz | 6 Cores / 2.6 GHz | Yes |
4316 | 20 Cores / 150 W / 2.3 GHz | N/A | N/A | N/A | N/A | N/A | N/A |
4314 | 16 Cores / 135 W / 2.4 GHz | N/A | N/A | N/A | N/A | N/A | N/A |
4310T | 10 Cores / 105 W / 2.3 GHz | N/A | N/A | N/A | N/A | N/A | N/A |
4310 | 12 Cores / 120 W / 2.1 GHz | N/A | N/A | N/A | N/A | N/A | N/A |
4309Y | 8 Cores / 105 W / 2.8 GHz | 8 Cores / 95 W / 2.6 GHz | 8 Cores / 85 W / 2.3 GHz | N/A | N/A | N/A | N/A |