Intel® Xeon® Scalable Processor - Virtual Desktop Infrastructure with Intel® Speed Select Technology

ID 734376
Updated 6/13/2022

1. Introduction

The Virtual Desktop Infrastructure (VDI) workload has become more significant with the growth of remote work and distributed workforces.

Centralizing resources through VDI provides an opportunity to reduce the total cost of ownership (TCO) for data centers that host VDI sessions. Balancing the uneven use of system resources across VDI sessions can improve TCO. The problem is that some nodes and cores allocated to VDI sessions need more system resources than are available, while other nodes serving virtual machines (VMs) need fewer system resources and have a surplus.

Intel® Speed Select Technology (Intel® SST) largely solves this problem by directing processor performance to the cores that serve the VMs that need it. It does this without compromising the performance of the applications running in the VM.

This guide describes how VDI workloads can take advantage of Intel Speed Select Technology, including how to optimize VM capacity without additional bill of materials (BOM) cost, to improve TCO.

1.1 Terminology

ABBREVIATION DESCRIPTION
CCF Calculated Compute Frequency
Intel® SST-BF Intel® Speed Select Technology - Base Frequency
Intel® SST-CP Intel® Speed Select Technology - Core Power
Intel® SST-PP Intel® Speed Select Technology – Performance Profile
Intel® SST-TF Intel® Speed Select Technology - Turbo Frequency
SLA Service level agreement
TCO Total cost of ownership
VDI Virtual Desktop Infrastructure
VM Virtual Machine

 

Table 1. Terminology

1.2 Reference Documentation

REFERENCE SOURCE
VMware View Planner 4.5

https://customerconnect.vmware.com/downloads/info/slug/other/vmware_view_planner/4_5

Intel® Speed Select Technology (Intel® SST) - Performance Enhancements for 3rd Gen Intel® Xeon® Scalable Processor Technology Guide

https://networkbuilders.intel.com/solutionslibrary/intel-speed-select-technology-intel-sst-performance-enhancements-for-3rd-gen-intel-xeon-scalable-processor-technology-guide

Intel® Speed Select Technology – Base Frequency (Intel® SST-BF) with Kubernetes* Application Note

https://networkbuilders.intel.com/solutionslibrary/intel-speed-select-technology-base-frequency-with-kubernetes-application-note

Intel® Speed Select Technology – Base Frequency Configuration Automation on OpenStack* Compute Host

https://networkbuilders.intel.com/solutionslibrary/intel-speed-select-technology-base-frequency-configuration-automation-on-openstack-compute-host

Intel® Speed Select Technology – Performance Profile (Intel® SST-PP) Overview User Guide

https://networkbuilders.intel.com/solutionslibrary/intel-speed-select-technology-performance-profile-intel-sst-pp-overview-user-guide 

 

Table 2. Reference Documentation

2. Overview

The number of people working remotely has increased because of changes in the way we work, such as hybrid working arrangements, greater workforce flexibility, and work-from-anywhere policies. VDI gives workers the flexibility to be productive from almost anywhere. VDI is a form of desktop virtualization: each desktop runs inside a virtual machine (VM) hosted on a central server, and users access the VM through VDI sessions.

Users fall into different profiles, and the size of a VM depends on the applications it runs. For example, a Task worker uses a single application and needs a small VM. A Knowledge worker uses word processors, presentation software, spreadsheets, and similar tools, and needs a larger VM. A Power user is typically a content creator, for example making videos, whose work requires more compute, memory, and other system resources and benefits from an even larger VM. Each type of user therefore has different VM parameter settings based on their user profile.

A typical enterprise IT environment may have one server cluster serving all of these user types. However, because every server in the cluster has the same processing, memory, and storage configuration, every VM ends up being treated equally. As a result, some VMs have a surplus of resources while others lack them.

3. Why Is VDI Different from Other Virtualized Workloads?

The key difference between VDI and other virtualized workloads is the number of VMs spawned on one node. A typical VDI node hosts hundreds of VMs at once, and system resources are shared among all of these VMs and the applications running inside them. VDI sessions also carry specific service level agreement (SLA) commitments that must be met. These commitments can be compromised by a subset of VDI sessions that we call “rogue” sessions, which consume too many resources and starve other VDI sessions until those sessions can no longer meet their SLA commitments. This happens when rogue sessions run computationally intensive tasks that consume a larger share of the processor's power budget, so that performance suffers in VDI sessions running on other cores.

System resources such as computing power, memory, storage, and networking must be carefully prioritized and distributed. Careful allocation helps ensure that a user's experience on a virtual desktop is the same as on a laptop, desktop, or other fully powered end-user device.

Most hypervisor vendors size VMs by the number of CPU cycles available, which is calculated from the platform's base frequency. A typical compute selection during capacity planning derives the Calculated Compute Frequency (CCF) of a given platform. CCF is the total CPU cycles per second of the system, expressed in MHz, and is calculated as follows:

CCF = Base Frequency of the CPU (MHz) x Core Count per Socket x Number of Sockets x Hyper-Threading Factor

For example, for a two-socket server with 2nd Generation Intel® Xeon® Gold 6230 processors, CCF is calculated as:

CCF = 2100 MHz (base frequency) x 20 (cores per socket) x 2 (sockets) x 1.4 (HT factor) = 117,600 MHz
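As a quick check of the worked example, the CCF formula can be evaluated with a small shell sketch. The function name `ccf_mhz` is hypothetical, and `awk` is used only because POSIX shell arithmetic is integer-only while the Hyper-Threading factor is fractional.

```sh
#!/bin/sh
# Hypothetical helper: Calculated Compute Frequency (CCF) in MHz.
# CCF = base frequency (MHz) x cores per socket x sockets x HT factor
# awk handles the fractional HT factor; %.0f rounds to a whole MHz value.
ccf_mhz() {
    awk -v base="$1" -v cores="$2" -v sockets="$3" -v ht="$4" \
        'BEGIN { printf "%.0f\n", base * cores * sockets * ht }'
}

# Two-socket Intel Xeon Gold 6230 example from the text.
ccf_mhz 2100 20 2 1.4
```

Run as-is, the sketch prints 117600, matching the example above. The same helper could be reused to compare other SKUs or core pools, assuming the same HT factor applies.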

When VMs on the same node are sized for different user profiles, the number of cores is currently the only sizing factor that gives a VM a different number of CPU cycles, which results in uneven utilization. VMs hosting applications for some user profiles are allocated too many cycles, while others do not get enough. System resources could be better utilized if VMs running Task or Knowledge worker applications could share their extra cycles with VMs running Power user applications that need the extra performance.

4. Intel® Speed Select Technology

Intel® Speed Select Technology (Intel® SST) is a collection of power and performance features that give more granular control over CPU performance. It can be tuned to prioritize specific cores and to help obtain more performance from the same server. The Intel SST features supported by 3rd Generation Intel® Xeon® Scalable processors aim to give more control over processor performance to help optimize TCO:

  • Intel® SST-BF maintains a higher base frequency on a subset of processor cores and a lower base frequency on the remaining cores.
  • Intel® SST-CP lets software assign cores to priority classes so that, when the processor is power constrained, higher-priority cores receive frequency ahead of lower-priority cores.
  • Intel® SST-TF allows a subset of processor cores to run at higher turbo frequencies by lowering the turbo frequencies of the remaining cores.

VDI deployments can use Intel SST to boost the performance of certain cores, either by disabling other cores or by lowering their frequency. Several Intel SST features can be used to optimize VDI workloads; this guide describes two usage models: Intel® Speed Select Technology – Performance Profile (Intel® SST-PP) and Intel SST-BF.

4.1 System Level Uniform VDI Instances

Intel SST-PP allows multiple optimized performance profiles per SKU, selected through a static boot-time configuration or a dynamic runtime configuration. Figure 1 shows a CPU whose base Intel SST-PP configuration has 8 cores with a lower per-core performance level. When the CPU is switched to a configuration with only 6 active cores, per-core performance improves, and the VMs running on those cores get a performance boost. In this usage model, the entire CPU and server are reconfigured to use one specific Intel SST-PP profile, so the model takes full advantage of the optimized CPU performance when only one type of VDI VM instance is deployed.¹

Figure 1. VM to Core Mapping with Intel® SST PP

¹ See backup for workloads and configurations. Results may vary.
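The trade-off behind Intel SST-PP can be made concrete with the Intel Xeon Platinum 8352Y row of Appendix A. The sketch below is illustrative: the function `sstpp_compare` is hypothetical, and its "aggregate" figure is simply active cores multiplied by base frequency per socket, ignoring turbo behavior and the different TDP of each profile.

```sh
#!/bin/sh
# Hypothetical helper: compare Intel SST-PP profiles of one SKU.
# Input lines: profile_name active_cores base_GHz
# Aggregate = active cores x base frequency (per socket, a rough proxy only).
sstpp_compare() {
    awk '{ printf "%s: %d cores x %.1f GHz = %.1f GHz aggregate\n", $1, $2, $3, $2 * $3 }'
}

# Intel Xeon Platinum 8352Y values from Appendix A.
sstpp_compare <<'EOF'
config0 32 2.2
config3 24 2.3
config4 16 2.6
EOF
```

For the 8352Y, moving from configuration 0 to configuration 4 raises the per-core base frequency from 2.2 GHz to 2.6 GHz while the aggregate falls from 70.4 GHz to 41.6 GHz. That is why this model suits servers hosting a single VDI VM type whose VMs all benefit from faster cores.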

4.2 Optimized Multi VM Type Deployment

The Intel SST-BF feature allows a two-tier, asymmetric core frequency deployment within the same multi-core processor. When Intel SST-BF is enabled, the base frequency of a set of high-priority cores is increased at the expense of a lower base frequency on the remaining low-priority cores. With this feature, multiple VDI VM instance types can be deployed on the same server, which helps optimize the TCO of a VDI cluster. Because Intel SST-BF can be switched between enabled mode and disabled mode (in which all cores have uniform performance), server resources can be realigned as business demands change.

Figure 2 shows VM0 achieving high performance by being scheduled on the high-priority cores of an Intel SST-BF-enabled processor. VM1, configured as a normal-performance VM type, gets just enough performance without wasting precious compute resources.

Figure 2. VM to Core Mapping with Intel® SST BF
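Applying the Section 3 CCF formula to each core pool shows what Intel SST-BF means for capacity planning. This sketch assumes a two-socket server with the Intel Xeon Gold 6338 values from Appendix A (the same core count, 2.0 GHz base frequency, and 205 W TDP as the proof-of-concept CPU) and reuses the 1.4 HT factor from the earlier example; `ccf_mhz` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: CCF in MHz for a pool of cores.
# CCF = base frequency (MHz) x cores per socket x sockets x HT factor
ccf_mhz() {
    awk -v base="$1" -v cores="$2" -v sockets="$3" -v ht="$4" \
        'BEGIN { printf "%.0f\n", base * cores * sockets * ht }'
}

# Assumed: Intel Xeon Gold 6338 (Appendix A), 2 sockets, HT factor 1.4.
echo "SST-BF disabled, all 32 cores: $(ccf_mhz 2000 32 2 1.4) MHz"
echo "SST-BF enabled, high priority: $(ccf_mhz 2200 12 2 1.4) MHz"
echo "SST-BF enabled, low priority:  $(ccf_mhz 1800 20 2 1.4) MHz"
```

Under these assumptions the high-priority pool gets 73,920 MHz and the low-priority pool 100,800 MHz, a total of 174,720 MHz versus 179,200 MHz with Intel SST-BF disabled. The feature redistributes cycles rather than adding them: each high-priority core runs 10% faster than the uniform 2.0 GHz base, at the expense of the low-priority cores.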

5. Proof of Concept

This proof of concept was designed to show how Intel Speed Select Technology can improve TCO by solving the problem of uneven system resource distribution. Two user profiles were tested: the Knowledge worker profile for office workers and the Power user profile for content creators. Test results show that system resources were allocated to VMs as needed. The following system configuration was used.

 

Host Configuration

SKU 3rd Generation Intel Xeon Scalable processor QWN3, 32-core, 2-socket processor, base frequency 2.0 GHz
PMEM BPS (8+8), 128 GB DIMMs
Socket count 1
Memory Channels 8
Turbo/HT ON
DDR4 16 x 16 GB = 256 GB
L1D Cache 48 KB
Mid-Level Cache 1.25 MB
DDR Speed 3200 MT/s
NVMe 4 x 5 TB + 1 x 1 TB = 21 TB

TDP 205 W
BKC Version WW06 Whitley BKC
BIOS Version WLYDCRB1.SYS.0020.P57.2101290525
ESXi Version 7 U1 with P02
Benchmark View Planner 4.5
VM Configuration 4 vCPU, 8 GB RAM, 30 GB disk
Guest OS Windows 10 Pro

Figure 3 shows the test setup for the VDI resource server and the VDI infrastructure server.

Figure 3. VDI Resource Server and VDI Infrastructure Server Test Setup

The following steps were used to configure the VDI sessions with Intel SST-BF:

1. Enable Intel SST-BF in the BIOS advanced configuration:

EDKII Menu > Socket Configuration > Advanced Power Management Configuration > CPU P State Control

2. Boot the operating system.

3. Install the MSR tool if it is not installed by default.

4. Read MSR 0x774 on each logical CPU to view its frequency setting. The following script can be used in the ESXi environment:

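#!/bin/sh
# Read MSR 0x774 on each logical CPU (pCPU) of the ESXi host.
# Set NUM_PCPUS to the number of logical CPUs on the host.
NUM_PCPUS=160
i=0
while [ "$i" -lt "$NUM_PCPUS" ]
do
        vsish -e get /hardware/msr/pcpu/$i/addr/0x774
        i=$((i + 1))
done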


The script prints one raw MSR value per logical CPU. Sample output:

0x8000ff01
0x8000ff01
0x8000ff01
0x8000ff16
0x8000ff16

The following distribution of core priorities was reported:

  • 24 High Priority cores, indicated by MSR bits 7:0 holding a value greater than 0x01 (on this CPU, 0x16)
  • 40 Normal Priority cores, indicated by MSR bits 7:0 holding a value of 0x01
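The classification rule above can be applied mechanically to the script's output. The following sketch is hypothetical (none of these function names come from ESXi or Intel tools) and relies on two assumptions: each input line is a bare hexadecimal value like the sample output, and, for the optional `bits_to_mhz` conversion, one unit of bits 7:0 corresponds to 100 MHz. Under that assumption 0x16 (22) maps to 2.2 GHz, which agrees with the maximum core frequency reported for the high-priority (Media) cores in the results table.

```sh
#!/bin/sh
# Hypothetical helpers for classifying raw MSR 0x774 values.

# Extract bits 7:0 of a raw MSR value, e.g. 0x8000ff16 -> 22 (0x16).
msr_bits_7_0() {
    echo $(( $1 & 0xFF ))
}

# ASSUMPTION: one unit of bits 7:0 = 100 MHz, so 22 -> 2200 MHz.
bits_to_mhz() {
    echo $(( $1 * 100 ))
}

# Rule from the text: bits 7:0 > 0x01 means a high-priority core.
classify_pcpu() {
    if [ "$(msr_bits_7_0 "$1")" -gt 1 ]; then
        echo high
    else
        echo normal
    fi
}

# Count high- and normal-priority CPUs from raw values on stdin.
count_priorities() {
    high=0
    normal=0
    while read -r value; do
        [ -n "$value" ] || continue    # skip blank lines
        if [ "$(classify_pcpu "$value")" = high ]; then
            high=$((high + 1))
        else
            normal=$((normal + 1))
        fi
    done
    echo "high=$high normal=$normal"
}

# Sample values from the output shown above.
printf '%s\n' 0x8000ff01 0x8000ff01 0x8000ff01 0x8000ff16 0x8000ff16 | count_priorities
```

On a host, the output of the MSR-reading script could be piped into `count_priorities`; with the five sample values it prints `high=2 normal=3`.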

5.1 Test Plan

VMware View Planner was used to measure VM density and the latency of applications running in the VMs. Two View Planner test plans were run simultaneously. The first ran VMs with “office apps” pinned to normal-priority cores, representing a typical Knowledge worker or office worker profile. The second ran VMs with “media apps” pinned to high-priority cores, representing a typical Power user or content creator profile. The number of VMs in the first test was held constant at 100, while the number of VMs in the second test was increased from 400 with Intel SST-BF disabled to 500 with Intel SST-BF enabled. The latency of each application operation inside these VMs was studied along with system parameters.

 

VM Type | VM Density | Core Type | ESXi Core Numbers
VMs running office applications, emulating worker profiles that need fewer system resources | Fixed at 100 VMs | Normal Priority Cores | 0-7,12-17,20-33,36-39,52-53,56-57,60-71,76-81,84-97,100-103,116-117,120-121,124-127
VMs running media applications, emulating Power user profiles that need more system resources | Increased from 400 to 500 VMs with Intel SST-BF enabled | High Priority Cores | 8-11,18-19,34-35,40-51,54-55,58-59,72-75,82-83,98-99,104-115,118-119,122-123
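The core lists in the table use ESXi's range notation. Expanding them is a quick way to confirm the split: the two lists together cover logical CPUs 0-127 with no overlap, 80 normal-priority and 48 high-priority. The functions `expand_cpu_list` and `count_cpus` are hypothetical helpers; the lists themselves are copied from the table.

```sh
#!/bin/sh
# Hypothetical helper: expand "8-11,18-19" into one CPU number per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r start end; do
        [ -n "$end" ] || end=$start    # a single CPU, not a range
        n=$start
        while [ "$n" -le "$end" ]; do
            echo "$n"
            n=$((n + 1))
        done
    done
}

# Count the CPUs in a list (tr strips padding some wc versions add).
count_cpus() {
    expand_cpu_list "$1" | wc -l | tr -d ' '
}

NORMAL_CPUS="0-7,12-17,20-33,36-39,52-53,56-57,60-71,76-81,84-97,100-103,116-117,120-121,124-127"
HIGH_CPUS="8-11,18-19,34-35,40-51,54-55,58-59,72-75,82-83,98-99,104-115,118-119,122-123"

echo "normal-priority CPUs: $(count_cpus "$NORMAL_CPUS")"
echo "high-priority CPUs:   $(count_cpus "$HIGH_CPUS")"
# Print any CPU that appears in both lists.
expand_cpu_list "$NORMAL_CPUS,$HIGH_CPUS" | sort -n | uniq -d
```

The final command prints nothing when the two lists are disjoint, as they are here.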

 

5.2 Test Results

With Intel SST-BF disabled, the second test returned a “Resource unavailable” error when VM density was increased above 400. With Intel SST-BF enabled, this error did not occur, and density could be increased to 500 without any resource shortage.

5.2.1 Observations from the Experiment

The following observations were made from the proof of concept experiments:

  • Systems with Intel SST-BF enabled and VMs prioritized appropriately could accommodate 100 more high-priority VMs with no or minimal impact on application performance
  • Application latency remains the same as VM density increases on high-priority cores
  • The Application Execution Ratio (ratio of actual to expected operations) remains the same as VM density increases on high-priority cores
  • CPU utilization and application performance in Intel SST-BF disabled mode show that resources are wasted

The following table contains actual test results.

Parameter | SST-BF Enable, Turbo Disable: Media | SST-BF Enable, Turbo Disable: Office | SST-BF Disable, Turbo Disable: Media | SST-BF Disable, Turbo Disable: Office | SST-BF Disable, Turbo Enable: Media | SST-BF Disable, Turbo Enable: Office
Number of VMs tested | 500 | 100 (limited by memory and storage) | 400 (>400 VM runs failed with a “Short of System Resources Error”) | 100 (limited by memory and storage) | 400 (>400 VM runs failed with a “Short of System Resources Error”) | 100 (limited by memory and storage)
CPU Utilization in % | 63 | 41 | 36 | 31
Max Core Freq in GHz | 2.2 | 1.2 | 2 | 2 | 2.4 | 2.4
Latency in ms, Group A (CPU Sensitive), threshold < 1 | 0.2691 | 0.8601 | 0.2689 | 0.8282 | 0.2691 | 0.7961
Latency in ms, Group B (Storage Sensitive), threshold < 6 | 2.1297 | 5.6230 | 2.1091 | 4.3273 | 2.1074 | 3.8388
Ratio of Actual to Expected Operations (0.0 to 1.0) | 1.0000 | 0.9900 | 1.0000 | 1.0000 | 1.0000 | 1.0000
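The density gain quoted in the Summary and the pass/fail status of each latency can be derived from the table with a short sketch. It is illustrative only: `density_gain_pct` and `check_latencies` are hypothetical helpers, the thresholds are the ones stated in the table (Group A < 1, Group B < 6, in the table's units), and the configuration labels are abbreviations of the column headers.

```sh
#!/bin/sh
# Hypothetical helper: percentage increase, (new - old) * 100 / old.
density_gain_pct() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

# Hypothetical helper: check latencies against the table's thresholds.
# Input lines: config app groupA groupB. Prints PASS/FAIL per line and
# returns non-zero if any line fails.
check_latencies() {
    awk '
        {
            ok = ($3 < 1 && $4 < 6)
            printf "%s %s A=%s B=%s %s\n", $1, $2, $3, $4, (ok ? "PASS" : "FAIL")
            if (!ok) failed++
        }
        END { exit (failed > 0) }
    '
}

echo "Media VM density gain: $(density_gain_pct 400 500)%"

# Latency values from the results table.
check_latencies <<'EOF'
BF-on/Turbo-off   Media  0.2691 2.1297
BF-on/Turbo-off   Office 0.8601 5.6230
BF-off/Turbo-off  Media  0.2689 2.1091
BF-off/Turbo-off  Office 0.8282 4.3273
BF-off/Turbo-on   Media  0.2691 2.1074
BF-off/Turbo-on   Office 0.7961 3.8388
EOF
```

With the reported values, the sketch prints a 25% density gain and PASS for all six columns. The Office VMs on normal-priority cores with Intel SST-BF enabled come closest to both thresholds (0.8601 and 5.6230) yet still stay within them.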


6. Summary

Enterprises need to optimize VDI deployments to lower their total cost of ownership and running costs. VDI deployments can take advantage of Intel Speed Select Technology to boost VM density by ~25% (from 400 to 500 VMs in this proof of concept) for VMs running on high-priority cores, without compromising performance as measured by workload latency. Customers deploying different VDI configurations from the same node or cluster can use Intel SST to get additional performance without any extra investment.

VDI as a workload can benefit from Intel Speed Select Technology. Hypervisor vendors that add native Intel SST support to their OS and orchestrator will make configuration easier for end users. OEMs can enable or disable the feature depending on the kind of VDI sessions to be deployed from the same node.

Appendix A 3rd Generation Intel® Xeon® Scalable Processors that support Intel® Speed Select Technology 

Processor Model | Intel® SST-PP Base Configuration 0 | Intel® SST-PP Configuration 3 | Intel® SST-PP Configuration 4 | Intel® SST-CP | Intel® SST-BF High Priority | Intel® SST-BF Low Priority | Intel® SST-TF
8380 | 40 Cores / 270 W / 2.3 GHz | N/A | N/A | Yes | 16 Cores / 2.4 GHz | 24 Cores / 2.2 GHz | Yes
8368Q | 38 Cores / 270 W / 2.6 GHz | N/A | N/A | Yes | N/A | N/A | Yes
8368 | 38 Cores / 270 W / 2.4 GHz | N/A | N/A | Yes | 16 Cores / 2.5 GHz | 22 Cores / 2.3 GHz | Yes
8362 | 32 Cores / 265 W / 2.8 GHz | N/A | N/A | Yes | N/A | N/A | Yes
8360Y | 36 Cores / 250 W / 2.4 GHz | 32 Cores / 250 W / 2.5 GHz | 24 Cores / 220 W / 2.6 GHz | Yes | 12 Cores / 2.7 GHz | 24 Cores / 2.1 GHz | Yes
8358P | 32 Cores / 240 W / 2.6 GHz | N/A | N/A | Yes | N/A | N/A | Yes
8358 | 32 Cores / 250 W / 2.6 GHz | N/A | N/A | Yes | 12 Cores / 2.8 GHz | 20 Cores / 2.4 GHz | Yes
8352Y | 32 Cores / 205 W / 2.2 GHz | 24 Cores / 185 W / 2.3 GHz | 16 Cores / 185 W / 2.6 GHz | Yes | 12 Cores / 2.4 GHz | 20 Cores / 2 GHz | Yes
8352V | 36 Cores / 195 W / 2.1 GHz | 32 Cores / 180 W / 2 GHz | 24 Cores / 155 W / 2 GHz | Yes | N/A | N/A | Yes
8352S | 32 Cores / 205 W / 2.2 GHz | 24 Cores / 185 W / 2.3 GHz | 16 Cores / 185 W / 2.6 GHz | Yes | 12 Cores / 2.4 GHz | 20 Cores / 2 GHz | Yes
8352M | 32 Cores / 185 W / 2.3 GHz | 28 Cores / 185 W / 2.4 GHz | 24 Cores / 185 W / 2.6 GHz | Yes | N/A | N/A | Yes
8351N | 36 Cores / 225 W / 2.4 GHz | N/A | N/A | Yes | 18 Cores / 2.6 GHz | 18 Cores / 2.2 GHz | Yes
6354 | 18 Cores / 205 W / 3 GHz | N/A | N/A | Yes | 8 Cores / 3.1 GHz | 10 Cores / 2.8 GHz | Yes
6348 | 28 Cores / 235 W / 2.6 GHz | N/A | N/A | Yes | 12 Cores / 2.8 GHz | 16 Cores / 2.5 GHz | Yes
6346 | 16 Cores / 205 W / 3.1 GHz | N/A | N/A | Yes | 4 Cores / 3.2 GHz | 12 Cores / 3 GHz | Yes
6342 | 24 Cores / 230 W / 2.8 GHz | N/A | N/A | Yes | 12 Cores / 2.9 GHz | 12 Cores / 2.6 GHz | Yes
6338T | 24 Cores / 165 W / 2.1 GHz | N/A | N/A | Yes | 8 Cores / 2.3 GHz | 16 Cores / 1.9 GHz | Yes
6338N | 32 Cores / 185 W / 2.2 GHz | N/A | N/A | Yes | 18 Cores / 2.4 GHz | 14 Cores / 1.9 GHz | Yes
6338 | 32 Cores / 205 W / 2 GHz | N/A | N/A | Yes | 12 Cores / 2.2 GHz | 20 Cores / 1.8 GHz | Yes
6336Y | 24 Cores / 185 W / 2.4 GHz | 12 Cores / 150 W / 2.9 GHz | 8 Cores / 140 W / 3.1 GHz | Yes | 8 Cores / 2.5 GHz | 16 Cores / 2.2 GHz | Yes
6334 | 8 Cores / 165 W / 3.6 GHz | N/A | N/A | Yes | 4 Cores / 3.7 GHz | 4 Cores / 3.4 GHz | Yes
6330N | 28 Cores / 165 W / 2.2 GHz | N/A | N/A | Yes | 18 Cores / 2.3 GHz | 10 Cores / 1.9 GHz | Yes
6330 | 28 Cores / 205 W / 2 GHz | N/A | N/A | Yes | 12 Cores / 2.1 GHz | 16 Cores / 1.8 GHz | Yes
6326 | 16 Cores / 185 W / 2.9 GHz | N/A | N/A | Yes | 4 Cores / 3 GHz | 12 Cores / 2.6 GHz | Yes
6314U | 32 Cores / 205 W / 2.3 GHz | N/A | N/A | Yes | 12 Cores / 2.5 GHz | 20 Cores / 2.1 GHz | Yes
6312U | 24 Cores / 185 W / 2.4 GHz | N/A | N/A | Yes | 8 Cores / 2.6 GHz | 16 Cores / 2.3 GHz | Yes
5320T | 20 Cores / 150 W / 2.3 GHz | N/A | N/A | Yes | 6 Cores / 2.6 GHz | 14 Cores / 2.1 GHz | Yes
5320 | 26 Cores / 185 W / 2.2 GHz | N/A | N/A | Yes | 12 Cores / 2.5 GHz | 14 Cores / 2 GHz | Yes
5318Y | 24 Cores / 165 W / 2.1 GHz | 24 Cores / 150 W / 1.9 GHz | 22 Cores / 150 W / 2 GHz | Yes | 8 Cores / 2.3 GHz | 16 Cores / 1.8 GHz | Yes
5318S | 24 Cores / 165 W / 2.1 GHz | 24 Cores / 150 W / 1.9 GHz | 22 Cores / 150 W / 2 GHz | Yes | 8 Cores / 2.3 GHz | 16 Cores / 1.8 GHz | Yes
5318N | 24 Cores / 150 W / 2.1 GHz | 20 Cores / 135 W / 2 GHz | N/A | Yes | 8 Cores / 2.3 GHz | 16 Cores / 2 GHz | Yes
5317 | 12 Cores / 150 W / 3 GHz | N/A | N/A | Yes | 4 Cores / 3.2 GHz | 8 Cores / 2.8 GHz | Yes
5315Y | 8 Cores / 140 W / 3.2 GHz | 6 Cores / 125 W / 3.2 GHz | 4 Cores / 115 W / 3.4 GHz | Yes | 2 Cores / 3.3 GHz | 6 Cores / 2.6 GHz | Yes
4316 | 20 Cores / 150 W / 2.3 GHz | N/A | N/A | N/A | N/A | N/A | N/A
4314 | 16 Cores / 135 W / 2.4 GHz | N/A | N/A | N/A | N/A | N/A | N/A
4310T | 10 Cores / 105 W / 2.3 GHz | N/A | N/A | N/A | N/A | N/A | N/A
4310 | 12 Cores / 120 W / 2.1 GHz | N/A | N/A | N/A | N/A | N/A | N/A
4309Y | 8 Cores / 105 W / 2.8 GHz | 8 Cores / 95 W / 2.6 GHz | 8 Cores / 85 W / 2.3 GHz | N/A | N/A | N/A | N/A