Third-generation Intel® Xeon® Scalable processors have significant enhancements over previous processors. Specifically, they have up to 40 cores per socket, larger CPU caches (L1, L2, and L3), and more memory bandwidth (channels). They also support higher speed DIMMs, faster UPI lanes, faster I/O, 4th generation PCIe*, and several other architectural improvements. These enhancements yield better performance in user applications and system services that make intensive use of CPU, memory, and I/O.
Due to increasing digitization and economic growth, the volume of data at rest and in motion continues to grow exponentially. As this volume increases, and as more customers and services move to the cloud, the need for security becomes paramount. That is why data is cryptographically protected while moving over the network and while at rest in storage. Because these cryptographic operations can be computationally intensive, 3rd generation Intel® Xeon® Scalable processors include new instructions based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) to speed them up. Intel AVX-512 uses CPU registers that are 512 bits wide to pack multiple operations into one CPU clock cycle.¹
New Cryptographic CPU Instructions in 3rd Generation Intel® Xeon® Processors
Cryptographic algorithms are generally classified as asymmetric, symmetric, or hash based. As the need for security increases over time, so does the need for newer and stronger cryptographic protocols. This leads to newer and stronger asymmetric algorithms like Elliptic Curve X25519 and to larger key sizes for symmetric algorithms like the Advanced Encryption Standard (AES). The new cryptographic instructions help accelerate these operations. For example:
- Integer fused multiply add (IFMA) instructions accelerate public key cryptography by using Intel AVX-512 registers to multiply and add large numbers in a single operation to increase the speed of asymmetric algorithms like RSA and elliptic curve cryptography.
- Vectorized AES instructions and “carry-less” multiplication instructions accelerate symmetric algorithms like AES.
- Secure Hash Algorithm (SHA) extension instructions accelerate the secure hash algorithms that are used to generate random numbers (especially for Transport Layer Security [TLS] handshakes in HTTPS connections).²
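To make two of these primitives concrete, the following is a minimal Python sketch of what they compute, not of how the hardware or Intel's libraries implement them. The function names (`madd52lo`, `madd52hi`, `multiply`, `clmul`) and the choice of four limbs are illustrative assumptions. The IFMA emulation adds the low or high 52 bits of a 52-bit by 52-bit product to a 64-bit accumulator, which is the building block for multiplying large numbers split into 52-bit limbs. The carry-less multiplication combines shifted copies of an operand with XOR instead of addition.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1


def madd52lo(acc, a, b):
    """Add the low 52 bits of the product a*b to a 64-bit accumulator."""
    product = (a & MASK52) * (b & MASK52)
    return (acc + (product & MASK52)) & MASK64


def madd52hi(acc, a, b):
    """Add the high 52 bits of the 104-bit product a*b to a 64-bit accumulator."""
    product = (a & MASK52) * (b & MASK52)
    return (acc + (product >> 52)) & MASK64


def to_limbs(n, count):
    """Split a non-negative integer into `count` limbs of 52 bits each."""
    return [(n >> (52 * i)) & MASK52 for i in range(count)]


def multiply(x, y, limbs=4):
    """Schoolbook multiplication of two numbers below 2**(52*limbs)."""
    a, b = to_limbs(x, limbs), to_limbs(y, limbs)
    acc = [0] * (2 * limbs)
    for i in range(limbs):
        for j in range(limbs):
            # Low half lands in column i+j, high half in the next column.
            acc[i + j] = madd52lo(acc[i + j], a[i], b[j])
            acc[i + j + 1] = madd52hi(acc[i + j + 1], a[i], b[j])
    # Recombine the columns; Python integers absorb the carries.
    return sum(value << (52 * k) for k, value in enumerate(acc))


def clmul(a, b):
    """Carry-less multiplication: XOR shifted copies of a for each set bit of b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


x, y = 2**200 + 12345, 2**199 + 6789
print(multiply(x, y) == x * y)
print(bin(clmul(0b101, 0b11)))
```

With four limbs, each accumulator column receives at most eight 52-bit terms, so it cannot overflow 64 bits. A vectorized implementation would run many such multiply-add steps per instruction across the lanes of a 512-bit register.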
Software Stack that Enables Cryptographic New Instructions from Intel
On Linux*, Intel provides software libraries that allow applications to take advantage of these cryptographic operations. Using a multibuffer technique, multiple independent data buffers can be processed in parallel (asynchronously) with the CPU instructions previously mentioned.
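The scheduling side of the multibuffer idea can be sketched in a few lines of Python. This is only an illustration of the batching pattern under stated assumptions: the class name `MultiBufferBatcher`, the default of eight lanes, and the use of SHA-256 are all hypothetical, and the loop inside `flush` stands in for a vectorized kernel that would process every buffer of the batch at once.

```python
import hashlib


class MultiBufferBatcher:
    """Collects independent buffers and processes them one batch at a time."""

    def __init__(self, lanes=8):  # the lane count is an illustrative assumption
        self.lanes = lanes
        self.pending = []   # (job_id, buffer) pairs waiting for a full batch
        self.results = {}   # job_id -> hex digest

    def submit(self, job_id, buffer):
        """Queue one buffer; process the batch as soon as every lane is filled."""
        self.pending.append((job_id, buffer))
        if len(self.pending) == self.lanes:
            self.flush()

    def flush(self):
        """Process whatever is queued, even if the batch is only partly full."""
        batch, self.pending = self.pending, []
        # Stand-in for a SIMD kernel: a real multibuffer library would work on
        # all buffers of the batch together; here they are hashed one by one.
        for job_id, buffer in batch:
            self.results[job_id] = hashlib.sha256(buffer).hexdigest()


batcher = MultiBufferBatcher(lanes=4)
for i in range(5):
    batcher.submit(i, f"request-{i}".encode())
batcher.flush()  # drain the final, partially filled batch
print(len(batcher.results))
```

The design point is that callers submit work without waiting for a result, which is why this technique pairs naturally with an asynchronous server.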
Nearly 80% of all web traffic uses HTTPS, which is also referred to as HTTP over Transport Layer Security (TLS) or Secure Sockets Layer (SSL), for secure networking on the internet.³ HTTPS is now prevalent in browsers used in phones, tablets, and cars, due to the broad use of Representational State Transfer (REST) APIs in distributed applications and microservices. This article focuses on accelerating the cryptographic operations in HTTPS traffic for web applications. An HTTPS web transaction typically involves:
- A TLS handshake that uses asymmetric cryptographic operations
- Data exchange that uses symmetric cryptographic operations
- Random-number generation that uses SHA operations
For example, the asynchronous NGINX* web server queues up multiple requests that are passed down to the TLS library stack, as shown in Figure 1. OpenSSL* handles these requests asynchronously and hands them off to a crypto provider plug-in, the Intel® QuickAssist Technology (Intel® QAT) engine, which supports:
- Asymmetric encryption algorithms like RSA and ECDH via the crypto multibuffer library
- Symmetric operations via the IPSec multibuffer library⁴
Azure Virtual Machine (VM) Images
To ease the adoption of cryptographic instructions for 3rd generation Intel Xeon processors, Intel partnered with BitNami* (VMware*) to publish Debian* Linux VM images that incorporate the previously mentioned software libraries on popular cloud providers like Azure. This partnership provides customers with a functional and efficient VM image that they can use to accelerate HTTPS traffic in web applications. As part of this exercise, Intel published two VM images: one for WordPress* and one for NGINX. WordPress is an open-source content management system that has been a popular website management system for years. NGINX is the most popular web server today.⁵ The NGINX image provides an NGINX web server with a software stack from Intel. The WordPress image is based on the NGINX web server, PHP*, and the MariaDB* database, along with the cryptographic software stack from Intel.
Azure VM Offerings
These VM images are available in the Azure marketplace. When run on VMs based on 3rd generation Intel Xeon processors, they can offer great performance benefits for the previously discussed cryptographic operations. The Azure portfolio for general-purpose infrastructure as a service (IaaS) includes multiple VM offerings based on these processors:
- Dv5/Dsv5 VMs offer a combination of virtual CPUs (vCPUs) and memory to meet the requirements of enterprise workloads like small-to-medium databases, low-to-medium traffic web servers, application servers, and more.
- Ev5/Esv5 VMs offer a higher memory-to-vCPU ratio than the corresponding Dv5 and Dsv5 VMs. The extra memory capacity makes them ideal for memory-intensive enterprise applications, relational database servers, in-memory analytics workloads, and more.
- Ddv5/Ddsv5 and Edv5/Edsv5 VMs include ultra-fast and larger local solid-state drive (SSD) disk storage. These VMs are ideal for applications that require low latency and high-speed local storage access, such as small-to-medium database servers and web servers. Learn more about Ddv5/Ddsv5 and Edv5/Edsv5.
- Ebsv5/Ebdsv5 VMs deliver higher remote storage performance than the previous-generation Ev4 VM series, offering up to 120,000 IOPS and 4,000 MBps of remote disk storage throughput. Learn more about Ebsv5/Ebdsv5.
All of these hyperthreaded VMs that are based on 3rd generation Intel Xeon processors can reach an all-core turbo clock speed of up to 3.5 GHz. They feature Intel® Turbo Boost Technology v2.0, Intel AVX-512, and Intel® Deep Learning Boost.
Crypto-NI Performance Benefits
To demonstrate how these cryptographic CPU instructions improve performance, Intel partnered with Principled Technologies* to publish performance reports that compare Azure Dv5 VMs based on 3rd generation Intel Xeon Scalable processors with Dv4 VMs based on 2nd generation Intel Xeon Scalable processors, using WordPress and Siege. WordPress is an open-source content management system, and Siege is a load generator. This workload, based on OSS performance, uses WordPress for content, NGINX as the web server, FastCGI Process Manager (PHP-FPM) for application logic, and MariaDB as the database server.
To demonstrate the performance impact of 3rd generation Intel Xeon processors and their cryptographic CPU instructions, we considered different cipher suites based on their popularity and relevance. Cipher suites combine different TLS versions, asymmetric key exchange algorithms, digital signature algorithms, symmetric encryption algorithms, and hash-based message authentication algorithms.
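As a rough illustration of how a cipher suite bundles these components, the following Python sketch lists a few of the suites enabled by the local TLS library through the standard `ssl` module. The helper name `describe_cipher_suites` and its `limit` parameter are hypothetical, the suites printed depend on the OpenSSL build installed on your machine, and they are not necessarily the suites used in the performance reports.

```python
import ssl


def describe_cipher_suites(limit=5):
    """Return the components of the first `limit` enabled cipher suites."""
    context = ssl.create_default_context()
    rows = []
    for suite in context.get_ciphers()[:limit]:
        rows.append({
            "name": suite["name"],
            "protocol": suite["protocol"],          # TLS version
            "key_exchange": suite.get("kea"),        # asymmetric key exchange
            "authentication": suite.get("auth"),     # digital signature
            "symmetric": suite.get("symmetric"),     # bulk encryption
            "digest": suite.get("digest"),           # hash / MAC
        })
    return rows


for row in describe_cipher_suites():
    print(row)
```

The `.get()` lookups are deliberate: some of the descriptive fields may be absent on older TLS library builds.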
As shown in Figure 2, performance⁶ on these workloads improves by as much as 1.53x on Azure Dv5 VMs based on 3rd generation Intel Xeon processors, across different VM sizes (8, 16, and 64 virtual processors), when compared to earlier Dv4 VMs based on the previous generation of Intel Xeon processors.
To check if the capability is exposed in an operating environment (such as a Linux host or Linux VM), you can run the command lscpu | grep Flags.
Flags such as avx512ifma, vaes, vpclmulqdq, and sha_ni in the lscpu output indicate that these features are exposed and present in your operating environment. For example:
# lscpu | grep Flags
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear flush_l1d arch_capabilities
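The same check can be scripted. The following is a minimal Python sketch that takes the `Flags:` line printed by lscpu and reports whether the flags for the instructions discussed earlier are present. The function name `check_crypto_flags` and the `CRYPTO_FLAGS` table are illustrative assumptions, and the sample line is an abbreviated version of the output above.

```python
# Flag names as reported by lscpu, mapped to the feature they indicate.
CRYPTO_FLAGS = {
    "avx512ifma": "integer fused multiply add (IFMA)",
    "vaes": "vectorized AES",
    "vpclmulqdq": "vectorized carry-less multiplication",
    "sha_ni": "SHA extensions",
}


def check_crypto_flags(flags_line):
    """Map each crypto-related flag to True or False for one 'Flags:' line."""
    head, sep, tail = flags_line.partition(":")
    values = tail if sep else head
    # Compare whole tokens so that, for example, "aes" never matches "vaes".
    present = set(values.split())
    return {flag: flag in present for flag in CRYPTO_FLAGS}


sample = "Flags: fpu aes avx2 avx512f avx512ifma sha_ni gfni vaes vpclmulqdq"
for flag, found in check_crypto_flags(sample).items():
    status = "present" if found else "missing"
    print(f"{flag:12} {status:8} {CRYPTO_FLAGS[flag]}")
```

On a Linux host you could feed it the real output, for example by reading the line from `lscpu` with `subprocess.run(["lscpu"], capture_output=True, text=True)` and selecting the line that starts with `Flags`.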
The performance benefit that users experience depends on how much CPU time the workload spends in cryptographic operations: the larger that share, the greater the benefit.
General Performance Benefits
To compare the performance of different generations of Intel® Xeon® Scalable processors, Intel partnered with Principled Technologies* to publish performance reports on Azure Edsv5 and Ddsv5 VMs running popular database servers like MySQL*, MariaDB, and Microsoft SQL Server*, using the TPROC-C and TPROC-H HammerDB workloads.
TPROC-C is an online transaction processing (OLTP) workload. Such a workload typically occurs when a database receives multiple user requests to access the data and change it over time. These modifications, which are called transactions, must adhere to the properties called atomicity, consistency, isolation, and durability (ACID) to ensure data validity and consistency.
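The atomicity part of ACID can be shown with a minimal Python sketch. It uses the standard library's in-memory SQLite database purely as a stand-in, not one of the database servers tested in the reports, and the `accounts` table and `transfer` function are hypothetical. A transfer that would overdraw an account raises an error, and the whole transaction is rolled back, so neither of its two updates survives.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts as one all-or-nothing transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)        # succeeds and is committed
try:
    transfer(conn, 1, 2, 500)   # fails, so both updates are rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)
```

An OLTP benchmark issues many such short read-and-write transactions concurrently, which is what makes it sensitive to CPU, memory, and storage performance together.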
TPROC-H is an analytic workload. Analytic workloads, also called decision support, data warehousing, or business intelligence workloads, process complex ad-hoc queries on large volumes of data. The primary purpose of analytic workloads is to read data from the database.
As shown in Figure 3, these workloads attain performance⁵ gains of up to 1.4x on Azure Dv5 and Ev5 VMs based on 3rd generation Intel Xeon processors, across different VM sizes (8, 16, and 64 virtual processors), when compared to the previous generation of Dv4 and Ev4 VMs.
Azure Services Improvements
Azure ran performance stress tests with Apache JMeter*, a Java*-based load-testing tool, on Dv5 VMs based on 3rd generation Intel Xeon processors, using the previously mentioned VM images from Intel and BitNami. Those images, which include the software library stack for cryptographic operations, achieved significant performance gains. The tests simulated a 30-minute load on a WordPress blog. A JMeter simulation consists of user actions on the WordPress blog, for example: sign in, create a blog, and then post a comment.
The results are listed as follows and depicted in Figure 4.
- Standard_D4dsv5 executed ~55% more threads (user requests) than Standard_D4dsv4
- Standard_D4dsv5 showed ~58% higher throughput than Standard_D4dsv4
For more details and context for the previous performance observations, refer to the Compute learning track in:
Most online activities (banking, streaming, chatting, and reading) use HTTPS. As the volume of data generated and consumed grows exponentially, so do the computational resources required to keep these channels secure via the TLS-based HTTPS protocol. The 3rd generation Intel Xeon Scalable processor and its new cryptographic instructions provide significant performance benefits over the previous generation. To use these benefits seamlessly, VM images using a software stack from Intel are now available from cloud service providers like Azure and can be deployed in the new VM SKUs based on 3rd generation Intel Xeon Scalable processors.
Thanks to Aditya Gulavani, Dong-Yuan Chen, Karen Shemer, Sushma Kyasaralli Thimmappa, Jon Strang, Brian Will, Abraham Arce Moreno, Priyank Durugkar, Monica Ene-Pietrosanu, and several others from Intel. Special thanks also to our counterparts at Azure: Rishab Verma, Umakanth Puppala, Adrian Joian, Joel Pelly, Joe Sherman, and team. Thanks also to our counterparts at Principled Technologies (Sarah Catchings and team) and to the BitNami team for a wonderful continued collaboration.
2 Links for cryptographic ISA instructions from Intel:
- Crypto Acceleration: Enabling a Path to the Future of Computing
- Cryptography Processing with 3rd Generation Intel Xeon Scalable Processors
4 Cryptographic software libraries from Intel:
6 Performance reports from Principled Technologies on 3rd generation Intel Xeon Scalable processors on Dv5 VMs.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.