Deep Learning Performance Boost by Intel VNNI

Authors:

Shufan Wu, Feng Tian, Haihao Shen

Overview

Most deep learning applications today use 32-bit floating-point (FP32) precision for training and inference workloads. In the previous generation of Intel Xeon Scalable processors, the convolution operations that dominate neural network workloads were implemented in the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) using the FP32 data type via the vfmadd231ps instruction in the Intel® AVX-512 instruction set. Intel Xeon Scalable processors were the first Intel Xeon CPUs to include Intel AVX-512, with up to two 512-bit FMA units computing in parallel per core, enabling the execution of two vfmadd231ps instructions in a given cycle.

Lately, the Int8 data type has been used successfully for deep learning inference, with a significant boost to performance and little loss of accuracy. Int8 uses 8 bits to represent integer data (7 value bits and a sign bit), whereas FP32 uses 32 bits to represent floating-point data (23 bits of mantissa, 8 bits of exponent and a sign bit). Using fewer bits for inference brings better memory and compute utilization, since less data is transferred and the data is processed more efficiently. Previous-generation Intel Xeon Scalable processors implemented convolution operations in Intel MKL-DNN using the Intel AVX-512 instructions vpmaddubsw, vpmaddwd, and vpaddd to take advantage of low-precision data. Although this gave some performance improvement over FP32 convolution, the need for three instructions per Int8 convolution step, combined with the microarchitecture limit of only two 512-bit instructions per clock cycle, left room for further innovation.
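To make the two bit layouts concrete, here is a minimal Python sketch. It decodes the sign, exponent and mantissa fields of an FP32 value, then maps a few FP32 values to Int8 and back. The symmetric per-tensor scheme (scale = max|x| / 127) and the function names are assumptions chosen for illustration; the text does not say which quantization scheme the libraries use.

```python
import struct

def fp32_fields(x):
    """Split an FP32 value into its (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF  # 1 + 8 + 23 bits

def quantize_int8(values):
    """Symmetric per-tensor quantization (assumed scheme, for illustration only)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Map Int8 values back to approximate FP32 values."""
    return [q * scale for q in quantized]

weights = [0.8, -1.0, 0.3, 0.05]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

print(fp32_fields(1.0))   # (0, 127, 0): sign 0, biased exponent 127, mantissa 0
print(q)                  # Int8 codes, each within [-128, 127]
print(max(abs(a - b) for a, b in zip(weights, restored)))  # worst-case rounding error
```

The rounding error per value is bounded by half a quantization step (scale / 2), which is one way to see why well-scaled tensors lose little accuracy.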

With the launch of 2nd Gen Intel Xeon Scalable processors, lower-precision (INT8) inference performance has improved thanks to the Intel® Deep Learning Boost (Intel® DL Boost) instruction. Both inference throughput and latency are significantly improved by using quantized models. Building on the success of the Intel DL Boost instructions, the upcoming next-generation Intel Xeon processors introduce VNNI for BFloat16, which speeds up training throughput and shortens the time to train.
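The difference between the three-instruction sequence and a single fused instruction can be sketched for one 32-bit lane in plain Python. This is an emulation of the arithmetic only, not real SIMD code. It assumes the fused INT8 instruction is vpdpbusd (the text does not name it) and that vpmaddwd is used with a vector of ones to widen the 16-bit partial sums; the function names are made up for this example.

```python
def saturate_int16(value):
    """Clamp to the signed 16-bit range, as vpmaddubsw does for its pair sums."""
    return max(-32768, min(32767, value))

def wrap_int32(value):
    """Wrap to signed 32-bit two's complement, as a non-saturating add does."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def three_instruction_lane(acc, a_u8, b_s8):
    """One 32-bit lane of vpmaddubsw + vpmaddwd (by ones) + vpaddd."""
    # vpmaddubsw: u8 * s8 products, adjacent pairs summed with 16-bit saturation.
    pair0 = saturate_int16(a_u8[0] * b_s8[0] + a_u8[1] * b_s8[1])
    pair1 = saturate_int16(a_u8[2] * b_s8[2] + a_u8[3] * b_s8[3])
    # vpmaddwd with a vector of ones: widen and add the two 16-bit sums.
    widened = pair0 * 1 + pair1 * 1
    # vpaddd: accumulate into the 32-bit running sum.
    return wrap_int32(acc + widened)

def vnni_lane(acc, a_u8, b_s8):
    """One 32-bit lane of the fused multiply-accumulate (assumed: vpdpbusd)."""
    return wrap_int32(acc + sum(a * b for a, b in zip(a_u8, b_s8)))

a, b = [1, 2, 3, 4], [5, -6, 7, -8]
print(three_instruction_lane(100, a, b), vnni_lane(100, a, b))  # 82 82

# Large inputs: the 16-bit intermediate saturates, the fused form does not.
a, b = [255, 255, 0, 0], [127, 127, 0, 0]
print(three_instruction_lane(0, a, b), vnni_lane(0, a, b))  # 32767 64770
```

Besides using one instruction instead of three, the fused form in this sketch accumulates directly into 32 bits, so it avoids the 16-bit saturation that the older sequence can hit with large inputs.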

As part of the complete solution for workloads running with low-precision data types, Intel also provides software toolkits that make development and deployment easier. Intel® AI Analytics Toolkit includes popular deep learning frameworks such as TensorFlow and PyTorch, optimized with Intel® DL Boost to maximize training and inference performance on Xeon processors, as well as low-precision tools for quantizing models. The OpenVINO™ toolkit extends and maximizes the performance of inference workloads for efficient deployment across Intel hardware.

Running Inference with INT8
 

Intel Low Precision Optimization Tool

Intel Low Precision Optimization Tool is an open-source Python library intended to deliver a unified low-precision conversion and optimization interface across multiple Intel-optimized DL frameworks, including TensorFlow, PyTorch and MXNet, on both CPU and GPU. With this tool, users can easily quantize an FP32 model from scratch. The tool can also auto-tune the model through a configuration file to balance performance and accuracy.

Supported Models

With the recent v1.0a release, Intel Low Precision Optimization Tool supports 30 models covering image classification, object detection, NLP, and recommendation systems. The models are listed in the tables below.

| Model | Framework | Tuning Strategy | INT8 Tuning Accuracy | FP32 Accuracy Baseline | INT8/FP32 Acc Ratio [(INT8-FP32)/FP32] | INT8/FP32 Perf Ratio |
|---|---|---|---|---|---|---|
| ResNet50 V1 | TensorFlow | mse | 73.28% | 73.54% | -0.35% | 2.99x |
| ResNet50 V1.5 | TensorFlow | bayesian | 75.70% | 76.26% | -0.73% | 1.95x |
| ResNet101 | TensorFlow | basic | 76.68% | 75.58% | 1.46% | 3.03x |
| Inception V1 | TensorFlow | basic | 69.54% | 69.48% | 0.09% | 2.18x |
| Inception V2 | TensorFlow | basic | 74.32% | 74.38% | -0.08% | 1.69x |
| Inception V3 | TensorFlow | basic | 76.54% | 76.90% | -0.47% | 2.02x |
| Inception V4 | TensorFlow | basic | 79.74% | 80.12% | -0.47% | 3.40x |
| ssd_resnet50_v1 | TensorFlow | basic | 37.80% | 38.01% | -0.55% | 1.82x |

| Model | Framework | Tuning Strategy | INT8 Tuning Accuracy | FP32 Accuracy Baseline | INT8/FP32 Acc Ratio [(INT8-FP32)/FP32] | INT8/FP32 Perf Ratio |
|---|---|---|---|---|---|---|
| DLRM | PyTorch | basic | 80.21% | 80.27% | -0.08% | 1.87x |
| BERT-Large MRPC | PyTorch | basic | 87.90% | 88.30% | -0.45% | 2.38x |
| BERT-Large SQUAD | PyTorch | basic | 92.15% | 93.05% | -0.96% | 1.42x |
| BERT-Large CoLA | PyTorch | basic | 62.10% | 62.60% | -0.80% | 1.76x |
| BERT-Base STS-B | PyTorch | basic | 88.50% | 89.30% | -0.90% | 3.05x |
| BERT-Base CoLA | PyTorch | basic | 58.30% | 58.80% | -0.85% | 3.01x |
| BERT-Base MRPC | PyTorch | basic | 88.30% | 88.70% | -0.45% | 2.34x |
| BERT-Base SST-2 | PyTorch | basic | 90.90% | 91.90% | -1.09% | 1.64x |
| BERT-Base RTE | PyTorch | basic | 69.30% | 69.70% | -0.57% | 2.95x |
| BERT-Large RTE | PyTorch | basic | 72.90% | 72.60% | 0.41% | 2.38x |
| BERT-Large QNLI | PyTorch | basic | 91.00% | 91.80% | -0.87% | 2.25x |
| ResNet50 V1.5 | PyTorch | bayesian | 75.60% | 76.10% | -0.66% | 2.76x |
| ResNet18 | PyTorch | bayesian | 69.50% | 69.80% | -0.43% | 2.61x |
| ResNet101 | PyTorch | bayesian | 77.00% | 77.40% | -0.52% | 2.64x |

| Model | Framework | Tuning Strategy | INT8 Tuning Accuracy | FP32 Accuracy Baseline | INT8/FP32 Acc Ratio [(INT8-FP32)/FP32] | INT8/FP32 Perf Ratio |
|---|---|---|---|---|---|---|
| ResNet50 V1 | MXNet | mse | 76.40% | 76.80% | -0.52% | 3.73x |
| MobileNet V1 | MXNet | mse | 71.60% | 72.10% | -0.69% | 3.02x |
| MobileNet V2 | MXNet | mse | 71.00% | 71.10% | -0.14% | 3.88x |
| SSD-ResNet50 | MXNet | basic | 29.50% | 29.70% | -0.67% | 1.86x |
| SqueezeNet V1 | MXNet | mse | 57.30% | 57.20% | 0.18% | 2.88x |
| ResNet18 | MXNet | mse | 70.50% | 70.40% | 0.14% | 2.98x |
| Inception V3 | MXNet | mse | 78.20% | 78.00% | 0.26% | 3.35x |
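
The accuracy ratio column follows the formula given in its header, (INT8 - FP32) / FP32. As a quick check, the short Python sketch below recomputes it for two TensorFlow rows using the accuracy figures listed above; the function name is made up for this example.

```python
def relative_accuracy_change(int8_acc, fp32_acc):
    """Return (INT8 - FP32) / FP32 as a percentage."""
    return (int8_acc - fp32_acc) / fp32_acc * 100.0

# (model, INT8 tuning accuracy %, FP32 accuracy baseline %) from the tables above.
rows = [
    ("ResNet50 V1 (TensorFlow)", 73.28, 73.54),
    ("ResNet101 (TensorFlow)", 76.68, 75.58),
]

for name, int8_acc, fp32_acc in rows:
    print(f"{name}: {relative_accuracy_change(int8_acc, fp32_acc):+.2f}%")
# ResNet50 V1 (TensorFlow): -0.35%
# ResNet101 (TensorFlow): +1.46%
```

A negative value means the INT8 model is slightly less accurate than the FP32 baseline; a positive value means it scored slightly higher.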

Tool Usage

When quantizing an FP32 model with the Intel Low Precision Optimization Tool, users can either rely on the pre-defined accuracy metrics supported by the tool, or supply a custom evaluation function, evaluation dataset and accuracy metric for calibration. Using the pre-defined accuracy metrics is expected to be the common scenario, but the tool keeps the flexibility to use customized calibration components.

Below are two examples to demonstrate the usage of this tool.

ResNet50 on TensorFlow

Prerequisite
1. Installation
# Install Intel Low Precision Optimization Tool
pip install ilit

# Install Intel Optimized TensorFlow 1.15.2
pip install intel-tensorflow==1.15.2
2. Prepare Dataset
TensorFlow models repo provides scripts and instructions to download, process and convert the ImageNet dataset to the TF records format.

3. Prepare pre-trained model
In this version, Intel Low Precision Optimization Tool supports only PB files as input for the TensorFlow backend, so a pre-trained pb file is needed for each model you plan to use. For some models, pre-trained pb files can be found in the Model Zoo for Intel Architecture (part of Intel AI Analytics Toolkit), which is a repository of optimized models from Intel. For others, the pb files can be obtained by converting checkpoint files. The example below uses Inception_v1 to show how to get the pb file from a checkpoint file.

Download the checkpoint file from here
wget http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz
tar -xvf inception_v1_2016_08_28.tar.gz
Exporting the Inference Graph
git clone https://github.com/tensorflow/models
cd models/research/slim
python export_inference_graph.py \
        --alsologtostderr \
        --model_name=inception_v1 \
        --output_file=/tmp/inception_v1_inf_graph.pb
Use Netron to get the input/output layer names of the inference graph pb. For Inception_v1, the output layer name is InceptionV1/Logits/Predictions/Reshape_1

Freeze the exported graph using the freeze_graph.py tool in the tensorflow repo

python freeze_graph.py \
        --input_graph=/tmp/inception_v1_inf_graph.pb \
        --input_checkpoint=./inception_v1.ckpt \
        --input_binary=true \
        --output_graph=./frozen_inception_v1.pb \
        --output_node_names=InceptionV1/Logits/Predictions/Reshape_1

Run
Note: A model name marked with * means the model comes from the models repository; follow the step "Prepare pre-trained model" above to get the pb file.

Download pre-trained PB
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_6/resnet50_fp32_pretrained_model.pb

cd examples/tensorflow/image_recognition
python main.py -b 10 -a 28 -e 1 -g /PATH/TO/resnet50_fp32_pretrained_model.pb \
        -i input -o predict -r -d /PATH/TO/imagenet/ \
        --resize_method crop --config ./resnet50_v1.yaml


DLRM on PyTorch

Prerequisite

1. Installation

# Install Intel Low Precision Optimization Tool
pip install ilit

# Install PyTorch
pip install torch==1.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

# Install sklearn
pip install scikit-learn

2. Prepare Dataset
The code supports interface with the Criteo Terabyte Dataset

Download the raw data files day_0.gz, ..., day_23.gz and unzip them.
Specify the location of the unzipped text files day_0, ..., day_23 using --raw-data-file=<path/day> (the day number will be appended automatically).

These are then pre-processed (categorize, concat across days...) for use with the DLRM code.
The processed data is stored as an .npz file in <root_dir>/input/.npz
The processed file (.npz) can be used for subsequent runs with --processed-data-file=<path/.npz>
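
As a small illustration of how the --raw-data-file prefix relates to the 24 day files, the Python sketch below expands a prefix into the file names day_0 through day_23. The helper name and the example path are hypothetical; the DLRM code does this expansion itself.

```python
def raw_day_files(prefix, days=24):
    """Expand a '<path/day>' prefix into day_0 ... day_23 style file names."""
    return [f"{prefix}_{day}" for day in range(days)]

# Hypothetical location of the unzipped Criteo Terabyte files.
files = raw_day_files("/data/criteo/day")
print(files[0], files[-1], len(files))
# /data/criteo/day_0 /data/criteo/day_23 24
```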

3. Prepare pretrained model
The corresponding pre-trained model is available under the CC-BY-NC license and can be downloaded here: dlrm_emb64_subsample0.875_maxindrange10M_pretrained.pt

Run
cd examples/pytorch/dlrm
./run_and_time.sh


Examples of enabling Intel Low Precision Optimization Tool
This tutorial shows how to enable the DLRM model with the tool.

User Code Analysis
Intel Low Precision Optimization Tool supports two usages:

User specifies the FP32 'model', calibration dataset 'q_dataloader', evaluation dataset 'eval_dataloader' and metrics in the tuning.metrics field of the model-specific yaml config file.

User specifies the FP32 'model', calibration dataset 'q_dataloader' and a custom 'eval_func' which encapsulates the evaluation dataset and metrics by itself.

Because DLRM's metric is 'f1', the user should provide the evaluation function 'eval_func', so DLRM fits the second use case.
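
A custom evaluation function of this kind can be sketched in plain Python. The sketch assumes that eval_func receives the model and returns a single scalar score, and it uses a toy callable in place of a real DLRM model. The helper names, the 0.5 threshold and the sample data are all hypothetical.

```python
def f1_score(predictions, labels):
    """Binary F1 score computed from 0/1 predictions and labels."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def make_eval_func(samples, labels, threshold=0.5):
    """Build an eval_func that closes over the evaluation data and the metric."""
    def eval_func(model):
        predictions = [1 if model(x) >= threshold else 0 for x in samples]
        return f1_score(predictions, labels)
    return eval_func

# Toy stand-in for a model: returns its input as the click probability.
def toy_model(x):
    return x

eval_func = make_eval_func(samples=[0.9, 0.2, 0.7, 0.8], labels=[1, 0, 0, 1])
print(eval_func(toy_model))  # F1 for 2 true positives, 1 false positive, 0 false negatives
```

Because the evaluation data and the metric live inside the closure, the caller only has to hand over the model, which matches the second usage described above.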

Write Yaml config file

In the examples directory, there is a conf.yaml file. We can remove most of the items and keep only the mandatory items for tuning.

framework:
  - name: pytorch

device: cpu

tuning:
    accuracy_criterion:
      - relative: 0.01
    timeout: 0
    random_seed: 9527

Here we set the accuracy target to tolerate a 0.01 relative accuracy loss against the baseline. The default tuning strategy is the basic strategy. A timeout of 0 means tuning stops early, as soon as a tuning config meets the accuracy target.

Note: Intel Low Precision Optimization Tool does not support the "mse" tuning strategy for the PyTorch framework.
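
The relative criterion and the early-stop behaviour can be sketched as follows. This is one plausible reading of the config, not the tool's actual implementation: a candidate passes if its accuracy is at least (1 - relative) times the FP32 baseline, and tuning returns the first candidate that passes. The function names and candidate accuracies are hypothetical.

```python
def meets_relative_criterion(int8_acc, fp32_acc, relative=0.01):
    """True if the INT8 accuracy loses at most `relative` of the FP32 baseline."""
    return int8_acc >= fp32_acc * (1.0 - relative)

def tune(candidates, fp32_acc, relative=0.01):
    """Return the first tuning config that meets the target (early stop)."""
    for name, int8_acc in candidates:
        if meets_relative_criterion(int8_acc, fp32_acc, relative):
            return name
    return None  # no candidate met the accuracy target

# Hypothetical tuning configs and the accuracy each one reached.
candidates = [("config_a", 0.7900), ("config_b", 0.8021), ("config_c", 0.8030)]
print(tune(candidates, fp32_acc=0.8027))  # config_b
```

Here config_c is never evaluated, because config_b already satisfies the target and tuning stops there.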

Prepare
PyTorch quantization requires two manual steps:

Add QuantStub and DeQuantStub for all quantizable ops.
Fuse possible patterns, such as Linear + ReLU.
This is an intrinsic limitation of the PyTorch imperative quantization path; these steps cannot be automated in code. For the related code changes, refer to examples/pytorch/dlrm/dlrm_s_pytorch_tune.py.

Code update
After the prepare step is done, we just need to update dlrm_s_pytorch_tune.py as shown below

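# Imports needed by this snippet. dlrm, test_ld and eval_func are assumed to be
# defined earlier in the script.
import ilit
from torch.quantization import DeQuantStub, QuantStub, fuse_modules
from torch.utils.data import DataLoader

# Wrap the test loader so that each batch is yielded as (inputs, label).
class DLRM_DataLoader(DataLoader):
    def __init__(self, loader=None):
        self.loader = loader
    def __iter__(self):
        for X_test, lS_o_test, lS_i_test, T in self.loader:
            yield (X_test, lS_o_test, lS_i_test), T
eval_dataloader = DLRM_DataLoader(test_ld)

# Fuse adjacent module pairs in the bottom MLP.
fuse_list = []
for i in range(0, len(dlrm.bot_l), 2):
    fuse_list.append(["bot_l.%d" % (i), "bot_l.%d" % (i + 1)])
dlrm = fuse_modules(dlrm, fuse_list)

# Fuse adjacent module pairs in the top MLP, leaving the last two modules unfused.
fuse_list = []
for i in range(0, len(dlrm.top_l) - 2, 2):
    fuse_list.append(["top_l.%d" % (i), "top_l.%d" % (i + 1)])
dlrm = fuse_modules(dlrm, fuse_list)

# Mark where tensors enter and leave the quantized region of each MLP.
dlrm.bot_l.insert(0, QuantStub())
dlrm.bot_l.append(DeQuantStub())
dlrm.top_l.insert(0, QuantStub())
dlrm.top_l.insert(len(dlrm.top_l) - 1, DeQuantStub())

# Tune with the yaml config and the custom evaluation function.
tuner = ilit.Tuner("./conf.yaml")
tuner.tune(dlrm, eval_dataloader, eval_func=eval_func)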
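
To see which module names the two fusion loops above produce, the pure-Python sketch below reproduces the index arithmetic without PyTorch. The module count of 6 per MLP is a hypothetical example, and the helper name is made up; the real sizes depend on the DLRM configuration.

```python
def build_fuse_list(prefix, num_modules, skip_tail=0):
    """Pair up adjacent modules (i, i + 1), optionally leaving the tail unfused."""
    return [
        [f"{prefix}.{i}", f"{prefix}.{i + 1}"]
        for i in range(0, num_modules - skip_tail, 2)
    ]

# Bottom MLP: every adjacent pair is fused.
print(build_fuse_list("bot_l", 6))
# [['bot_l.0', 'bot_l.1'], ['bot_l.2', 'bot_l.3'], ['bot_l.4', 'bot_l.5']]

# Top MLP: the last two modules are skipped, as in the loop over len(top_l) - 2.
print(build_fuse_list("top_l", 6, skip_tail=2))
# [['top_l.0', 'top_l.1'], ['top_l.2', 'top_l.3']]
```

A likely reason for skipping the tail of the top MLP is that its final pair is not a fusable Linear + ReLU pattern, but that is an assumption; the text does not say.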

PyTorch

By interoperating with FBGEMM (a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference, owned by Facebook), an FP32 model can be quantized to INT8 and run inference with INT8 precision on the imperative path. Model quantization relies on the PyTorch quantization API and therefore requires changes to scripts and model implementations.

OpenVINO Toolkit
 

Tool Usage

The OpenVINO toolkit has included a Post-Training Optimization Tool since release 2020.2, which helps with downloading, converting and quantizing selected deep learning models into low-precision counterparts. The supported models are published at https://github.com/opencv/open_model_zoo/tree/master/tools/downloader/. Below is an example illustrating how to quantize the ResNet50 FP32 model.

Download the model

The basic usage is to run the script like this:

./downloader.py --all

This will download all models. The --all option can be replaced with other filter options to download only a subset of models. See the "Shared options" section.

By default, the script will download models into a directory tree rooted in the current directory. To download into a different directory, use the -o/--output_dir option.

Convert the model to Intermediate Representation (IR) format

The basic usage is to run the script like this:

./converter.py --all

This will convert all models into the Inference Engine IR format. Models in PyTorch and Caffe2 formats need to be converted to ONNX format first.

Model Quantization

Before you run the model quantizer, you must prepare a directory with the datasets required for the quantization process. This directory will be referred to as <DATASET_DIR> below. See the "Dataset directory layout" section for information on the expected contents of that directory.

The basic usage is to run the script like this:

./quantizer.py --all --dataset_dir <DATASET_DIR>

This will quantize all models for which quantization is supported. Other models are ignored.
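
The download, convert and quantize steps can be chained from a small Python driver. The sketch below only builds and prints the three command lines, using the script names and options shown above; the dataset path is a placeholder and the function name is made up. Running the commands is left commented out, since it assumes the scripts are in the current directory.

```python
import shlex
import subprocess  # only needed if the run line below is uncommented

def build_pipeline(dataset_dir):
    """Return the three command lines in the order they must run."""
    return [
        ["./downloader.py", "--all"],
        ["./converter.py", "--all"],
        ["./quantizer.py", "--all", "--dataset_dir", dataset_dir],
    ]

for command in build_pipeline("/path/to/datasets"):  # placeholder path
    print(" ".join(shlex.quote(part) for part in command))
    # subprocess.run(command, check=True)  # uncomment to execute each step
```

Using check=True would stop the pipeline at the first failing step, which matters here because each step consumes the output of the previous one.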

Supported Models

The OpenVINO toolkit provides an Open Model Zoo to expedite development of high-performance deep learning inference applications. Use these free pre-trained models instead of training your own to speed up development and production deployment. With the model downloader and the other automation tools, users can download, convert and quantize the supported models step by step.

Running Training & Inference with BFloat16
 

PyTorch

Thanks to the Intel Extension for PyTorch (IPEX), a Python package that extends official PyTorch and is designed to improve the out-of-box PyTorch experience on CPU while achieving good performance, BFloat16-based training and inference is enabled in PyTorch on the imperative path. By leveraging the VNNI BFloat16 instructions, a reasonable performance speed-up can be achieved with changes to the training (or inference) scripts.
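
For readers new to the format: bfloat16 keeps the sign bit and all 8 exponent bits of FP32 and shortens the mantissa to 7 bits, so a bfloat16 value is the upper 16 bits of an FP32 value. The Python sketch below converts in both directions. Round-to-nearest-even is an assumption made for this example, NaN inputs are not handled, and the function names are made up.

```python
import struct

def fp32_to_bf16_bits(x):
    """Convert an FP32 value to 16 bfloat16 bits (round to nearest even)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Add a bias so that dropping the low 16 bits rounds instead of truncating.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_fp32(bf16_bits):
    """Expand bfloat16 bits back to FP32 by zero-filling the low 16 bits."""
    return struct.unpack(">f", struct.pack(">I", bf16_bits << 16))[0]

print(hex(fp32_to_bf16_bits(1.0)))                    # 0x3f80
print(bf16_bits_to_fp32(fp32_to_bf16_bits(3.14159)))  # 3.140625
```

The second line shows the trade-off: the dynamic range matches FP32, but only about two to three decimal digits of precision survive.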

Below is an example illustrating how to enable training and inference on ResNet50, with IPEX.

PREPARE:
   wget https://repo.continuum.io/archive/Anaconda3-5.0.0-Linux-x86_64.sh -O anaconda3.sh
   chmod +x anaconda3.sh
   ./anaconda3.sh -b -p ~/anaconda3
   ~/anaconda3/bin/conda create -yn pytorch
   export PATH=~/anaconda3/bin:$PATH
   source ~/anaconda3/bin/activate pytorch
   pip install sklearn onnx
   conda config --add channels intel
   conda install ninja pyyaml setuptools cmake cffi typing intel-openmp
   conda install mkl mkl-include numpy -c intel --no-update-deps
   source /opt/rh/devtoolset-7/enable

  # build pytorch and intel-pytorch-extension
  # see https://github.com/intel/intel-extension-for-pytorch#installation

  # build jemalloc
   cd ..
   git clone https://github.com/jemalloc/jemalloc.git
   cd jemalloc
   ./autogen.sh
   ./configure --prefix=/path/to/jemalloc   # e.g. /home/tdoux/tdoux/jemalloc/
   make
   make install

  export LD_PRELOAD=/path/to/jemalloc/lib/libjemalloc.so

  # build vision
  cd ..
  git clone https://github.com/pytorch/vision
  cd vision
  python setup.py install

  # get the imagenet example and check out imagenet-xiaobing inside the clone
  git clone https://gitlab.devtools.intel.com/ia-optimized-model-for-pytorch/imagenet.git
  cd imagenet
  git checkout imagenet-xiaobing
  cd imagenet

  export DNNL_PRIMITIVE_CACHE_CAPACITY=1024

TEST:
(
  example:
    Thread(s) per core:    4  
    Core(s) per socket:    24
)

BF16:
    1. training(4 instances, 24 cores/instance):
    2. training accuracy(4 nodes, batch_size=64 for every node):
    3. inference throughput(4 instances, 24 cores/instance):
      #command:
      export LD_PRELOAD="path/lib/libjemalloc.so"
      export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000"
      bash run_inference_cpu_multi_instance_ipex.sh resnet50 bf16
    4. inference realtime(24 instances, 4 cores/instance):
      #command:
      export LD_PRELOAD="path/lib/libjemalloc.so"
      export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000"
      bash run_inference_cpu_multi_instance_latency_ipex.sh resnet50 bf16
    5. inference accuracy:
      bash run_inference_cpu_accuracy_ipex.sh resnet50 bf16
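
The two instance layouts above (4 instances of 24 cores and 24 instances of 4 cores) both cover 96 cores. The Python sketch below shows one way such a split into contiguous core ranges could be computed, for example to pass to a pinning tool. The total of 96 cores is inferred from those numbers, and contiguous pinning is an assumption; the run scripts may partition cores differently.

```python
def core_ranges(total_cores, cores_per_instance):
    """Split cores 0..total_cores-1 into contiguous 'start-end' ranges."""
    if total_cores % cores_per_instance:
        raise ValueError("total_cores must be a multiple of cores_per_instance")
    return [
        f"{start}-{start + cores_per_instance - 1}"
        for start in range(0, total_cores, cores_per_instance)
    ]

print(core_ranges(96, 24))       # ['0-23', '24-47', '48-71', '72-95']
print(len(core_ranges(96, 4)))   # 24
```

Few large instances favour throughput, while many small instances favour per-request latency, which matches the throughput and realtime cases listed above.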

TensorFlow

The TensorFlow AutoMixedPrecision pass makes it easy to enable mixed precision with BF16, using predefined rules that define which layers can be cast to BF16. The default rules follow the table below:

| List Name | Comment | OPS |
|---|---|---|
| White List | Numerically safe; performance-critical; can run in BF16. | Conv2D, Conv2DBackpropFilter, DepthwiseConv2dNative, MatMul, etc. |
| Clear List | Do not have numerically significant effects; can run in BF16. | Concat, Identity, MaxPool, Relu, Reshape, Shape, Slice, Squeeze, Transpose, etc. TensorList ops: TensorListConcat, etc. |
| Black List | Numerically dangerous; effects may also be observed in downstream nodes. | Exp, Expm1, L2Loss, Mean, Pow, SaveV2, Softmax, SoftmaxCrossEntropyWithLogits, SparseSoftmaxCrossEntropyWithLogits, Sum |
| Gray List | Numerically safe; can run in BF16; may be made unsafe by an upstream Black List op. | Add, AddN, AvgPool, BiasAdd, FusedBatchNormV2, Mul, etc. |

Users can also customize the rules by changing the file https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/optimizers/auto_mixed_precision_lists.h
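
The intent of the four lists can be shown with a toy Python model. This is a deliberately simplified sketch over a linear chain of ops, not the actual Grappler algorithm. It assumes White List ops always run in BF16, Black List ops always stay in FP32, and Gray or Clear List ops run in BF16 unless a Black List op appears upstream with no White List op in between. The op sets are subsets of the lists described above.

```python
WHITE = {"Conv2D", "Conv2DBackpropFilter", "DepthwiseConv2dNative", "MatMul"}
CLEAR = {"Concat", "Identity", "MaxPool", "Relu", "Reshape", "Transpose"}
BLACK = {"Exp", "Expm1", "L2Loss", "Mean", "Pow", "Softmax", "Sum"}
GRAY = {"Add", "AddN", "AvgPool", "BiasAdd", "FusedBatchNormV2", "Mul"}

def assign_precision(chain):
    """Assign 'bf16' or 'fp32' to each op in a linear chain (toy rule set)."""
    plan = []
    tainted = False  # True after a Black List op, until the next White List op
    for op in chain:
        if op in WHITE:
            precision, tainted = "bf16", False
        elif op in BLACK:
            precision, tainted = "fp32", True
        elif op in GRAY or op in CLEAR:
            precision = "fp32" if tainted else "bf16"
        else:
            precision = "fp32"  # unknown ops stay in FP32
        plan.append((op, precision))
    return plan

for op, precision in assign_precision(
    ["Conv2D", "BiasAdd", "Relu", "Softmax", "Mul", "MatMul"]
):
    print(f"{op}: {precision}")
```

In this chain, Mul falls back to FP32 because it sits directly downstream of Softmax, while BiasAdd and Relu follow Conv2D into BF16.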

Using the AutoMixedPrecision tool takes two steps:

Step 1: Enable AutoMixedPrecision

Step 2: Update Convert List

OpenVINO

OpenVINO has supported running inference with BFloat16 since release 2020.4. Currently, the BFloat16 solution uses the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) and supports inference of the following layers in BF16 computation mode:

●      Convolution
●      FullyConnected
●      InnerProduct
●      LRN
●      Pooling

This means that BF16 inference can only be performed with the CPU plugin on the layers listed above. All other layers are executed in FP32.

The Benchmark App sample released with OpenVINO is the best performance reference. If the platform supports the AVX512_BF16 instruction, a regular float32 model is converted to an internal bfloat16 representation and inference is performed using bfloat16 layers. Below is an example command line that enables this feature on a CPU device with the AVX512_BF16 instruction:

$ benchmark_app -m <model.xml> -enforcebf16=true

Success Stories

Leading performance of Text-to-Speech (TTS) with Intel® Deep Learning Boost using Bfloat16 capability on 3rd Gen Intel® Xeon® Scalable Processors. This blog introduces the optimization techniques and performance results for the customized WaveRNN and PWaveNet models running on the 3rd Gen Intel Xeon Scalable processor family.
https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-xeon-text-to-speech.html

Accelerating AI performance on 3rd Gen Intel® Xeon® Scalable processors with TensorFlow and Bfloat16: This blog investigates the performance improvement of mixed-precision training and inference with bfloat16 on three models: ResNet50v1.5, BERT-Large (SQuAD), and SSD-ResNet34. It indicates that the combination of the latest 3rd Gen Intel Xeon Scalable processors with Intel Deep Learning Boost's new bfloat16 format can achieve a performance increase of up to 1.7x to 1.9x over FP32 performance on 2nd Gen Intel® Xeon® Scalable processors, without any loss of accuracy.
https://blog.tensorflow.org/2020/06/accelerating-ai-performance-on-3rd-gen-processors-with-tensorflow-bfloat16.html

Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost's new BFloat16 capability: In this blog, Intel and Facebook continue their collaboration to improve the performance of machine learning models on PyTorch, this time working together to enable BF16 technology and deliver up to a 1.64x training performance improvement for BF16 over FP32 on 3rd Gen Intel Xeon Scalable processors. This collaboration will benefit the PyTorch community by enabling faster training and inference times on CPUs.
https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html

Bfloat16 Optimization Boosts Alibaba Cloud BERT Model Performance on 3rd Gen Intel® Xeon® Scalable Processors: This blog introduces the optimization of BERT, leveraging the oneAPI Deep Neural Network Library (oneDNN) 1.3 on the 3rd Gen Intel Xeon Scalable processor with Intel DL Boost to achieve a 1.83x gain with the BF16 solution. In addition, the BF16 solution achieved the same accuracy as the FP32 solution on the MRPC dataset for the classification task (both are 83.59% for a proxy model).
https://www.intel.com/content/www/us/en/artificial-intelligence/posts/alibaba-blog.html

“Thanks to the new Vector Neural Network Instructions (AVX-512 VNNI), deep learning frameworks will speed up typical machine learning operations like convolution, and automatically improve inference performance over a wide range of workloads.”
https://aws.amazon.com/blogs/aws/now-available-new-c5-instance-sizes-and-bare-metal-instances/

"We continue to collaborate closely with Intel on adding more INT8 models in GluonCV. Powered by Intel Deep Learning Boost (VNNI), INT8-quantized models in GluonCV can achieve significant speedup over their 32bit floating-point counterparts."
https://medium.com/apache-mxnet/gluoncv-0-5-15-new-models-855db64afae7

“GluonCV delivered some quantized models to improve the performance and reduce the deployment costs for the computer vision inference tasks. In real production, there are two main benefits of lower precision (INT8). First, the computation can be accelerated by the low precision instruction, like Intel Vector Neural Network Instruction (VNNI). Second, lower precision data type would save the memory bandwidth and allow for better cache locality and save the power. The new feature can get up to 4X performance speedup”
https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html

Alibaba Cloud advertises the VNNI instruction set and 2 to 4 times the computing performance in deep learning inference scenarios for its new ECS (CLX-based) VMs: https://www.alibabacloud.com/campaign/6th-generation-ecs (scroll to the bottom).

Text-to-speech on CLX (Intel Xeon Platinum 8255C CPU) with optimizations for BF16 for CPX: https://arxiv.org/abs/2005.05551  (see Acknowledgements)

Reference

[1] Intel AI Analytics Toolkit, Powered by oneAPI
https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html

[2] Intel Distribution of OpenVINO Toolkit, Powered by oneAPI
https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html

[3] Intel WhitePaper Maximize inference performances on CPU with TF

[4] Model Downloader and other automation tools:
https://github.com/opencv/open_model_zoo/tree/master/tools/downloader#model-downloader-and-other-automation-tools

[5] Intel Quantization Tools: https://github.com/IntelAI/models/blob/master/docs/image_recognition/quantization/Tutorial.md

 

Notices and Disclaimers

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
​Performance results are based on testing as of 24th July 2019 by Intel and may not reflect all publicly available security updates. No product or component can be absolutely secure. Test Configuration: Cascade Lake 8280 two-socket system, with 192GB DDR4 2933 RAMs. microcode: 0x500002c CentOS 7.6.1810, Linux Kernel version: 3.10.0-957.el7.x86_64. Spectre-meltdown vulnerability variants mitigated (1,2,3,3a,4, L1TF) per https://github.com/speed47/spectre-meltdown-checker.
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Intel does not control or audit third-party data.  You should review this content, consult other sources, and confirm whether referenced data are accurate.
Intel, the Intel logo, and Intel Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © Intel Corporation