Get Started with Intel MLPerf Submission with Intel Optimized Docker Images

ID 771895
Updated 11/10/2023
Version v3.1
Public

author-image

By

Get Started with Intel MLPerf v3.1 Submission with Intel Optimized Docker Images

MLPerf is a benchmark for measuring the performance of machine learning systems. It provides a set of performance metrics for a variety of machine learning tasks, including image classification, object detection, machine translation, and others. The benchmark is representative of real-world workloads and as a fair and useful way to compare the performance of different machine learning systems.

In this document, we'll show how to use the publicly accessible codes and scritps on GitHub, which was published by Mlcommons, to run Intel MLPerf v3.1 submission with Intel optimized Docker images. The following contents will refer to this GitHub repository as .

Intel Docker Images for MLPerf

The Intel optimized Docker images for MLPerf v3.1 can be built using the Dockerfiles. Example for building docker image with Dockerfile:

# Get the mlperf v3.1 workloads scritps from GitHub
git clone https://github.com/mlcommons/inference_results_v3.1.git

cd <THIS_REPO>/closed/Intel/code/resnet50/pytorch-cpu/docker/

bash build_resnet50_contanier.sh

HW configuration:

System Info Configuration detail
CPU SPR
OS CentOS Stream 8
Kernel 6.1.11-1.el8.elrepo.x86_64
Memory 1024GB (16x64GB 4800MT/s [4800MT/s])
Disk 1TB NVMe

Recommmended BIOS Knobs:

BIOS Knobs Recommended Value
Hyperthreading Enabled
Turbo Boost Enabled
Core Prefetchers Hardware,Adjacent Cache,DCU Streamer,DCU IP
LLC Prefetch Disable
CPU Power and Perf Policy Performance
NUMA-based Cluster Disabled
Energy Perf Bias Performance
Energy Efficient Turbo Disabled

Please also refer to Eagle Stream Platform Performance & Power Optimization Guide for more details.

Check System Health Using Intel® System Health Inspector:

Intel® System Health Inspector (aka svr-info) is a Linux OS utility for assessing the state and health of Intel Xeon computers. It is suggested to use svr-info first to check any system configuration issue before running any benchmark. Follow the Quick Start Guide for downloading and installation. The following are several key factors effecting the model performance.

CPU Couple CPU features impact MLPerf performance via related BIOS knobs, so please double check the CPU features with your BIOS knobs. Some important CPU features are Hyperthreading, number of NUMA nodes, Prefetchers and Intel Turbo Boost.

mlperf-bios

Memory One important system configuration is balanced DIMM population, which is suggested to set as balanced to get optimized performance.
Populate as many channels per socket as possible prior to adding additional DIMMs to the channel.
It might impact the memory bandwidth if two dimm share one channel.

Please also refer to Chapter 4 in Eagle Stream Platform Performance & Power Optimization Guide for more details.

From the results of svr-info, an example of unbalanced DIMM population is shown as follows,

mlperf-unbala

An exmaple of Balanced DIMM population is shown as follows,

mlperf-bala

You should also see good numbers for memory NUMA bandwidth if you also benchmark memory via svr-info.
Here are some reference numbers from a 2S SPR system.

mlperf-band

Power We recommend the intel_pstate Frequency Driver.
For best performance, set the Frequency Governor and Power and Perf Policy to performance.
Here are related recommended power settings from svr-info.

mlperf-band

Best Known Configurations:

sudo bash run_clean.sh

Known Issues

"Too many open files" error If users see "Too many open files" error while they build the docker image, the system configuratoin for the max number of open file might be too small.

Users could check the current setting by below command.

sysctl -r fs.file-max

If users see a small value like 10000, users could set a large value such as 980000 by below command.

sysctl -w fs.file-max=980000

Benchmarking using automation scripts

For your convinience, we prepare a set of automation scritps to help you download data, create docker, do data and model preprocessing, run accuracy, performance and compliance test in a batch. Please refer to ./automation/README.md for details about the usage. Example on for using automation scripts:

# Get the mlperf v3.1 workloads scritps from GitHub
git clone https://github.com/mlcommons/inference_results_v3.1.git

# Go to directory of automation scripts
cd <THIS_REPO>/closed/Intel/code/automation/

# Download dataset
bash download_dataset.sh <model> <location>
# <model> can be resnet50, retinanet, rnnt, 3d-unet-99.9, bert-99, gptj-99, or dlrm2-99.9
# <location> is where you save the data, which can be /data/mlperf_data

# Test model performance
PerformanceOnly="True" bash run.sh <model> <location>

# Test model Auccuracy
# Suppose you have done running the performance test workload, you can skip launching docker container and processing the data
Skip_docker_build="True" Skip_data_proprocess="True" AccuracyOnly="True" bash run.sh <model> <location>

For more details, please refer to the instructions in https://github.com/mlcommons/inferenceresultsv3.1/blob/main/closed/Intel/code/automation/README.md.

If you prefer to understand what the automation scripts do for you, we also provide instructions on how to run model performance/accuracy benchmarking step-by-step in the following sections.

Running models step-by-step

In the following sections, we'll show you how to set up and run each of the seven models:


Note: All the codes and scripts are publicly accissible and can be downloaded from GitHub. The following sessions will refer this GitHub repository as .

Get started with DLRM2

step-by-step

If you haven't already done so, build the Intel optimized Docker image for DLRM using:

cd <THIS_REPO>/closed/Intel/code/dlrm-99.9/pytorch-cpu-int8/docker
# Please firstly refer to the prerequisite file in the current directory to download the compiler before building the Docker image. 
bash build_dlrm-99.9_container.sh

Prerequisites

Use these commands to prepare the Deep Learning Recommendation Model (DLRM) dataset and model on your host system:

cd /data/   # or path to where you want to store the data
mkdir -p /data/mlperf_data/dlrm_2/model/bf16
mkdir -p /data/mlperf_data/dlrm_2/data_npy

# Prepare DLRM dataset
# Create a directory (such as /data/mlperf_data/dlrm_2/data_npy) which contain:
#        day_23_dense.npy
#     day_23_sparse_multi_hot.npz
#     day_23_labels.npy
#
# Learn how to get the dataset from:
#     https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch

# Prepare pre-trained DLRM model
cd /data/mlperf_data/dlrm_2/model/bf16
wget https://cloud.mlcommons.org/index.php/s/XzfSeLgW8FYfR3S/download
unzip weights.zip

cd <THIS_REPO>/closed/Intel/code/dlrm-99.9/pytorch-cpu/
export MODEL_DIR=/data/mlperf_data/dlrm_2/model/bf16


# dump model from snapshot to torch
bash run_dump_torch_model.sh

Note: wget commands use IPv6 by default, if your system uses IPv4, please add -4 option into the wget command to force it to use IPv4.

Set Up Environment

Follow these steps to set up the docker instance.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled earlier. Replace /path/of/dlrm with the dlrm folder path created earlier (/data/dlrm for example):

docker run --name intel_inference_dlrm_int8 --privileged -itd --net=host --ipc=host \
  -v /path/of/dlrm:/data/dlrm_2_dataset mlperf_inference_dlrm2:3.1

Login to Docker Container

Login into a bashrc shell in the Docker instance.

docker exec -it intel_inference_dlrm_int8 bash

Preprocess model and dataset

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Calibrate and dump int8 model

cd /opt/workdir/code/dlrm2-99.9/pytorch-cpu-int8
bash ./run_calibration.sh

Note: runcalibration script does not need to finish, once you see rocauc output you can ctrl+z to stop

Export model and dataset directory

# export model directory to saved model path
export MODEL_DIR=/data/mlperf_data/dlrm_2/model/bf16

# export dataset directory to saved dataset path where .npy .npz are stored.
export DATA_DIR=/data/mlperf_data/dlrm_2/data_npy

Run the Benchmark

# offline performance
source setup_env_offline.sh
bash run_main.sh offline int8                   

# offline accuracy
source setup_env_offline.sh
bash run_main.sh offline accuracy int8              

# server performance
source setup_env_server.sh
bash run_main.sh server int8

# server accuracy
source setup_env_server.sh   
bash run_main.sh server accuracy int8    

Get Started with GPT-J

step-by-step

Download and Prepare Dataset

export WORKLOAD_DATA=/data/mlperf_data/gpt-j
mkdir -p ${WORKLOAD_DATA}
  • Download cnn-dailymail calibration set
cd <THIS_REPO>/closed/Intel/code/gptj-99/pytorch-cpu/
python download-calibration-dataset.py --calibration-list-file calibration-list.txt --output-dir ${WORKLOAD_DATA}/calibration-data
  • Download cnn-dailymail validation set
python download-dataset.py --split validation --output-dir ${WORKLOAD_DATA}/validation-data

Download and prepare model

  • Get finetuned checkpoint
CHECKPOINT_DIR=${WORKLOAD_DATA}/gpt-j-checkpoint
wget https://cloud.mlcommons.org/index.php/s/QAZ2oM94MkFtbQx/download -O gpt-j-checkpoint.zip
unzip gpt-j-checkpoint.zip
mv gpt-j/checkpoint-final/ ${CHECKPOINT_DIR}

Note: wget commands use IPv6 by default, if your system uses IPv4, please add -4 option into the wget command to force it to use IPv4.

Build & Run Docker container from Dockerfile

If you haven't already done so, build the Intel optimized Docker image for GPT-J using:

cd <THIS_REPO>/closed/Intel/code/gptj-99/pytorch-cpu/docker
bash build_gpt-j_container.sh

docker run --name intel_gptj --privileged -itd --net=host --ipc=host -v ${WORKLOAD_DATA}:/opt/workdir/code/gptj-99/pytorch-cpu/data mlperf_inference_gptj:3.1
docker exec -it intel_gptj bash

cd code/gptj-99/pytorch-cpu

Generate quantized INT8 model

source setup_env.sh
bash run_quantization.sh

Run Benchmarks

  • Offline (Performance)
bash run_offline.sh
  • Offline (Accuracy)
bash run_offline_accuracy.sh
  • Server (Performance)
bash run_server.sh
  • Server (Accuracy)
bash run_server_accuracy.sh

Get Started with 3DUNET

step-by-step

If you haven't already done so, build the Intel optimized Docker image for 3DUNET using:

cd <THIS_REPO>/closed/Intel/code/3d-unet-99.9/pytorch-cpu/docker
bash build_3dunet_container.sh

Prerequisites

Use these commands to prepare the 3DUNET dataset and model on your host system:

mkdir 3dunet
cd 3dunet
git clone https://github.com/neheller/kits19
cd kits19
pip3 install -r requirements.txt
python3 -m starter_code.get_imaging
cd ..

Set Up Environment

Follow these steps to set up the docker instance and preprocess the data.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled earlier. Replace /path/of/3dunet with the 3dunet folder path created earlier:

docker run --name intel_3dunet --privileged -itd -v /path/to/3dunet:/root/mlperf_data/3dunet-kits --net=host --ipc=host mlperf_inference_3dunet:3.1

Login to Docker Instance

Login into a bashrc shell in the Docker instance.

docker exec -it intel_3dunet bash

Preprocess Data

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Preprocess the data and download the model using the provided script:

cd code/3d-unet-99.9/pytorch-cpu/
bash process_data_model.sh 

Run the Benchmark

# 3dunet only has offline mode
bash run.sh perf # offline performance
bash run.sh acc  # offline accuracy

Get the Results

  • Check log file. Performance results are in ./output/mlperf_log_summary.txt. Verify that you see results is: valid.

  • For offline mode performance, check the field Samples per second:

  • Accuracy results are in ./output/accuracy.txt. Check the field mean =.

  • The performance result is controled by the value of "targetqps" in user.conf file. The scripts will automatically select userdefault.conf file to calculate corresponding "targetqps" according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding user.conf files.

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get started with BERT

step-by-step

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available).

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for BERT using:

cd <THIS_REPO>/closed/Intel/code/bert-99/pytorch-cpu/docker/

bash build_bert-99_contanier.sh

Prerequisites

Use these commands to prepare the BERT dataset and model on your host system:

cd /data/mlperf_data   # or path to where you want to store the data
mkdir bert
cd bert
mkdir dataset
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -O dataset/dev-v1.1.json
git clone https://huggingface.co/bert-large-uncased model
cd model
wget https://zenodo.org/record/4792496/files/pytorch_model.bin?download=1 -O pytorch_model.bin

Note: wget commands use IPv6 by default, if your system uses IPv4, please add -4 option into the wget command to force it to use IPv4.

Set Up Environment

Follow these steps to set up the docker instance and preprocess the data.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled or built earlier. Replace /path/of/bert with the bert folder path created earlier (i.e. /data/mlperf_data/bert):

docker run --name bert_3-1 --privileged -itd --net=host --ipc=host -v /path/of/bert:/data/mlperf_data/bert <bert docker image ID>

Login to Docker Instance

Login into a bashrc shell in the Docker instance.

docker exec -it bert_3-1 bash

Convert Dataset and Model

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"
cd code/bert-99/pytorch-cpu
export DATA_PATH=/data/mlperf_data/bert
bash convert.sh

Run the Benchmark

bash run.sh                    #offline performance
bash run.sh --accuracy         #offline accuracy
bash run_server.sh             #server performance
bash run_server.sh --accuracy  #server accuracy

Get the Results

Check the performance log file ./test_log/mlperf_log_summary.txt:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in user.conf file. The scripts will automatically select userdefault.conf file to calculate corresponding "targetqps" according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding user.conf files.

Check the accuracy log file ./test_log/accuracy.txt.

  • Check the field f1

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get Started with ResNet50

step-by-step

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available). Please download the Imagenet dataset on the host system before starting the container.

Download Imagenet Dataset for Calibration

Download ImageNet (50000) dataset

bash download_imagenet.sh

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for ResNet50 using:

cd <THIS_REPO>/closed/Intel/code/resnet50/pytorch-cpu/docker/

bash build_resnet50_contanier.sh

docker run -v </path/to/ILSVRC2012_img_val>:/opt/workdir/code/resnet50/pytorch-cpu/ILSVRC2012_img_val -it --privileged <resnet docker image ID> /bin/bash

cd code/resnet50/pytorch-cpu

Prepare Calibration Dataset & Download Model ( Inside Container )

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Prepare calibration 500 images into folders

bash prepare_calibration_dataset.sh

Download the model

bash download_model.sh

The downloaded model will be saved as resnet50-fp32-model.pth

Quantize Torchscript Model and Check Accuracy

  • Set the following paths:
export DATA_CAL_DIR=calibration_dataset
export CHECKPOINT=resnet50-fp32-model.pth
  • Generate scales and models
bash generate_torch_model.sh

The start and end parts of the model are also saved (respectively named) in models

Run Benchmark (Common for Docker & Baremetal)

export DATA_DIR=${PWD}/ILSVRC2012_img_val
export RN50_START=models/resnet50-start-int8-model.pth
export RN50_END=models/resnet50-end-int8-model.pth
export RN50_FULL=models/resnet50-full.pth

Performance

  • Offline
bash run_offline.sh <batch_size>

Note: should be 8 or 256.

  • Server
bash run_server.sh

Accuracy

  • Offline
bash run_offline_accuracy.sh <batch_size>
  • Server
bash run_server_accuracy.sh

Get the Results

Check the ./mlperf_log_summary.txt log file:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in user.conf file. The scripts will automatically select userdefault.conf file to calculate corresponding "targetqps" according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding user.conf files.

Check the ./offline_accuracy.txt or ./server_accuracy.txt log file:

  • Check the field accuracy

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get Started with Retinanet

step-by-step

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available). Please download the Imagenet dataset on the host system before starting the container.

Download the dataset

  • Install dependencies (python3.9 or above)
pip3 install --upgrade pip --user
pip3 install opencv-python-headless==4.5.3.56 pycocotools==2.0.2 fiftyone==0.16.5
  • Setup env vars
CUR_DIR=$(pwd)
export WORKLOAD_DATA=${CUR_DIR}/data
mkdir -p ${WORKLOAD_DATA}

export ENV_DEPS_DIR=${CUR_DIR}/retinanet-env
  • Download OpenImages (264) dataset
bash openimages_mlperf.sh --dataset-path ${WORKLOAD_DATA}/openimages

Images are downloaded to ${WORKLOAD_DATA}/openimages

  • Download Calibration images
bash openimages_calibration_mlperf.sh --dataset-path ${WORKLOAD_DATA}/openimages-calibration

Calibration dataset downloaded to ${WORKLOAD_DATA}/openimages-calibration

Download Model

wget --no-check-certificate 'https://zenodo.org/record/6617981/files/resnext50_32x4d_fpn.pth' -O 'retinanet-model.pth'
mv 'retinanet-model.pth' ${WORKLOAD_DATA}/

Note: wget commands use IPv6 by default, if your system uses IPv4, please add -4 option into the wget command to force it to use IPv4.

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for Retinanet using:

cd <THIS_REPO>/closed/Intel/code/retinanet/pytorch-cpu/docker/

bash build_retinanet_contanier.sh

docker run --name intel_retinanet --privileged -itd --net=host --ipc=host -v ${WORKLOAD_DATA}:/opt/workdir/code/retinanet/pytorch-cpu/data <resnet docker image ID> 

docker exec -it intel_retinanet bash 

cd code/retinanet/pytorch-cpu/

Calibrate and generate torchscript model

Run Calibration

CUR_DIR=$(pwd)
export WORKLOAD_DATA=${CUR_DIR}/data
export CALIBRATION_DATA_DIR=${WORKLOAD_DATA}/openimages-calibration/train/data
export MODEL_CHECKPOINT=${WORKLOAD_DATA}/retinanet-model.pth
export CALIBRATION_ANNOTATIONS=${WORKLOAD_DATA}/openimages-calibration/annotations/openimages-mlperf-calibration.json
bash run_calibration.sh

Set Up Environment

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Export the environment settings

source setup_env.sh

Run the Benchmark

# Run one of these performance or accuracy scripts at a time
# since the log files will be overwritten on each run

# for offline performance
bash run_offline.sh

# for server performance
bash run_server.sh

# for offline accuracy
bash run_offline_accuracy.sh

# for server accuracy
bash run_server_accuracy.sh

Get the results

Check the ./mlperf_log_summary.txt log file:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in user.conf file. The scripts will automatically select userdefault.conf file to calculate corresponding "targetqps" according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding user.conf files.

Check the ./accuracy.txt log file:

  • Check the field mAP

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get Started with RNNT

step-by-step

If you haven't already done so, build the Intel optimized Docker image for RNNT using:

cd <THIS_REPO>/closed/Intel/code/rnnt/pytorch-cpu/docker/
bash build_rnnt-99_container.sh

Set Up Environment

Follow these steps to set up the docker instance.

Start a Container

Use docker run to start a container with the optimized Docker image we built earlier.

docker run --name intel_rnnt --privileged -itd -v /data/mlperf_data:/data/mlperf_data --net=host --ipc=host mlperf_inference_rnnt:3.1

Login to Docker Container

Get the Docker container ID and login into a bashrc shell in the Docker instance using docker exec.

docker ps -a #get container "id"
docker exec -it <id> bash
cd /opt/workdir/code/rnnt/pytorch-cpu
  • Setup env vars
export LD_LIBRARY_PATH=/opt/workdir/code/rnnt/pytorch-cpu/third_party/lib:$LD_LIBRARY_PATH

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Run the Benchmark

The provided run.sh script abstracts the end-to-end process for RNNT:

STAGE STEP
0 Download model
1 Download dataset
2 Pre-process dataset
3 Calibration
4 Build model
5 Run Offline/Server accuracy & benchmark

Run run.sh with STAGE=0 to invoke all the steps requried to run the benchmark (i.e download the model & dataset, preprocess the data, calibrate and build the model):

 SKIP_BUILD=1 STAGE=0 bash run.sh

or to skip to stage 5 without previous steps: Offline/Server accuracy and benchmark:

 SKIP_BUILD=1 STAGE=5 bash run.sh

Get the Results

Check the appropriate offline or server performance log files, either ./logs/Server/performance/.../mlperf_log_summary.txt or ./logs/Offline/performance/.../mlperf_log_summary.txt:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in user.conf file. The scripts will automatically select userdefault.conf file to calculate corresponding "targetqps" according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding user.conf files.

Check the appropriate offline or server accuracy log file, either ./logs/Server/accuracy/.../mlperf_log_summary.txt or ./logs/Offline/accuracy/.../mlperf_log_summary.txt:

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Check the appropriate offline or server accuracy log file, either ./logs/Server/accuracy/.../mlperf_log_summary.txt or ./logs/Offline/accuracy/.../mlperf_log_summary.txt:

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Complinace Test

To run compliance test please follow https://github.com/mlcommons/inference/tree/master/compliance/nvidia or use automation script introduced in next section.




Previous MLPerf v3.0 Submission

Intel has participated in Mleprf submissions since the very beginning of the foundation of MLcommons. In December 2018 Intel published the first Mlperf training benchmark suite together with Goodle and Nvidia. So far, there have been more than 100 results were submitted on Xeon. This session will show how to run Intel MLPerf v3.0 submission with Intel optimized Docker images.

Get Started with Intel MLPerf v3.0 Submission with Intel Optimized Docker Images

Get the latest MLPerf 3.0 release

Please follow the below commands to get the latest mlperf 3.0 release.

git clone https://github.com/mlcommons/inference_results_v3.0.git
cd inference_results_v3.0
wget https://raw.githubusercontent.com/intel-ai-tce/ai-documents/mlperf_patches/AEM/mlperf/patches/0001-updates-for-3.0-submission.patch
git am 0001-updates-for-3.0-submission.patch

Intel Docker Images for MLPerf

The Intel optimized Docker images for MLPerf v3.0 can be built using the Dockerfiles.
Please refer to "Build & Run Docker container from Dockerfile" sub-section in each model section.

Example for building docker image with Dockerfile:

cd inference_results_v3.0/closed/Intel/code/resnet50/pytorch-cpu/docker/

bash build_resnet50_contanier.sh

Validated HW configuration:

System Info Configuration detail
CPU SPR
OS CentOS Stream 8
Kernel 6.1.11-1.el8.elrepo.x86_64
Memory 1024GB (16x64GB 4800MT/s [4800MT/s])
Disk 1TB NVMe

Recommmended BIOS Knobs:

BIOS Knobs Recommended Value
Hyperthreading Enabled
Turbo Boost Enabled
Core Prefetchers Hardware,Adjacent Cache,DCU Streamer,DCU IP
LLC Prefetch Disable
CPU Power and Perf Policy Performance
NUMA-based Cluster Disabled
Energy Perf Bias Performance
Energy Efficient Turbo Disabled

Please also refer to Eagle Stream Platform Performance & Power Optimization Guide for more details.

Check System Health Using Intel® System Health Inspector:

Intel® System Health Inspector (aka svr-info) is a Linux OS utility for assessing the state and health of Intel Xeon computers. It is suggested to use svr-info first to check any system configuration issue before running any benchmark. Follow the Quick Start Guide for downloading and installation. The following are several key factors effecting the model performance.

CPU Couple CPU features impact MLPerf performance via related BIOS knobs, so please double check the CPU features with your BIOS knobs. Some important CPU features are Hyperthreading, number of NUMA nodes, Prefetchers and Intel Turbo Boost.

Memory One important system configuration is balanced DIMM population, which is suggested to set as balanced to get optimized performance.
Populate as many channels per socket as possible prior to adding additional DIMMs to the channel.
It might impact the memory bandwidth if two dimm share one channel.

Please also refer to Chapter 4 in Eagle Stream Platform Performance & Power Optimization Guide for more details.

From the results of svr-info, an example of unbalanced DIMM population is shown as follows,

An exmaple of Balanced DIMM population is shown as follows,


You should also see good numbers for memory NUMA bandwidth if you also benchmark memory via svr-info.
Here are some reference numbers from a 2S SPR system.

Power We recommend the intel_pstate Frequency Driver.
For best performance, set the Frequency Governor and Power and Perf Policy to performance.
Here are related recommended power settings from svr-info.

Best Known Configurations:

sudo bash run_clean.sh

Running models:

In the following sections, we'll show you how to set up and run each of the six models:


Get Started with 3DUNET

Build & Run Docker container from Dockerfile

If you haven't already done so, build the Intel optimized Docker image for 3DUNET using:

cd inference_results_v3.0/closed/Intel/code/3d-unet-99.9/pytorch-cpu/docker
bash build_3dunet_container.sh

Prerequisites

Use these commands to prepare the 3DUNET dataset and model on your host system:

mkdir 3dunet
cd 3dunet
git clone https://github.com/neheller/kits19
cd kits19
pip3 install -r requirements.txt
python3 -m starter_code.get_imaging
cd ..

Set Up Environment

Follow these steps to set up the docker instance and preprocess the data.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled earlier. Replace /path/of/3dunet with the 3dunet folder path created earlier:

docker run --name intel_3dunet --privileged -itd -v /path/to/3dunet:/root/mlperf_data/3dunet-kits --net=host --ipc=host mlperf_inference_3dunet:3.0

Login to Docker Instance

Login into a bashrc shell in the Docker instance.

docker exec -it intel_3dunet bash

Preprocess Data

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Preprocess the data and download the model using the provided script:

pip install numpy==1.23.5
cd code/3d-unet-99.9/pytorch-cpu/
bash process_data_model.sh 

Run the Benchmark

# 3dunet only has offline mode
bash run.sh perf # offline performance
bash run.sh acc  # offline accuracy

Get the Results

  • Check log file. Performance results are in ./output/mlperf_log_summary.txt. Verify that you see results is: valid.

  • For offline mode performance, check the field Samples per second:

  • Accuracy results are in ./output/accuracy.txt. Check the field mean =.

  • The performance result is controled by the value of "targetqps" in usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get started with BERT

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available).

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for BERT using:

cd inference_results_v3.0/closed/Intel/code/bert-99/pytorch-cpu/docker/

bash build_bert-99_contanier.sh

Prerequisites

Use these commands to prepare the BERT dataset and model on your host system:

cd /data/mlperf_data   # or path to where you want to store the data
mkdir bert
cd bert
mkdir dataset
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -O dataset/dev-v1.1.json
git clone https://huggingface.co/bert-large-uncased model
cd model
wget https://zenodo.org/record/4792496/files/pytorch_model.bin?download=1 -O pytorch_model.bin

Set Up Environment

Follow these steps to set up the docker instance and preprocess the data.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled or built earlier. Replace /path/of/bert with the bert folder path created earlier (i.e. /data/mlperf_data/bert):

docker run --name bert_3-0 --privileged -itd --net=host --ipc=host \
  -v /path/of/bert:/data/mlperf_data/bert <bert docker image ID>

Login to Docker Instance

Login into a bashrc shell in the Docker instance.

docker exec -it bert_3-0 bash

Convert Dataset and Model

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"
cd code/bert-99/pytorch-cpu
export DATA_PATH=/data/mlperf_data/bert
bash convert.sh

Run the Benchmark

bash run.sh                    #offline performance
bash run.sh --accuracy         #offline accuracy
bash run_server.sh             #server performance
bash run_server.sh --accuracy  #server accuracy

Get the Results

Check the performance log file ./test_log/mlperf_log_summary.txt:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance results are controled by the value of "targetqps" in usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Check the accuracy log file ./test_log/accuracy.txt.

  • Check the field f1

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.


Get started with DLRM

Build & Run Docker container from Dockerfile

If you haven't already done so, build the Intel optimized Docker image for DLRM using:

# Please get compiler first.
cd inference_results_v3.0/closed/Intel/code/dlrm-99.9
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18679/l_HPCKit_p_2022.2.0.191.sh

# Build docker image
cd inference_results_v3.0/closed/Intel/code/dlrm-99.9/pytorch-cpu/docker
bash build_dlrm-99.9_container.sh

Prerequisites

Use these commands to prepare the Deep Learning Recommendation Model (DLRM) dataset and model on your host system:

cd /data/   # or path to where you want to store the data
mkdir -p /data/dlrm/model
mkdir -p /data/dlrm/terabyte_input

# download dataset
# Create a directory (such as /data/dlrm/terabyte_input) which contain:
#        day_fea_count.npz
#        terabyte_processed_test.bin
#
# Learn how to get the dataset from:
#     https://github.com/facebookresearch/dlrm
# You can also copy it using:
#     scp -r mlperf@10.112.230.156:/home/mlperf/dlrm_data/* /data/dlrm/terabyte_input
#
# download model
# Create a directory (such as /data/dlrm/model):
cd /data/dlrm/model
wget https://dlrm.s3-us-west-1.amazonaws.com/models/tb00_40M.pt -O dlrm_terabyte.pytorch

Set Up Environment

Follow these steps to set up the docker instance.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled earlier. Replace /path/of/dlrm with the dlrm folder path created earlier (/data/dlrm for example):

docker run --name intel_inference_dlrm --privileged -itd --net=host --ipc=host \
  -v /path/of/dlrm:/data/mlperf_data/raw_dlrm mlperf_inference_dlrm:3.0

Login to Docker Container

Login into a bashrc shell in the Docker instance.

docker exec -it intel_inference_dlrm bash

Preprocess model and dataset

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"
cd /opt/workdir/code/dlrm/pytorch-cpu
export MODEL=/data/mlperf_data/raw_dlrm/model
export DATASET=/data/mlperf_data/raw_dlrm/terabyte_input
export DUMP_PATH=/data/mlperf_data/dlrm
bash dump_model_dataset.sh

Run the Benchmark

export MODEL_DIR=/data/mlperf_data/dlrm
export DATA_DIR=/data/mlperf_data/dlrm

bash runcppsut                     # offline performance
bash runcppsut accuracy               # offline accuracy
bash runcppsut performance server  # server performance
bash runcppsut accuracy server     # server accuracy

Get the Results

Check the appropriate offline or server performance log file, either ./output/PerformanceOnly/Offline/mlperf_log_summary.txt or ./output/PerformanceOnly/Server/mlperf_log_summary.txt:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Check the appropriate offline or server accuracy log file, either ./output/AccuracyOnly/Offline/accuracy.txt or ./output/AccuracyOnly/Server/accuracy.txt:

  • Check the field AUC

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.


Get Started with ResNet50

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available). Please download the Imagenet dataset on the host system before starting the container.

Download Imagenet Dataset for Calibration

Download ImageNet (50000) dataset

bash download_imagenet.sh

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for ResNet50 using:

cd inference_results_v3.0/closed/Intel/code/resnet50/pytorch-cpu/docker/

bash build_resnet50_contanier.sh

docker run -v </path/to/ILSVRC2012_img_val>:/opt/workdir/code/resnet50/pytorch-cpu/ILSVRC2012_img_val -it --privileged <resnet docker image ID> /bin/bash

cd code/resnet50/pytorch-cpu

Prepare Calibration Dataset & Download Model ( Inside Container )

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Prepare calibration 500 images into folders

cd /opt/workdir/code/resnet50/pytorch-cpu
bash prepare_calibration_dataset.sh

Download the model

bash download_model.sh

The downloaded model will be saved as resnet50-fp32-model.pth

Quantize Torchscript Model and Check Accuracy

  • Set the following paths:
export DATA_CAL_DIR=calibration_dataset
export CHECKPOINT=resnet50-fp32-model.pth
  • Generate scales and models
bash generate_torch_model.sh

The start and end parts of the model are also saved (respectively named) in models

Run Benchmark (Common for Docker & Baremetal)

export DATA_DIR=${PWD}/ILSVRC2012_img_val
export RN50_START=models/resnet50-start-int8-model.pth
export RN50_END=models/resnet50-end-int8-model.pth
export RN50_FULL=models/resnet50-full.pth

Performance

  • Offline
bash run_offline.sh <batch_size>
  • Server
bash run_server.sh

Accuracy

  • Offline
bash run_offline_accuracy.sh <batch_size>
  • Server
bash run_server_accuracy.sh

Get the Results

Check the ./mlperf_log_summary.txt log file:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in ./src/usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Check the ./offline_accuracy.txt or ./server_accuracy.txt log file:

  • Check the field accuracy

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.


Get Started with Retinanet

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available). Please download the Imagenet dataset on the host system before starting the container.

Download the dataset

  • Install dependencies (python3.9 or above)
pip3 install --upgrade pip --user
pip3 install opencv-python-headless==4.5.3.56 pycocotools==clear2.0.2 fiftyone==0.16.5
  • Setup env vars
CUR_DIR=$(pwd)
export WORKLOAD_DATA=${CUR_DIR}/data
mkdir -p ${WORKLOAD_DATA}

export ENV_DEPS_DIR=${CUR_DIR}/retinanet-env
  • Download OpenImages (264) dataset
bash openimages_mlperf.sh --dataset-path ${WORKLOAD_DATA}/openimages

Images are downloaded to ${WORKLOAD_DATA}/openimages

  • Download Calibration images
bash openimages_calibration_mlperf.sh --dataset-path ${WORKLOAD_DATA}/openimages-calibration

Calibration dataset downloaded to ${WORKLOAD_DATA}/openimages-calibration

Note: If you meet any obstacles on downloading the dataset, please try again in the docker container to be launched after [Build & Run Docker container from Dockerfile](Build & Run Docker container from Dockerfile).

Download Model

wget --no-check-certificate 'https://zenodo.org/record/6617981/files/resnext50_32x4d_fpn.pth' -O 'retinanet-model.pth'
mv 'retinanet-model.pth' ${WORKLOAD_DATA}/

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for Retinanet using:

cd inference_results_v3.0/closed/Intel/code/retinanet/pytorch-cpu/docker/

bash build_retinanet_contanier.sh

docker run --name intel_retinanet --privileged -itd --net=host --ipc=host -v ${WORKLOAD_DATA}:/opt/workdir/code/retinanet/pytorch-cpu/data <retinanet docker image ID> 

docker exec -it intel_retinanet bash 

cd code/retinanet/pytorch-cpu/

Calibrate and generate torchscript model

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Run Calibration

CUR_DIR=$(pwd)
export WORKLOAD_DATA=${CUR_DIR}/data
export CALIBRATION_DATA_DIR=${WORKLOAD_DATA}/openimages-calibration/train/data
export MODEL_CHECKPOINT=${WORKLOAD_DATA}/retinanet-model.pth
export CALIBRATION_ANNOTATIONS=${WORKLOAD_DATA}/openimages-calibration/annotations/openimages-mlperf-calibration.json

cd /opt/workdir/code/retinanet/pytorch-cpu/retinanet-env/vision
git checkout 8e078971b8aebdeb1746fea58851e3754f103053
python setup.py install && python setup.py develop

cd /opt/workdir/code/retinanet/pytorch-cpu
bash run_calibration.sh

Set Up Environment

Export the environment settings

source setup_env.sh

Run the Benchmark

# Run one of these performance or accuracy scripts at a time
# since the log files will be overwritten on each run

# for offline performance
bash run_offline.sh

# for server performance
bash run_server.sh

# for offline accuracy
bash run_offline_accuracy.sh

# for server accuracy
bash run_server_accuracy.sh

Get the results

Check the ./mlperf_log_summary.txt log file:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Check the ./accuracy.txt log file:

  • Check the field mAP

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get Started with RNNT

Build & Run Docker container from Dockerfile

If you haven't already done so, build the Intel optimized Docker image for RNNT using:

cd inference_results_v3.0/closed/Intel/code/rnnt/pytorch-cpu/docker/
bash build_rnnt-99_container.sh

Set Up Environment

Follow these steps to set up the docker instance.

Start a Container

Use docker run to start a container with the optimized Docker image we built earlier.

docker run --name intel_rnnt --privileged -itd -v /data/mlperf_data:/data/mlperf_data \
--net=host --ipc=host mlperf_inference_rnnt:3.0

Login to Docker Container

Get the Docker container ID and login into a bashrc shell in the Docker instance using docker exec.

docker ps -a #get container "id"
docker exec -it <id> bash
cd /opt/workdir/code/rnnt/pytorch-cpu
  • Setup env vars
export LD_LIBRARY_PATH=/opt/workdir/code/rnnt/pytorch-cpu/third_party/lib:$LD_LIBRARY_PATH

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Run the Benchmark

The provided run.sh script abstracts the end-to-end process for RNNT:

STAGE STEP
0 Download model
1 Download dataset
2 Pre-process dataset
3 Calibration
4 Build model
5 Run Offline/Server accuracy & benchmark

Run run.sh with STAGE=0 to invoke all the steps requried to run the benchmark (i.e download the model & dataset, preprocess the data, calibrate and build the model):

 SKIP_BUILD=1 STAGE=0 bash run.sh

or to skip to stage 5 without previous steps: Offline/Server accuracy and benchmark:

 SKIP_BUILD=1 STAGE=5 bash run.sh

Get the Results

Check the appropriate offline or server performance log files, either ./logs/Server/performance/.../mlperf_log_summary.txt or ./logs/Offline/performance/.../mlperf_log_summary.txt:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in ./configs/usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Check the appropriate offline or server accuracy log file, either ./logs/Server/accuracy/.../mlperf_log_summary.txt or ./logs/Offline/accuracy/.../mlperf_log_summary.txt:

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get Started with Intel MLPerf v3.1 Submission with Intel Optimized Docker Images

MLPerf is a benchmark for measuring the performance of machine learning systems. It provides a set of performance metrics for a variety of machine learning tasks, including image classification, object detection, machine translation, and others. The benchmark is representative of real-world workloads and as a fair and useful way to compare the performance of different machine learning systems.

In this document, we'll show how to use the publicly accessible codes and scritps on GitHub, which was published by Mlcommons, to run Intel MLPerf v3.1 submission with Intel optimized Docker images. The following contents will refer to this GitHub repository as .

Intel Docker Images for MLPerf

The Intel optimized Docker images for MLPerf v3.1 can be built using the Dockerfiles. Example for building docker image with Dockerfile:

# Get the mlperf v3.1 workloads scritps from GitHub
git clone https://github.com/mlcommons/inference_results_v3.1.git

cd <THIS_REPO>/closed/Intel/code/resnet50/pytorch-cpu/docker/

bash build_resnet50_contanier.sh

HW configuration:

System Info Configuration detail
CPU SPR
OS CentOS Stream 8
Kernel 6.1.11-1.el8.elrepo.x86_64
Memory 1024GB (16x64GB 4800MT/s [4800MT/s])
Disk 1TB NVMe

Recommmended BIOS Knobs:

BIOS Knobs Recommended Value
Hyperthreading Enabled
Turbo Boost Enabled
Core Prefetchers Hardware,Adjacent Cache,DCU Streamer,DCU IP
LLC Prefetch Disable
CPU Power and Perf Policy Performance
NUMA-based Cluster Disabled
Energy Perf Bias Performance
Energy Efficient Turbo Disabled

Please also refer to Eagle Stream Platform Performance & Power Optimization Guide for more details.

Check System Health Using Intel® System Health Inspector:

Intel® System Health Inspector (aka svr-info) is a Linux OS utility for assessing the state and health of Intel Xeon computers. It is suggested to use svr-info first to check any system configuration issue before running any benchmark. Follow the Quick Start Guide for downloading and installation. The following are several key factors effecting the model performance.

 

CPU Couple CPU features impact MLPerf performance via related BIOS knobs, so please double check the CPU features with your BIOS knobs. Some important CPU features are Hyperthreading, number of NUMA nodes, Prefetchers and Intel Turbo Boost.

 

 

Memory One important system configuration is balanced DIMM population, which is suggested to set as balanced to get optimized performance.
Populate as many channels per socket as possible prior to adding additional DIMMs to the channel.
It might impact the memory bandwidth if two dimm share one channel.

Please also refer to Chapter 4 in Eagle Stream Platform Performance & Power Optimization Guide for more details.

 

From the results of svr-info, an example of unbalanced DIMM population is shown as follows,

An exmaple of Balanced DIMM population is shown as follows,

You should also see good numbers for memory NUMA bandwidth if you also benchmark memory via svr-info.
Here are some reference numbers from a 2S SPR system.

 

 

 

Power We recommend the intel_pstate Frequency Driver.
For best performance, set the Frequency Governor and Power and Perf Policy to performance.
Here are related recommended power settings from svr-info.

 

Best Known Configurations:

sudo bash run_clean.sh

Benchmarking using automation scripts

For your convinience, we prepare a set of automation scritps to help you download data, create docker, do data and model preprocessing, run accuracy, performance and compliance test in a batch. Please refer to ./automation/README.md for details about the usage. Example on for using automation scripts:

# Get the mlperf v3.1 workloads scritps from GitHub
git clone https://github.com/mlcommons/inference_results_v3.1.git

# Go to directory of automation scripts
cd <THIS_REPO>/closed/Intel/code/automation/

# Download dataset
bash download_dataset.sh <model> <location>
# <model> can be resnet50, retinanet, rnnt, 3d-unet-99.9, bert-99, gptj-99, or dlrm2-99.9
# <location> is where you save the data, which can be /data/mlperf_data

# Test model performance
PerformanceOnly="True" bash run.sh <model> <location>

# Test model Auccuracy
# Suppose you have done running the performance test workload, you can skip launching docker container and processing the data
Skip_docker_build="True" Skip_data_proprocess="True" AccuracyOnly="True" bash run.sh <model> <location>

For more details, please refer to the instructions in https://github.com/mlcommons/inferenceresultsv3.1/blob/main/closed/Intel/code/automation/README.md.

If you prefer to understand what the automation scripts do for you, we also provide instructions on how to run model performance/accuracy benchmarking step-by-step in the following sections.

Running models step-by-step

In the following sections, we'll show you how to set up and run each of the seven models:


Note: All the codes and scripts are publicly accissible and can be downloaded from GitHub. The following sessions will refer this GitHub repository as .

Get started with DLRM2

If you haven't already done so, build the Intel optimized Docker image for DLRM using:

cd <THIS_REPO>/closed/Intel/code/dlrm-99.9/pytorch-cpu-int8/docker
# Please firstly refer to the prerequisite file in the current directory to download the compiler before building the Docker image. 
bash build_dlrm-99.9_container.sh

Prerequisites

Use these commands to prepare the Deep Learning Recommendation Model (DLRM) dataset and model on your host system:

cd /data/   # or path to where you want to store the data
mkdir -p /data/mlperf_data/dlrm_2/model/bf16
mkdir -p /data/mlperf_data/dlrm_2/data_npy

# Prepare DLRM dataset
# Create a directory (such as /data/mlperf_data/dlrm_2/data_npy) which contain:
#        day_23_dense.npy
#     day_23_sparse_multi_hot.npz
#     day_23_labels.npy
#
# Learn how to get the dataset from:
#     https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch

# Prepare pre-trained DLRM model
cd /data/mlperf_data/dlrm_2/model/bf16
wget https://cloud.mlcommons.org/index.php/s/XzfSeLgW8FYfR3S/download
unzip weights.zip

cd <THIS_REPO>/closed/Intel/code/dlrm-99.9/pytorch-cpu/
export MODEL_DIR=/data/mlperf_data/dlrm_2/model/bf16


# dump model from snapshot to torch
bash run_dump_torch_model.sh

Note: wget commands use IPv6 by default, if your system uses IPv4, please add -4 option into the wget command to force it to use IPv4.

Set Up Environment

Follow these steps to set up the docker instance.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled earlier. Replace /path/of/dlrm with the dlrm folder path created earlier (/data/dlrm for example):

docker run --name intel_inference_dlrm_int8 --privileged -itd --net=host --ipc=host \
  -v /path/of/dlrm:/data/dlrm_2_dataset mlperf_inference_dlrm2:3.1

Login to Docker Container

Login into a bashrc shell in the Docker instance.

docker exec -it intel_inference_dlrm_int8 bash

Preprocess model and dataset

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Calibrate and dump int8 model

cd /opt/workdir/code/dlrm2-99.9/pytorch-cpu-int8
bash ./run_calibration.sh

Note: runcalibration script does not need to finish, once you see rocauc output you can ctrl+z to stop

Export model and dataset directory

# export model directory to saved model path
export MODEL_DIR=/data/mlperf_data/dlrm_2/model/bf16

# export dataset directory to saved dataset path where .npy .npz are stored.
export DATA_DIR=/data/mlperf_data/dlrm_2/data_npy

Run the Benchmark

# offline performance
source setup_env_offline.sh
bash run_main.sh offline int8                   

# offline accuracy
source setup_env_offline.sh
bash run_main.sh offline accuracy int8              

# server performance
source setup_env_server.sh
bash run_main.sh server int8

# server accuracy
source setup_env_server.sh   
bash run_main.sh server accuracy int8    

Get Started with GPT-J

Download and Prepare Dataset

export WORKLOAD_DATA=/data/mlperf_data/gpt-j
mkdir -p ${WORKLOAD_DATA}
  • Download cnn-dailymail calibration set
cd <THIS_REPO>/closed/Intel/code/gptj-99/pytorch-cpu/
python download-calibration-dataset.py --calibration-list-file calibration-list.txt --output-dir ${WORKLOAD_DATA}/calibration-data
  • Download cnn-dailymail validation set
python download-dataset.py --split validation --output-dir ${WORKLOAD_DATA}/validation-data

Download and prepare model

  • Get finetuned checkpoint
CHECKPOINT_DIR=${WORKLOAD_DATA}/gpt-j-checkpoint
wget https://cloud.mlcommons.org/index.php/s/QAZ2oM94MkFtbQx/download -O gpt-j-checkpoint.zip
unzip gpt-j-checkpoint.zip
mv gpt-j/checkpoint-final/ ${CHECKPOINT_DIR}

Note: wget commands use IPv6 by default, if your system uses IPv4, please add -4 option into the wget command to force it to use IPv4.

Build & Run Docker container from Dockerfile

If you haven't already done so, build the Intel optimized Docker image for GPT-J using:

cd <THIS_REPO>/closed/Intel/code/gptj-99/pytorch-cpu/docker
bash build_gpt-j_container.sh

docker run --name intel_gptj --privileged -itd --net=host --ipc=host -v ${WORKLOAD_DATA}:/opt/workdir/code/gptj-99/pytorch-cpu/data mlperf_inference_gptj:3.1
docker exec -it intel_gptj bash

cd code/gptj-99/pytorch-cpu

Generate quantized INT8 model

source setup_env.sh
bash run_quantization.sh

Run Benchmarks

  • Offline (Performance)
bash run_offline.sh
  • Offline (Accuracy)
bash run_offline_accuracy.sh
  • Server (Performance)
bash run_server.sh
  • Server (Accuracy)
bash run_server_accuracy.sh

Get Started with 3DUNET

If you haven't already done so, build the Intel optimized Docker image for 3DUNET using:

cd <THIS_REPO>/closed/Intel/code/3d-unet-99.9/pytorch-cpu/docker
bash build_3dunet_container.sh

Prerequisites

Use these commands to prepare the 3DUNET dataset and model on your host system:

mkdir 3dunet
cd 3dunet
git clone https://github.com/neheller/kits19
cd kits19
pip3 install -r requirements.txt
python3 -m starter_code.get_imaging
cd ..

Set Up Environment

Follow these steps to set up the docker instance and preprocess the data.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled earlier. Replace /path/of/3dunet with the 3dunet folder path created earlier:

docker run --name intel_3dunet --privileged -itd -v /path/to/3dunet:/root/mlperf_data/3dunet-kits --net=host --ipc=host mlperf_inference_3dunet:3.1

Login to Docker Instance

Login into a bashrc shell in the Docker instance.

docker exec -it intel_3dunet bash

Preprocess Data

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Preprocess the data and download the model using the provided script:

cd code/3d-unet-99.9/pytorch-cpu/
bash process_data_model.sh 

Run the Benchmark

# 3dunet only has offline mode
bash run.sh perf # offline performance
bash run.sh acc  # offline accuracy

Get the Results

  • Check log file. Performance results are in ./output/mlperf_log_summary.txt. Verify that you see results is: valid.

  • For offline mode performance, check the field Samples per second:

  • Accuracy results are in ./output/accuracy.txt. Check the field mean =.

  • The performance result is controled by the value of "targetqps" in user.conf file. The scripts will automatically select userdefault.conf file to calculate corresponding "targetqps" according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding user.conf files.

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get started with BERT

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available).

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for BERT using:

cd <THIS_REPO>/closed/Intel/code/bert-99/pytorch-cpu/docker/

bash build_bert-99_contanier.sh

Prerequisites

Use these commands to prepare the BERT dataset and model on your host system:

cd /data/mlperf_data   # or path to where you want to store the data
mkdir bert
cd bert
mkdir dataset
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -O dataset/dev-v1.1.json
git clone https://huggingface.co/bert-large-uncased model
cd model
wget https://zenodo.org/record/4792496/files/pytorch_model.bin?download=1 -O pytorch_model.bin

Note: wget commands use IPv6 by default, if your system uses IPv4, please add -4 option into the wget command to force it to use IPv4.

Set Up Environment

Follow these steps to set up the docker instance and preprocess the data.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled or built earlier. Replace /path/of/bert with the bert folder path created earlier (i.e. /data/mlperf_data/bert):

docker run --name bert_3-1 --privileged -itd --net=host --ipc=host -v /path/of/bert:/data/mlperf_data/bert <bert docker image ID>

Login to Docker Instance

Login into a bashrc shell in the Docker instance.

docker exec -it bert_3-1 bash

Convert Dataset and Model

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"
cd code/bert-99/pytorch-cpu
export DATA_PATH=/data/mlperf_data/bert
bash convert.sh

Run the Benchmark

bash run.sh                    #offline performance
bash run.sh --accuracy         #offline accuracy
bash run_server.sh             #server performance
bash run_server.sh --accuracy  #server accuracy

Get the Results

Check the performance log file ./test_log/mlperf_log_summary.txt:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in user.conf file. The scripts will automatically select userdefault.conf file to calculate corresponding "targetqps" according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding user.conf files.

Check the accuracy log file ./test_log/accuracy.txt.

  • Check the field f1

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.


Get Started with ResNet50

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available). Please download the Imagenet dataset on the host system before starting the container.

Download Imagenet Dataset for Calibration

Download ImageNet (50000) dataset

bash download_imagenet.sh

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for ResNet50 using:

cd <THIS_REPO>/closed/Intel/code/resnet50/pytorch-cpu/docker/

bash build_resnet50_contanier.sh

docker run -v </path/to/ILSVRC2012_img_val>:/opt/workdir/code/resnet50/pytorch-cpu/ILSVRC2012_img_val -it --privileged <resnet docker image ID> /bin/bash

cd code/resnet50/pytorch-cpu

Prepare Calibration Dataset & Download Model ( Inside Container )

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Prepare calibration 500 images into folders

bash prepare_calibration_dataset.sh

Download the model

bash download_model.sh

The downloaded model will be saved as resnet50-fp32-model.pth

Quantize Torchscript Model and Check Accuracy

  • Set the following paths:
export DATA_CAL_DIR=calibration_dataset
export CHECKPOINT=resnet50-fp32-model.pth
  • Generate scales and models
bash generate_torch_model.sh

The start and end parts of the model are also saved (respectively named) in models

Run Benchmark (Common for Docker & Baremetal)

export DATA_DIR=${PWD}/ILSVRC2012_img_val
export RN50_START=models/resnet50-start-int8-model.pth
export RN50_END=models/resnet50-end-int8-model.pth
export RN50_FULL=models/resnet50-full.pth

Performance

  • Offline
bash run_offline.sh <batch_size>

Note: should be 8 or 256.

  • Server
bash run_server.sh

Accuracy

  • Offline
bash run_offline_accuracy.sh <batch_size>
  • Server
bash run_server_accuracy.sh

Get the Results

Check the ./mlperf_log_summary.txt log file:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in user.conf file. The scripts will automatically select userdefault.conf file to calculate corresponding "targetqps" according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding user.conf files.

Check the ./offline_accuracy.txt or ./server_accuracy.txt log file:

  • Check the field accuracy

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.


Get Started with Retinanet

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available). Please download the Imagenet dataset on the host system before starting the container.

Download the dataset

  • Install dependencies (python3.9 or above)
pip3 install --upgrade pip --user
pip3 install opencv-python-headless==4.5.3.56 pycocotools==2.0.2 fiftyone==0.16.5
  • Setup env vars
CUR_DIR=$(pwd)
export WORKLOAD_DATA=${CUR_DIR}/data
mkdir -p ${WORKLOAD_DATA}

export ENV_DEPS_DIR=${CUR_DIR}/retinanet-env
  • Download OpenImages (264) dataset
bash openimages_mlperf.sh --dataset-path ${WORKLOAD_DATA}/openimages

Images are downloaded to ${WORKLOAD_DATA}/openimages

  • Download Calibration images
bash openimages_calibration_mlperf.sh --dataset-path ${WORKLOAD_DATA}/openimages-calibration

Calibration dataset downloaded to ${WORKLOAD_DATA}/openimages-calibration

Download Model

wget --no-check-certificate 'https://zenodo.org/record/6617981/files/resnext50_32x4d_fpn.pth' -O 'retinanet-model.pth'
mv 'retinanet-model.pth' ${WORKLOAD_DATA}/

Note: wget commands use IPv6 by default, if your system uses IPv4, please add -4 option into the wget command to force it to use IPv4.

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for Retinanet using:

cd <THIS_REPO>/closed/Intel/code/retinanet/pytorch-cpu/docker/

bash build_retinanet_contanier.sh

docker run --name intel_retinanet --privileged -itd --net=host --ipc=host -v ${WORKLOAD_DATA}:/opt/workdir/code/retinanet/pytorch-cpu/data <resnet docker image ID> 

docker exec -it intel_retinanet bash 

cd code/retinanet/pytorch-cpu/

Calibrate and generate torchscript model

Run Calibration

CUR_DIR=$(pwd)
export WORKLOAD_DATA=${CUR_DIR}/data
export CALIBRATION_DATA_DIR=${WORKLOAD_DATA}/openimages-calibration/train/data
export MODEL_CHECKPOINT=${WORKLOAD_DATA}/retinanet-model.pth
export CALIBRATION_ANNOTATIONS=${WORKLOAD_DATA}/openimages-calibration/annotations/openimages-mlperf-calibration.json
bash run_calibration.sh

Set Up Environment

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Export the environment settings

source setup_env.sh

Run the Benchmark

# Run one of these performance or accuracy scripts at a time
# since the log files will be overwritten on each run

# for offline performance
bash run_offline.sh

# for server performance
bash run_server.sh

# for offline accuracy
bash run_offline_accuracy.sh

# for server accuracy
bash run_server_accuracy.sh

Get the results

Check the ./mlperf_log_summary.txt log file:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in user.conf file. The scripts will automatically select userdefault.conf file to calculate corresponding "targetqps" according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding user.conf files.

Check the ./accuracy.txt log file:

  • Check the field mAP

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get Started with RNNT

If you haven't already done so, build the Intel optimized Docker image for RNNT using:

cd <THIS_REPO>/closed/Intel/code/rnnt/pytorch-cpu/docker/
bash build_rnnt-99_container.sh

Set Up Environment

Follow these steps to set up the docker instance.

Start a Container

Use docker run to start a container with the optimized Docker image we built earlier.

docker run --name intel_rnnt --privileged -itd -v /data/mlperf_data:/data/mlperf_data --net=host --ipc=host mlperf_inference_rnnt:3.1

Login to Docker Container

Get the Docker container ID and login into a bashrc shell in the Docker instance using docker exec.

docker ps -a #get container "id"
docker exec -it <id> bash
cd /opt/workdir/code/rnnt/pytorch-cpu
  • Setup env vars
export LD_LIBRARY_PATH=/opt/workdir/code/rnnt/pytorch-cpu/third_party/lib:$LD_LIBRARY_PATH

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Run the Benchmark

The provided run.sh script abstracts the end-to-end process for RNNT:

STAGE STEP
0 Download model
1 Download dataset
2 Pre-process dataset
3 Calibration
4 Build model
5 Run Offline/Server accuracy & benchmark

Run run.sh with STAGE=0 to invoke all the steps requried to run the benchmark (i.e download the model & dataset, preprocess the data, calibrate and build the model):

 SKIP_BUILD=1 STAGE=0 bash run.sh

or to skip to stage 5 without previous steps: Offline/Server accuracy and benchmark:

 SKIP_BUILD=1 STAGE=5 bash run.sh

Get the Results

Check the appropriate offline or server performance log files, either ./logs/Server/performance/.../mlperf_log_summary.txt or ./logs/Offline/performance/.../mlperf_log_summary.txt:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in user.conf file. The scripts will automatically select userdefault.conf file to calculate corresponding "targetqps" according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding user.conf files.

Check the appropriate offline or server accuracy log file, either ./logs/Server/accuracy/.../mlperf_log_summary.txt or ./logs/Offline/accuracy/.../mlperf_log_summary.txt:

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Check the appropriate offline or server accuracy log file, either ./logs/Server/accuracy/.../mlperf_log_summary.txt or ./logs/Offline/accuracy/.../mlperf_log_summary.txt:

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Complinace Test

To run compliance test please follow https://github.com/mlcommons/inference/tree/master/compliance/nvidia or use automation script introduced in next section.


 


Previous MLPerf v3.0 Submission

Intel has participated in Mleprf submissions since the very beginning of the foundation of MLcommons. In December 2018 Intel published the first Mlperf training benchmark suite together with Goodle and Nvidia. So far, there have been more than 100 results were submitted on Xeon. This session will show how to run Intel MLPerf v3.0 submission with Intel optimized Docker images.

 

Get Started with Intel MLPerf v3.0 Submission with Intel Optimized Docker Images

 

Get the latest MLPerf 3.0 release

Please follow the below commands to get the latest mlperf 3.0 release.

git clone https://github.com/mlcommons/inference_results_v3.0.git
cd inference_results_v3.0
wget https://raw.githubusercontent.com/intel-ai-tce/ai-documents/mlperf_patches/AEM/mlperf/patches/0001-updates-for-3.0-submission.patch
git am 0001-updates-for-3.0-submission.patch

Intel Docker Images for MLPerf

The Intel optimized Docker images for MLPerf v3.0 can be built using the Dockerfiles.
Please refer to "Build & Run Docker container from Dockerfile" sub-section in each model section.

Example for building docker image with Dockerfile:

cd inference_results_v3.0/closed/Intel/code/resnet50/pytorch-cpu/docker/

bash build_resnet50_contanier.sh

Validated HW configuration:

System Info Configuration detail
CPU SPR
OS CentOS Stream 8
Kernel 6.1.11-1.el8.elrepo.x86_64
Memory 1024GB (16x64GB 4800MT/s [4800MT/s])
Disk 1TB NVMe

Recommmended BIOS Knobs:

BIOS Knobs Recommended Value
Hyperthreading Enabled
Turbo Boost Enabled
Core Prefetchers Hardware,Adjacent Cache,DCU Streamer,DCU IP
LLC Prefetch Disable
CPU Power and Perf Policy Performance
NUMA-based Cluster Disabled
Energy Perf Bias Performance
Energy Efficient Turbo Disabled

Please also refer to Eagle Stream Platform Performance & Power Optimization Guide for more details.

Check System Health Using Intel® System Health Inspector:

Intel® System Health Inspector (aka svr-info) is a Linux OS utility for assessing the state and health of Intel Xeon computers. It is suggested to use svr-info first to check any system configuration issue before running any benchmark. Follow the Quick Start Guide for downloading and installation. The following are several key factors effecting the model performance.

 

CPU Couple CPU features impact MLPerf performance via related BIOS knobs, so please double check the CPU features with your BIOS knobs. Some important CPU features are Hyperthreading, number of NUMA nodes, Prefetchers and Intel Turbo Boost.

 

 

Memory One important system configuration is balanced DIMM population, which is suggested to set as balanced to get optimized performance.
Populate as many channels per socket as possible prior to adding additional DIMMs to the channel.
It might impact the memory bandwidth if two dimm share one channel.

Please also refer to Chapter 4 in Eagle Stream Platform Performance & Power Optimization Guide for more details.

 

From the results of svr-info, an example of unbalanced DIMM population is shown as follows,

An exmaple of Balanced DIMM population is shown as follows,

You should also see good numbers for memory NUMA bandwidth if you also benchmark memory via svr-info.
Here are some reference numbers from a 2S SPR system.

 

 

 

Power We recommend the intel_pstate Frequency Driver.
For best performance, set the Frequency Governor and Power and Perf Policy to performance.
Here are related recommended power settings from svr-info.

 

Best Known Configurations:

sudo bash run_clean.sh

Running models:

In the following sections, we'll show you how to set up and run each of the six models:


Get Started with 3DUNET

Build & Run Docker container from Dockerfile

If you haven't already done so, build the Intel optimized Docker image for 3DUNET using:

cd inference_results_v3.0/closed/Intel/code/3d-unet-99.9/pytorch-cpu/docker
bash build_3dunet_container.sh

Prerequisites

Use these commands to prepare the 3DUNET dataset and model on your host system:

mkdir 3dunet
cd 3dunet
git clone https://github.com/neheller/kits19
cd kits19
pip3 install -r requirements.txt
python3 -m starter_code.get_imaging
cd ..

Set Up Environment

Follow these steps to set up the docker instance and preprocess the data.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled earlier. Replace /path/of/3dunet with the 3dunet folder path created earlier:

docker run --name intel_3dunet --privileged -itd -v /path/to/3dunet:/root/mlperf_data/3dunet-kits --net=host --ipc=host mlperf_inference_3dunet:3.0

Login to Docker Instance

Login into a bashrc shell in the Docker instance.

docker exec -it intel_3dunet bash

Preprocess Data

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Preprocess the data and download the model using the provided script:

pip install numpy==1.23.5
cd code/3d-unet-99.9/pytorch-cpu/
bash process_data_model.sh 

Run the Benchmark

# 3dunet only has offline mode
bash run.sh perf # offline performance
bash run.sh acc  # offline accuracy

Get the Results

  • Check log file. Performance results are in ./output/mlperf_log_summary.txt. Verify that you see results is: valid.

  • For offline mode performance, check the field Samples per second:

  • Accuracy results are in ./output/accuracy.txt. Check the field mean =.

  • The performance result is controled by the value of "targetqps" in usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get started with BERT

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available).

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for BERT using:

cd inference_results_v3.0/closed/Intel/code/bert-99/pytorch-cpu/docker/

bash build_bert-99_contanier.sh

Prerequisites

Use these commands to prepare the BERT dataset and model on your host system:

cd /data/mlperf_data   # or path to where you want to store the data
mkdir bert
cd bert
mkdir dataset
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -O dataset/dev-v1.1.json
git clone https://huggingface.co/bert-large-uncased model
cd model
wget https://zenodo.org/record/4792496/files/pytorch_model.bin?download=1 -O pytorch_model.bin

Set Up Environment

Follow these steps to set up the docker instance and preprocess the data.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled or built earlier. Replace /path/of/bert with the bert folder path created earlier (i.e. /data/mlperf_data/bert):

docker run --name bert_3-0 --privileged -itd --net=host --ipc=host \
  -v /path/of/bert:/data/mlperf_data/bert <bert docker image ID>

Login to Docker Instance

Login into a bashrc shell in the Docker instance.

docker exec -it bert_3-0 bash

Convert Dataset and Model

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"
cd code/bert-99/pytorch-cpu
export DATA_PATH=/data/mlperf_data/bert
bash convert.sh

Run the Benchmark

bash run.sh                    #offline performance
bash run.sh --accuracy         #offline accuracy
bash run_server.sh             #server performance
bash run_server.sh --accuracy  #server accuracy

Get the Results

Check the performance log file ./test_log/mlperf_log_summary.txt:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance results are controled by the value of "targetqps" in usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Check the accuracy log file ./test_log/accuracy.txt.

  • Check the field f1

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.


Get started with DLRM

Build & Run Docker container from Dockerfile

If you haven't already done so, build the Intel optimized Docker image for DLRM using:

# Please get compiler first.
cd inference_results_v3.0/closed/Intel/code/dlrm-99.9
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18679/l_HPCKit_p_2022.2.0.191.sh

# Build docker image
cd inference_results_v3.0/closed/Intel/code/dlrm-99.9/pytorch-cpu/docker
bash build_dlrm-99.9_container.sh

Prerequisites

Use these commands to prepare the Deep Learning Recommendation Model (DLRM) dataset and model on your host system:

cd /data/   # or path to where you want to store the data
mkdir -p /data/dlrm/model
mkdir -p /data/dlrm/terabyte_input

# download dataset
# Create a directory (such as /data/dlrm/terabyte_input) which contain:
#        day_fea_count.npz
#        terabyte_processed_test.bin
#
# Learn how to get the dataset from:
#     https://github.com/facebookresearch/dlrm
# You can also copy it using:
#     scp -r mlperf@10.112.230.156:/home/mlperf/dlrm_data/* /data/dlrm/terabyte_input
#
# download model
# Create a directory (such as /data/dlrm/model):
cd /data/dlrm/model
wget https://dlrm.s3-us-west-1.amazonaws.com/models/tb00_40M.pt -O dlrm_terabyte.pytorch

Set Up Environment

Follow these steps to set up the docker instance.

Start a Container

Use docker run to start a container with the optimized Docker image we pulled earlier. Replace /path/of/dlrm with the dlrm folder path created earlier (/data/dlrm for example):

docker run --name intel_inference_dlrm --privileged -itd --net=host --ipc=host \
  -v /path/of/dlrm:/data/mlperf_data/raw_dlrm mlperf_inference_dlrm:3.0

Login to Docker Container

Login into a bashrc shell in the Docker instance.

docker exec -it intel_inference_dlrm bash

Preprocess model and dataset

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"
cd /opt/workdir/code/dlrm/pytorch-cpu
export MODEL=/data/mlperf_data/raw_dlrm/model
export DATASET=/data/mlperf_data/raw_dlrm/terabyte_input
export DUMP_PATH=/data/mlperf_data/dlrm
bash dump_model_dataset.sh

Run the Benchmark

export MODEL_DIR=/data/mlperf_data/dlrm
export DATA_DIR=/data/mlperf_data/dlrm

bash runcppsut                     # offline performance
bash runcppsut accuracy               # offline accuracy
bash runcppsut performance server  # server performance
bash runcppsut accuracy server     # server accuracy

Get the Results

Check the appropriate offline or server performance log file, either ./output/PerformanceOnly/Offline/mlperf_log_summary.txt or ./output/PerformanceOnly/Server/mlperf_log_summary.txt:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Check the appropriate offline or server accuracy log file, either ./output/AccuracyOnly/Offline/accuracy.txt or ./output/AccuracyOnly/Server/accuracy.txt:

  • Check the field AUC

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.


Get Started with ResNet50

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available). Please download the Imagenet dataset on the host system before starting the container.

Download Imagenet Dataset for Calibration

Download ImageNet (50000) dataset

bash download_imagenet.sh

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for ResNet50 using:

cd inference_results_v3.0/closed/Intel/code/resnet50/pytorch-cpu/docker/

bash build_resnet50_contanier.sh

docker run -v </path/to/ILSVRC2012_img_val>:/opt/workdir/code/resnet50/pytorch-cpu/ILSVRC2012_img_val -it --privileged <resnet docker image ID> /bin/bash

cd code/resnet50/pytorch-cpu

Prepare Calibration Dataset & Download Model ( Inside Container )

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Prepare calibration 500 images into folders

cd /opt/workdir/code/resnet50/pytorch-cpu
bash prepare_calibration_dataset.sh

Download the model

bash download_model.sh

The downloaded model will be saved as resnet50-fp32-model.pth

Quantize Torchscript Model and Check Accuracy

  • Set the following paths:
export DATA_CAL_DIR=calibration_dataset
export CHECKPOINT=resnet50-fp32-model.pth
  • Generate scales and models
bash generate_torch_model.sh

The start and end parts of the model are also saved (respectively named) in models

Run Benchmark (Common for Docker & Baremetal)

export DATA_DIR=${PWD}/ILSVRC2012_img_val
export RN50_START=models/resnet50-start-int8-model.pth
export RN50_END=models/resnet50-end-int8-model.pth
export RN50_FULL=models/resnet50-full.pth

Performance

  • Offline
bash run_offline.sh <batch_size>
  • Server
bash run_server.sh

Accuracy

  • Offline
bash run_offline_accuracy.sh <batch_size>
  • Server
bash run_server_accuracy.sh

Get the Results

Check the ./mlperf_log_summary.txt log file:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in ./src/usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Check the ./offline_accuracy.txt or ./server_accuracy.txt log file:

  • Check the field accuracy

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.


Get Started with Retinanet

The docker container can be created either by building it using the Dockerfile or pulling the image from Dockerhub (if available). Please download the Imagenet dataset on the host system before starting the container.

Download the dataset

  • Install dependencies (python3.9 or above)
pip3 install --upgrade pip --user
pip3 install opencv-python-headless==4.5.3.56 pycocotools==clear2.0.2 fiftyone==0.16.5
  • Setup env vars
CUR_DIR=$(pwd)
export WORKLOAD_DATA=${CUR_DIR}/data
mkdir -p ${WORKLOAD_DATA}

export ENV_DEPS_DIR=${CUR_DIR}/retinanet-env
  • Download OpenImages (264) dataset
bash openimages_mlperf.sh --dataset-path ${WORKLOAD_DATA}/openimages

Images are downloaded to ${WORKLOAD_DATA}/openimages

  • Download Calibration images
bash openimages_calibration_mlperf.sh --dataset-path ${WORKLOAD_DATA}/openimages-calibration

Calibration dataset downloaded to ${WORKLOAD_DATA}/openimages-calibration

Note: If you meet any obstacles on downloading the dataset, please try again in the docker container to be launched after [Build & Run Docker container from Dockerfile](Build & Run Docker container from Dockerfile).

Download Model

wget --no-check-certificate 'https://zenodo.org/record/6617981/files/resnext50_32x4d_fpn.pth' -O 'retinanet-model.pth'
mv 'retinanet-model.pth' ${WORKLOAD_DATA}/

Build & Run Docker container from Dockerfile

If you haven't already done so, build and run the Intel optimized Docker image for Retinanet using:

cd inference_results_v3.0/closed/Intel/code/retinanet/pytorch-cpu/docker/

bash build_retinanet_contanier.sh

docker run --name intel_retinanet --privileged -itd --net=host --ipc=host -v ${WORKLOAD_DATA}:/opt/workdir/code/retinanet/pytorch-cpu/data <retinanet docker image ID> 

docker exec -it intel_retinanet bash 

cd code/retinanet/pytorch-cpu/

Calibrate and generate torchscript model

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Run Calibration

CUR_DIR=$(pwd)
export WORKLOAD_DATA=${CUR_DIR}/data
export CALIBRATION_DATA_DIR=${WORKLOAD_DATA}/openimages-calibration/train/data
export MODEL_CHECKPOINT=${WORKLOAD_DATA}/retinanet-model.pth
export CALIBRATION_ANNOTATIONS=${WORKLOAD_DATA}/openimages-calibration/annotations/openimages-mlperf-calibration.json

cd /opt/workdir/code/retinanet/pytorch-cpu/retinanet-env/vision
git checkout 8e078971b8aebdeb1746fea58851e3754f103053
python setup.py install && python setup.py develop

cd /opt/workdir/code/retinanet/pytorch-cpu
bash run_calibration.sh

Set Up Environment

Export the environment settings

source setup_env.sh

Run the Benchmark

# Run one of these performance or accuracy scripts at a time
# since the log files will be overwritten on each run

# for offline performance
bash run_offline.sh

# for server performance
bash run_server.sh

# for offline accuracy
bash run_offline_accuracy.sh

# for server accuracy
bash run_server_accuracy.sh

Get the results

Check the ./mlperf_log_summary.txt log file:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Check the ./accuracy.txt log file:

  • Check the field mAP

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.

Get Started with RNNT

Build & Run Docker container from Dockerfile

If you haven't already done so, build the Intel optimized Docker image for RNNT using:

cd inference_results_v3.0/closed/Intel/code/rnnt/pytorch-cpu/docker/
bash build_rnnt-99_container.sh

Set Up Environment

Follow these steps to set up the docker instance.

Start a Container

Use docker run to start a container with the optimized Docker image we built earlier.

docker run --name intel_rnnt --privileged -itd -v /data/mlperf_data:/data/mlperf_data \
--net=host --ipc=host mlperf_inference_rnnt:3.0

Login to Docker Container

Get the Docker container ID and login into a bashrc shell in the Docker instance using docker exec.

docker ps -a #get container "id"
docker exec -it <id> bash
cd /opt/workdir/code/rnnt/pytorch-cpu
  • Setup env vars
export LD_LIBRARY_PATH=/opt/workdir/code/rnnt/pytorch-cpu/third_party/lib:$LD_LIBRARY_PATH

If you need a proxy to access the internet, replace your host proxy with the proxy server for your environment. If no proxy is needed, you can skip this step:

export http_proxy="your host proxy"
export https_proxy="your host proxy"

Run the Benchmark

The provided run.sh script abstracts the end-to-end process for RNNT:

STAGE STEP
0 Download model
1 Download dataset
2 Pre-process dataset
3 Calibration
4 Build model
5 Run Offline/Server accuracy & benchmark

Run run.sh with STAGE=0 to invoke all the steps requried to run the benchmark (i.e download the model & dataset, preprocess the data, calibrate and build the model):

 SKIP_BUILD=1 STAGE=0 bash run.sh

or to skip to stage 5 without previous steps: Offline/Server accuracy and benchmark:

 SKIP_BUILD=1 STAGE=5 bash run.sh

Get the Results

Check the appropriate offline or server performance log files, either ./logs/Server/performance/.../mlperf_log_summary.txt or ./logs/Offline/performance/.../mlperf_log_summary.txt:

  • Verify you see results is: valid.
  • For offline mode performance, check the field Samples per second:
  • For server mode performance, check the field Scheduled samples per second:
  • The performance result is controled by the value of "targetqps" in ./configs/usersocket.conf file. The scripts will automatically select usersocket.conf file according to the number of sockets on customer's platform. Customers can also manully change the value of "targetqps" in corresponding usersocket.conf files.

Check the appropriate offline or server accuracy log file, either ./logs/Server/accuracy/.../mlperf_log_summary.txt or ./logs/Offline/accuracy/.../mlperf_log_summary.txt:

Save these output log files elsewhere when each test is completed as they will be overwritten by the next test.