Transformer-LT MLPerf FP32 Training TensorFlow* Model Package
Published: 11/13/2020
Last Updated: 06/15/2022
Download Command
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/transformer-mlperf-fp32-training.tar.gz
Description
This document has instructions for running Transformer Language FP32 training from the MLPerf* benchmark suite using Intel® Optimization for TensorFlow*. Detailed information on the MLPerf benchmark can be found in the mlperf/training repository.
Datasets
Decide which problem you want to run, then get the appropriate dataset for it. The steps below use English-German translation as an example.
Download the newstest2014 dataset used for computing the BLEU score:
export DATASET_DIR=/home/<user>/transformer_data
mkdir $DATASET_DIR && cd $DATASET_DIR
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.en
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.de
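Before moving on, you may want to confirm that both newstest2014 files downloaded completely. The sketch below uses a hypothetical helper, `check_parallel_files` (not part of the model package), and assumes the two files form a line-aligned parallel corpus with one sentence per line, so their line counts should match.

```shell
# Hypothetical helper, not part of the model package: check that the English
# and German newstest2014 files exist, are non-empty, and have the same number
# of lines (assumes one sentence per line in a line-aligned parallel corpus).
check_parallel_files() {
  local dir="$1"
  local en="$dir/newstest2014.en"
  local de="$dir/newstest2014.de"
  local f
  for f in "$en" "$de"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  local n_en n_de
  n_en=$(wc -l < "$en")
  n_de=$(wc -l < "$de")
  if [ "$n_en" -ne "$n_de" ]; then
    echo "line count mismatch: en=$n_en de=$n_de" >&2
    return 1
  fi
  echo "ok: $n_en sentence pairs"
}
```

For example, `check_parallel_files "$DATASET_DIR"` prints a short summary on success and an error message on stderr otherwise.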
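For the training dataset, download and untar the model package in your home directory, then run data_download.py from the model's source directory:
cd /home/<user>
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/transformer-mlperf-fp32-training.tar.gz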
tar -xzf transformer-mlperf-fp32-training.tar.gz
export PYTHONPATH=$PYTHONPATH:/home/<user>/transformer-mlperf-fp32-training/models/common/tensorflow
export DATASET_DIR=/home/<user>/transformer_data
cd /home/<user>/transformer-mlperf-fp32-training/models/language_translation/tensorflow/transformer_mlperf/training/fp32/transformer
python data_download.py --data_dir=$DATASET_DIR
Running python data_download.py --data_dir=$DATASET_DIR assumes you have a Python* environment similar to the one provided by the intel/intel-optimized-tensorflow:2.4.0-ubuntu-18.04 container. One option is to run the commands above within that container, for example:
docker run -u $(id -u):$(id -g) --privileged --entrypoint /bin/bash -v /home/:/home/ -it intel/intel-optimized-tensorflow:2.4.0-ubuntu-18.04
Quick Start Scripts
Transformer Language in the MLPerf benchmark can be run with full training or with fewer training steps. During training, you can also control whether evaluation is performed.
Script name | Description |
---|---|
fp32_training_demo | Runs 100 training steps (on a single socket of the CPU). |
fp32_training | Runs 200 training steps, saves checkpoints, and performs evaluation (on a single socket of the CPU). |
fp32_training_mpirun | Runs training in multi-instance mode (for example, 2 sockets in a single node) using mpirun for the specified number of processes. |
Bare Metal
To run on bare metal, the following prerequisites must be installed in your environment:
- Python* 3
- intel-tensorflow
- numactl
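A quick way to confirm these prerequisites before running anything is sketched below. `require_cmd` and `check_bare_metal_prereqs` are hypothetical helpers, not part of the model package; the sketch assumes intel-tensorflow is detected by importing the standard `tensorflow` module name, which is the name that package installs under.

```shell
# Hypothetical helper (not part of the model package): report whether a
# command is available on PATH; returns non-zero if it is missing.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check the bare-metal prerequisites listed above. intel-tensorflow is
# assumed to be importable as the "tensorflow" module.
check_bare_metal_prereqs() {
  local status=0
  require_cmd python3 || status=1
  require_cmd numactl || status=1
  if command -v python3 >/dev/null 2>&1 &&
     python3 -c "import tensorflow" >/dev/null 2>&1; then
    echo "found: tensorflow (python module)"
  else
    echo "missing: tensorflow (python module)" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_bare_metal_prereqs` lists every prerequisite it finds and reports each missing one on stderr, so all gaps show up in a single pass.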
After installing the prerequisites, download and untar the model package. Set environment variables for the path to your DATASET_DIR and an OUTPUT_DIR where log files will be written, then run a quickstart script.
export DATASET_DIR=<path to the dataset>
export OUTPUT_DIR=<directory where log files will be written>
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/transformer-mlperf-fp32-training.tar.gz
tar -xzf transformer-mlperf-fp32-training.tar.gz
cd transformer-mlperf-fp32-training
quickstart/fp32_training.sh
# or, for the shorter 100-step demo run:
# quickstart/fp32_training_demo.sh
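The quickstart scripts read DATASET_DIR and OUTPUT_DIR from the environment, and a plain shell assignment without `export` is not visible to a child script. The pre-flight check below is a hypothetical helper (`preflight`, not part of the model package) that catches this; creating OUTPUT_DIR up front is an assumption, included in case the script does not create it itself.

```shell
# Hypothetical pre-flight check (not part of the model package): confirm that
# DATASET_DIR and OUTPUT_DIR are exported, that the dataset directory exists,
# and create the output directory if needed.
preflight() {
  local var
  for var in DATASET_DIR OUTPUT_DIR; do
    # printenv only sees exported variables, so an unexported assignment fails here
    if ! printenv "$var" >/dev/null; then
      echo "$var is not exported" >&2
      return 1
    fi
  done
  if [ ! -d "$DATASET_DIR" ]; then
    echo "DATASET_DIR does not exist: $DATASET_DIR" >&2
    return 1
  fi
  # Assumption: creating the log directory in advance is harmless
  mkdir -p "$OUTPUT_DIR" || return 1
  echo "ok"
}
```

From the transformer-mlperf-fp32-training directory, `preflight && quickstart/fp32_training.sh` starts training only after the check passes.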
For training in multi-instance mode (for example, 2 sockets in a single node) with checkpoint saving and evaluation enabled, the following prerequisites must be installed in your environment:
- gcc-8
- g++-8
- libopenmpi-dev
- Open MPI
- OpenSSH
- Horovod*
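The sketch below checks the command-line tools behind these prerequisites. `missing_cmds` and `check_mpi_prereqs` are hypothetical helpers, not part of the model package. The mapping from prerequisite to command is an assumption: `mpirun` for Open MPI, `ssh` for OpenSSH, and `horovodrun`, the launcher Horovod installs. libopenmpi-dev provides headers and libraries rather than a command, so it is not checked here.

```shell
# Hypothetical helper (not part of the model package): print, one per line,
# each named command that cannot be found on PATH.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Assumed prerequisite-to-command mapping: gcc-8 and g++-8 compilers,
# mpirun (Open MPI), ssh (OpenSSH), horovodrun (Horovod).
check_mpi_prereqs() {
  local missing
  missing=$(missing_cmds gcc-8 g++-8 mpirun ssh horovodrun)
  if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined on one line
    echo "missing commands:" $missing >&2
    return 1
  fi
  echo "all multi-instance prerequisites found"
}
```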
Set environment variables for the path to your DATASET_DIR and an OUTPUT_DIR where log files will be written, then run a quickstart script.
export DATASET_DIR=<path to the dataset>
export OUTPUT_DIR=<directory where log files will be written>
cd transformer-mlperf-fp32-training
quickstart/fp32_training_mpirun.sh
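The multi-instance example above runs one process per socket (2 sockets in a single node). To find the socket count on a Linux machine, the sketch below parses the `Socket(s):` line printed by `lscpu`. `socket_count` is a hypothetical helper, not part of the model package, and how the chosen process count is passed to fp32_training_mpirun.sh is defined by that script, so check the script itself rather than assuming a particular variable name.

```shell
# Hypothetical helper (not part of the model package): read lscpu-style output
# on stdin and print the value of its "Socket(s):" field (empty if absent).
socket_count() {
  awk -F: '/^Socket\(s\):/ { gsub(/[ \t]+/, "", $2); print $2; exit }'
}

# Example on Linux; LC_ALL=C keeps the field label in English:
# LC_ALL=C lscpu | socket_count
```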
License Agreement
LEGAL NOTICE: By accessing, downloading or using this software and any required dependent software (the “Software Package”), you agree to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party software included with the Software Package. Please refer to the license file for additional details.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.