Optimize a BERT-Large Bfloat16 Training Container with TensorFlow*

Published: 11/09/2020  

Last Updated: 05/25/2022

Pull Command

docker pull intel/language-modeling:tf-latest-bert-large-bfloat16-training

Description

This document provides instructions for running BERT-Large bfloat16 training using Intel® Optimization for TensorFlow*.

Datasets

Follow the instructions in BERT Large datasets to download and preprocess the dataset. You can run either classification training or fine-tuning with SQuAD.

Quick Start Scripts

Script name | Description
bfloat16_classifier_training | Fine-tunes the BERT base model on the Microsoft Research Paraphrase Corpus (MRPC), which contains only about 3,600 examples. Download the BERT base pretrained model and set CHECKPOINT_DIR to that directory. DATASET_DIR should point to the GLUE data.
bfloat16_squad_training | Fine-tunes BERT using SQuAD data. Download the BERT-Large pretrained model and set CHECKPOINT_DIR to that directory. DATASET_DIR should point to the SQuAD data files.
bfloat16_squad_training_demo | Runs a short demonstration of 0.01 epochs using SQuAD data.

Docker*

The BERT-Large bfloat16 training model container includes the scripts and libraries needed to run BERT-Large bfloat16 fine-tuning. To run one of the quick start scripts using this container, you need to provide volume mounts for the pretrained model, the dataset, and an output directory where log and checkpoint files will be written. If you switch between SQuAD and classifier training, or run classifier training multiple times, use a new, empty OUTPUT_DIR each time so that incompatible checkpoints from an earlier run are not picked up. See the list of quick start scripts for details on the different options.
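
One way to follow the advice about a new, empty OUTPUT_DIR is to create a uniquely named directory for every run. The sketch below is only an illustration: the helper name new_output_dir, the OUTPUT_ROOT variable, and the directory naming scheme are assumptions and are not part of the container or its scripts.

```shell
# Hypothetical helper: make a brand-new, empty directory for each training run.
# The function name, OUTPUT_ROOT, and the naming scheme are assumptions.
new_output_dir() {
  root="$1"   # parent directory that holds all runs
  label="$2"  # short tag such as "squad" or "classifier"
  mkdir -p "$root" || return 1
  # mktemp -d creates a directory that did not exist before, so it starts empty.
  mktemp -d "$root/${label}-$(date +%Y%m%d-%H%M%S)-XXXXXX"
}

# Placeholder location; point OUTPUT_ROOT at any writable path.
OUTPUT_ROOT="${OUTPUT_ROOT:-${TMPDIR:-/tmp}/bert-large-output}"
OUTPUT_DIR="$(new_output_dir "$OUTPUT_ROOT" squad)"
echo "Using OUTPUT_DIR=${OUTPUT_DIR}"
```

Because every call returns a different path, checkpoints written by a classifier run can never be found by a later SQuAD run, and vice versa.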

The snippet below shows a quick start script running with a single instance:

CHECKPOINT_DIR=<path to the pretrained bert model directory>
DATASET_DIR=<path to the dataset being used>
OUTPUT_DIR=<directory where checkpoints and log files will be saved>

docker run \
  --env CHECKPOINT_DIR="${CHECKPOINT_DIR}" \
  --env DATASET_DIR="${DATASET_DIR}" \
  --env OUTPUT_DIR="${OUTPUT_DIR}" \
  --env http_proxy="${http_proxy}" \
  --env https_proxy="${https_proxy}" \
  --volume "${CHECKPOINT_DIR}:${CHECKPOINT_DIR}" \
  --volume "${DATASET_DIR}:${DATASET_DIR}" \
  --volume "${OUTPUT_DIR}:${OUTPUT_DIR}" \
  --privileged --init -it \
  intel/language-modeling:tf-latest-bert-large-bfloat16-training \
  /bin/bash quickstart/<script name>.sh
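
Before starting a long run, it can help to confirm that the script name is one of the three listed under Quick Start Scripts and that each mounted directory exists. The wrapper below is a minimal sketch under those assumptions: the function name run_quickstart and the DRY_RUN variable are hypothetical and do not come from the container. The docker arguments mirror the command above.

```shell
# Hypothetical wrapper around the docker run command shown above.
# run_quickstart and DRY_RUN are illustrative names, not part of the container.
run_quickstart() {
  script="$1"

  # Accept only the script names listed under Quick Start Scripts.
  case "$script" in
    bfloat16_classifier_training|bfloat16_squad_training|bfloat16_squad_training_demo) ;;
    *) echo "Unknown quick start script: $script" >&2; return 2 ;;
  esac

  # Every volume mount source must already exist as a directory.
  for name in CHECKPOINT_DIR DATASET_DIR OUTPUT_DIR; do
    eval "value=\${$name:-}"
    if [ ! -d "$value" ]; then
      echo "$name is unset or not a directory: '$value'" >&2
      return 1
    fi
  done

  # Build the command as positional parameters so paths with spaces survive.
  set -- docker run \
    --env "CHECKPOINT_DIR=${CHECKPOINT_DIR}" \
    --env "DATASET_DIR=${DATASET_DIR}" \
    --env "OUTPUT_DIR=${OUTPUT_DIR}" \
    --env "http_proxy=${http_proxy:-}" \
    --env "https_proxy=${https_proxy:-}" \
    --volume "${CHECKPOINT_DIR}:${CHECKPOINT_DIR}" \
    --volume "${DATASET_DIR}:${DATASET_DIR}" \
    --volume "${OUTPUT_DIR}:${OUTPUT_DIR}" \
    --privileged --init -it \
    intel/language-modeling:tf-latest-bert-large-bfloat16-training \
    /bin/bash "quickstart/${script}.sh"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
```

With the three directory variables set, calling run_quickstart bfloat16_squad_training_demo starts the short demonstration run. Setting DRY_RUN=1 first prints the assembled command without starting a container, which is a cheap way to check the mounts.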

Documentation and Sources

Get Started
Docker* Repository
Main GitHub*
Readme
Release Notes
Get Started Guide

Code Sources
Dockerfile
Report Issue


License Agreement

LEGAL NOTICE: By accessing, downloading or using this software and any required dependent software (the “Software Package”), you agree to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party software included with the Software Package. Please refer to the license file for additional details.


Related Containers and Solutions

BERT-Large BFloat16 Training TensorFlow Model Package


Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.