Pull Command
docker pull intel/object-detection:tf-latest-ssd-resnet34-bfloat16-training
Description
This document has instructions for running ResNet34* SSD BFloat 16 training using Intel® Optimization for TensorFlow*.
Datasets
ResNet34 SSD training uses the COCO dataset. Use the following instructions to download and preprocess the dataset.
- Download and extract the 2017 training images and annotations for the COCO dataset:
export MODEL_WORK_DIR=$(pwd)
# Download and extract train images
wget http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip
# Download and extract annotations
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip annotations_trainval2017.zip
- Since this example uses only the training dataset, create an empty directory and an empty annotations JSON file to pass in place of the validation and test inputs in the conversion step.
# Create an empty dir to pass for validation and test images
mkdir empty_dir
# Add an empty .json file to bypass validation/test image preprocessing
cd annotations
echo "{ \"images\": {}, \"categories\": {}}" > empty.json
cd ..
- Use the TensorFlow models repo scripts to convert the raw images and annotations to the TF records format.
git clone https://github.com/tensorflow/models.git tf_models
cd tf_models
git checkout 7a9934df2afdf95be9405b4e9f1f2480d748dc40
cd ..
- Install the prerequisites mentioned in the TensorFlow models object detection installation doc and run protobuf compilation on the code that was cloned in the previous step.
- After your environment is set up, run the conversion script:
cd tf_models/research/object_detection/dataset_tools/
# call script to do conversion
python create_coco_tf_record.py --logtostderr \
--train_image_dir="$MODEL_WORK_DIR/train2017" \
--val_image_dir="$MODEL_WORK_DIR/empty_dir" \
--test_image_dir="$MODEL_WORK_DIR/empty_dir" \
--train_annotations_file="$MODEL_WORK_DIR/annotations/instances_train2017.json" \
--val_annotations_file="$MODEL_WORK_DIR/annotations/empty.json" \
--testdev_annotations_file="$MODEL_WORK_DIR/annotations/empty.json" \
--output_dir="$MODEL_WORK_DIR/output"
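The conversion command reads from the directories and annotation files created in the earlier steps, so a missing download only surfaces once the script is running. The following is a minimal sketch of a check that could be run beforehand. The function name `check_coco_layout` is hypothetical and is not part of the TensorFlow models repo or the container; the paths it looks for are the ones passed to the conversion command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any upstream tooling): report which of the
# inputs expected by the conversion command are missing under a work directory.
check_coco_layout() {
  local work_dir="$1"
  local missing=0
  local d f
  # Directories passed as --train_image_dir and --val/--test_image_dir
  for d in train2017 empty_dir; do
    if [ ! -d "$work_dir/$d" ]; then
      echo "missing directory: $work_dir/$d" >&2
      missing=1
    fi
  done
  # Annotation files passed as --*_annotations_file
  for f in annotations/instances_train2017.json annotations/empty.json; do
    if [ ! -f "$work_dir/$f" ]; then
      echo "missing file: $work_dir/$f" >&2
      missing=1
    fi
  done
  # Returns 0 when everything is present, 1 otherwise
  return "$missing"
}
```

Calling `check_coco_layout "$MODEL_WORK_DIR"` returns a non-zero status and lists each missing path on stderr if any input is absent.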
The coco_train.record-*-of-* files in the output directory are the ones used in this training example. When running the quickstart scripts, set DATASET_DIR to the output directory of the preprocessing script (export DATASET_DIR=$MODEL_WORK_DIR/output).
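Before pointing DATASET_DIR at the output directory, it may be worth confirming that the training shards were actually written. The sketch below counts files matching the coco_train.record-*-of-* pattern named above. The helper name `count_train_shards` is an assumption made for illustration, and the number of shards to expect is not stated in this document, so the check only distinguishes "none" from "some".

```shell
#!/usr/bin/env bash
# Hypothetical helper: count coco_train.record shards in a directory.
count_train_shards() {
  local out_dir="$1"
  local n=0
  local f
  for f in "$out_dir"/coco_train.record-*-of-*; do
    # When nothing matches, the glob stays literal, so test for a real file
    if [ -f "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

For example, `[ "$(count_train_shards "$MODEL_WORK_DIR/output")" -gt 0 ] && export DATASET_DIR=$MODEL_WORK_DIR/output` sets the variable only when at least one shard is present.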
For accuracy testing, download the COCO validation dataset, using the instructions here.
Quick Start Scripts
Script name | Description |
---|---|
bfloat16_training_demo | Executes a demo run with a limited number of training steps to test performance. Set the number of steps using the TRAIN_STEPS environment variable (defaults to 100). |
bfloat16_training | Runs multi-instance training to convergence. Download the backbone model specified in the instructions below and pass that directory path in the BACKBONE_MODEL_DIR environment variable. |
bfloat16_training_accuracy | Runs the model in eval mode to check accuracy. Specify which checkpoint files to use with the CHECKPOINT_DIR environment variable. |
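Each script in the table relies on a different environment variable. As a rough illustration, the sketch below checks that the variable named in the table is set before a given script is launched. The function `check_script_env` is hypothetical and encodes only what the table states; the quickstart scripts themselves may check more than this.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the script-specific variable from the table.
check_script_env() {
  local script="$1"
  case "$script" in
    bfloat16_training_demo)
      # TRAIN_STEPS is optional (defaults to 100), so nothing extra is required
      return 0
      ;;
    bfloat16_training)
      if [ -z "${BACKBONE_MODEL_DIR:-}" ]; then
        echo "BACKBONE_MODEL_DIR is not set" >&2
        return 1
      fi
      ;;
    bfloat16_training_accuracy)
      if [ -z "${CHECKPOINT_DIR:-}" ]; then
        echo "CHECKPOINT_DIR is not set" >&2
        return 1
      fi
      ;;
    *)
      echo "unknown quickstart script: $script" >&2
      return 1
      ;;
  esac
}
```

`check_script_env bfloat16_training` returns non-zero until BACKBONE_MODEL_DIR is set to a non-empty value.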
Docker*
The model container includes the scripts and libraries needed to run ResNet34 SSD BFloat 16 training. To run one of the quickstart scripts using this container, you'll need to provide volume mounts for the dataset and an output directory where the log files and checkpoints will be written. To run more than one process, set the MPI_NUM_PROCESSES environment variable in the container. Depending on which quickstart script is being run, other volume mounts or environment variables may be required.
When using the bfloat16_training_demo.sh quickstart script, the TRAIN_STEPS environment variable (defaults to 100) can be set in addition to DATASET_DIR and OUTPUT_DIR. MPI_NUM_PROCESSES defaults to 1 if it is not set.
export DATASET_DIR=<path to the COCO training data>
export OUTPUT_DIR=<directory where the log file will be written>
export TRAIN_STEPS=<optional, defaults to 100>
export MPI_NUM_PROCESSES=<optional, defaults to 1>
docker run \
--env DATASET_DIR=${DATASET_DIR} \
--env OUTPUT_DIR=${OUTPUT_DIR} \
--env TRAIN_STEPS=${TRAIN_STEPS} \
--env MPI_NUM_PROCESSES=${MPI_NUM_PROCESSES} \
--env http_proxy=${http_proxy} \
--env https_proxy=${https_proxy} \
--volume ${DATASET_DIR}:${DATASET_DIR} \
--volume ${OUTPUT_DIR}:${OUTPUT_DIR} \
--privileged --init -it \
intel/object-detection:tf-latest-ssd-resnet34-bfloat16-training \
/bin/bash quickstart/bfloat16_training_demo.sh
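In the command above, a host variable that is unset expands to an empty string, and a mount fails or misbehaves if DATASET_DIR or OUTPUT_DIR is empty. The following is a small pre-flight sketch that could be run on the host first. The function `preflight` is hypothetical and not part of the container; it mirrors the documented defaults (TRAIN_STEPS of 100, MPI_NUM_PROCESSES of 1) on the host side rather than relying on how the quickstart script handles empty values.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the container's quickstart scripts.
preflight() {
  # Required: both directories are mounted into the container
  if [ -z "${DATASET_DIR:-}" ] || [ ! -d "${DATASET_DIR}" ]; then
    echo "DATASET_DIR must point to an existing directory" >&2
    return 1
  fi
  if [ -z "${OUTPUT_DIR:-}" ]; then
    echo "OUTPUT_DIR must be set" >&2
    return 1
  fi
  # Create the output directory so the volume mount has a target
  mkdir -p "${OUTPUT_DIR}"
  # Optional: apply the documented defaults when the variables are unset or empty
  export TRAIN_STEPS="${TRAIN_STEPS:-100}"
  export MPI_NUM_PROCESSES="${MPI_NUM_PROCESSES:-1}"
}
```

Running `preflight && docker run ...` launches the container only when both directories are usable, and values already exported for TRAIN_STEPS or MPI_NUM_PROCESSES are left unchanged.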
Documentation and Sources
Get Started
Docker* Repository
Main GitHub*
Readme
Release Notes
Get Started Guide
Code Sources
Dockerfile
Report Issue
License Agreement
LEGAL NOTICE: By accessing, downloading or using this software and any required dependent software (the “Software Package”), you agree to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party software included with the Software Package. Please refer to the license file for additional details.