This document has instructions for running ResNet50* v1.5 bfloat16 training using Intel® Optimization for TensorFlow*.
Note that the ImageNet dataset is used in these ResNet50 v1.5 examples. Download and preprocess the ImageNet dataset using the instructions here. After running the conversion script, you should have a directory containing the ImageNet dataset in TFRecord format.
Set the DATASET_DIR environment variable to point to this directory when running ResNet50 v1.5.
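Before starting a long training run, it can help to confirm that the dataset directory really contains TFRecord shards. The following is a minimal sketch, not part of the model package. The function names `count_shards` and `check_dataset_dir` are hypothetical, and the `train-` and `validation-` file prefixes are an assumption about how the conversion script names its output; adjust them if your files are named differently.

```shell
# Count regular files in a directory whose names start with "<prefix>-".
# Assumption: shards are named like train-00000-of-01024 and
# validation-00000-of-00128; change the prefixes if yours differ.
count_shards() {
  local dir="$1" prefix="$2"
  find "$dir" -maxdepth 1 -type f -name "${prefix}-*" | wc -l | tr -d ' '
}

# Succeed only if the directory exists and holds at least one training
# shard and one validation shard.
check_dataset_dir() {
  local dir="$1" train val
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "DATASET_DIR is unset or not a directory: '$dir'" >&2
    return 1
  fi
  train=$(count_shards "$dir" train)
  val=$(count_shards "$dir" validation)
  echo "train shards: $train, validation shards: $val"
  [ "$train" -gt 0 ] && [ "$val" -gt 0 ]
}

# Example: check_dataset_dir "$DATASET_DIR"
```

The check only looks at file names; it does not validate the contents of the records.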
Quick Start Scripts
| Script name | Description |
|---|---|
| | Launches a short run using small batch sizes and a limited number of steps to demonstrate the training flow. |
| | Launches a test run that trains the model for one epoch and saves checkpoint files to an output directory. |
| | Trains the model using the full dataset, runs until convergence (90 epochs), and saves checkpoint files to an output directory. Note that this will take a considerable amount of time. |
To run on bare metal, the following prerequisites must be installed in your environment:
- Python* 3
- Intel® Optimization for TensorFlow*
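The prerequisites above can be checked from the shell before downloading anything. This is a rough sketch with hypothetical helper names (`have_cmd`, `check_prereqs`). It only confirms that `python3` is on the PATH and that a `tensorflow` module can be imported; it does not verify that the installed TensorFlow is the Intel-optimized build.

```shell
# Return success if the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report missing prerequisites; returns 0 only if all checks pass.
# Note: a successful import does not prove the build is Intel-optimized.
check_prereqs() {
  local missing=0
  if ! have_cmd python3; then
    echo "python3 not found on PATH" >&2
    missing=1
  elif ! python3 -c "import tensorflow" >/dev/null 2>&1; then
    echo "TensorFlow is not importable from python3" >&2
    missing=1
  fi
  return "$missing"
}

# Example: check_prereqs && echo "prerequisites look OK"
```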
Download and untar the model package and then run a quick start script.
export DATASET_DIR=<path to the preprocessed imagenet dataset>
export OUTPUT_DIR=<directory where checkpoint and log files will be written>

wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/resnet50v1-5-bfloat16-training.tar.gz
tar -xvf resnet50v1-5-bfloat16-training.tar.gz
cd resnet50v1-5-bfloat16-training

quickstart/<script name>.sh
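Because the quick start scripts depend on DATASET_DIR and OUTPUT_DIR, a small guard run before the script can catch a missing variable early. The sketch below uses hypothetical helpers (`require_var`, `prepare_run`) that are not part of the model package. Creating the output directory up front is an assumption made for convenience; the quick start scripts may already handle it.

```shell
# Fail with a message if the named variable is unset or empty.
require_var() {
  local name="$1" value
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name must be set" >&2
    return 1
  fi
}

# Validate both variables, then make sure the output directory exists.
prepare_run() {
  require_var DATASET_DIR || return 1
  require_var OUTPUT_DIR || return 1
  if [ ! -d "$DATASET_DIR" ]; then
    echo "DATASET_DIR does not exist: $DATASET_DIR" >&2
    return 1
  fi
  mkdir -p "$OUTPUT_DIR"
}

# Example (run from the unpacked package directory):
#   prepare_run && quickstart/<script name>.sh
```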
To run distributed training (one MPI process per socket) for better throughput, set the MPI_NUM_PROCESSES environment variable to the number of sockets to use. Running with multiple instances requires these additional dependencies to be installed in your environment:
- OpenSSH client
- OpenSSH server
- Horovod* 0.19.1
export DATASET_DIR=<path to the preprocessed imagenet dataset>
export OUTPUT_DIR=<directory where checkpoint and log files will be written>
export MPI_NUM_PROCESSES=<number of sockets to use>

wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/resnet50v1-5-bfloat16-training.tar.gz
tar -xvf resnet50v1-5-bfloat16-training.tar.gz
cd resnet50v1-5-bfloat16-training

quickstart/<script name>.sh
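If you are unsure how many sockets a machine has, one way to find out on Linux is to read the `Socket(s)` line from `lscpu`. The helper below, `parse_socket_count`, is a hypothetical name for this sketch. It assumes English-language `lscpu` output, so you may need to run `lscpu` with `LC_ALL=C` on localized systems.

```shell
# Read lscpu-style text on stdin and print the value of the "Socket(s)" field.
# Assumption: the field label is in English, as in "Socket(s):   2".
parse_socket_count() {
  awk -F: '/^Socket\(s\)/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Example (Linux, with lscpu installed):
#   export MPI_NUM_PROCESSES="$(LC_ALL=C lscpu | parse_socket_count)"
```

Using every socket is only a starting point; set MPI_NUM_PROCESSES lower if you want to leave sockets free for other work.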
Documentation and Sources
LEGAL NOTICE: By accessing, downloading or using this software and any required dependent software (the “Software Package”), you agree to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party software included with the Software Package. Please refer to the license file for additional details.
Related Containers and Solutions
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.