Best Practices for Scaling Deep Learning Training and Inference with TensorFlow* On Intel® Xeon® Processor-Based HPC Infrastructures

This document describes the setup, installation, and procedure for running distributed Deep Learning training using TensorFlow with Uber's Horovod library over MPI. The steps required to run the benchmark can vary slightly depending on the user's environment. For large clusters on the order of hundreds or thousands of nodes, we provide sample scripts that use the SLURM scheduler. Alternatively, we ...also list out steps for smaller systems that may not have such a scheduler configured...
