Deep learning is a promising tool for determining the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel(R) Xeon Phi(TM) processors. We also use the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. These enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters.
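To put the scaling figures in perspective, the short Python sketch below works through the arithmetic they imply. It assumes that parallel efficiency is defined as speedup divided by node count, measured against a single-node baseline, and that the sustained rate is spread evenly across nodes; the abstract states neither, and the function names are illustrative only.

```python
# Back-of-the-envelope arithmetic for the reported scaling figures.
# Assumption (not stated in the abstract): efficiency = speedup / nodes,
# relative to a single-node baseline, with work spread evenly over nodes.

NODES = 8192            # nodes of Cori used in the reported run
EFFICIENCY = 0.77       # reported parallel efficiency
SUSTAINED_FLOPS = 3.5e15  # reported sustained performance, flop/s


def speedup_from_efficiency(efficiency: float, nodes: int) -> float:
    """Effective speedup over one node implied by a parallel efficiency."""
    return efficiency * nodes


def per_node_rate(total_flops: float, nodes: int) -> float:
    """Average sustained flop/s per node for an aggregate rate."""
    return total_flops / nodes


speedup = speedup_from_efficiency(EFFICIENCY, NODES)
node_rate = per_node_rate(SUSTAINED_FLOPS, NODES)

print(f"Implied speedup over one node: {speedup:.0f}x")
print(f"Average sustained rate per node: {node_rate / 1e9:.0f} Gflop/s")
```

Under these assumptions, the run behaves like roughly 6300 perfectly scaling nodes, with each node sustaining on the order of 430 Gflop/s on average.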
Authors
Amrita Mathuriya
Deborah Bard
Peter Mendygral
Lawrence Meadows
James Arnemann
Siyu He
Tuomas Karna
Daina Moise
Simon J. Pennycook
Kristyn Maschoff
Jason Sewall
Nalini Kumar
Shirley Ho
Mike Ringenburg
Prabhat
Victor Lee