The exponential growth in the use of large deep neural networks has accelerated the need to train these networks in hours or even minutes. This can only be achieved through scalable and efficient distributed training, since a single node or card cannot satisfy the compute, memory, and I/O requirements of today's state-of-the-art deep neural networks. However, scaling synchronous Stochastic Gradient Descent (SGD) remains a challenging problem and requires continued research and development...
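As background for the synchronous-SGD setting the abstract refers to, the following is a minimal sketch of synchronous data-parallel SGD on a toy problem. The worker count, learning rate, and linear-regression objective are illustrative assumptions rather than details from the paper; a production system would replace the explicit averaging loop with an allreduce across nodes (e.g., over MPI or NCCL).

```python
# Minimal sketch of synchronous data-parallel SGD (toy linear regression).
# Worker count, learning rate, and problem sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
num_workers, dim, samples_per_worker = 4, 8, 64
lr, steps = 0.1, 200

# Synthetic data: each worker holds its own shard of the training set.
w_true = rng.normal(size=dim)
shards = []
for _ in range(num_workers):
    X = rng.normal(size=(samples_per_worker, dim))
    y = X @ w_true + 0.01 * rng.normal(size=samples_per_worker)
    shards.append((X, y))

w = np.zeros(dim)  # model parameters, replicated on every worker
for _ in range(steps):
    # Each worker computes the gradient of its local squared loss.
    grads = []
    for X, y in shards:
        residual = X @ w - y
        grads.append(X.T @ residual / len(y))
    # Synchronous step: average gradients across workers (the allreduce),
    # then apply one identical SGD update on every replica.
    g = np.mean(grads, axis=0)
    w -= lr * g

print("final mean squared error:",
      float(np.mean([(X @ w - y) ** 2 for X, y in shards])))
```

Because every replica applies the same averaged gradient, the parameters stay bitwise-identical across workers, which is what distinguishes the synchronous approach from asynchronous variants.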