The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say 32-512 data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a degradation in the quality of the model, as measured by its ability to generalize...
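
To make the small-batch regime described above concrete, here is a minimal, self-contained sketch of a mini-batch SGD loop on a toy least-squares objective. The objective, the function names, and the parameter choices (`batch_size`, `lr`, `steps`) are illustrative assumptions, not details taken from the paper; the only point is that each update uses a gradient estimated from a small sampled subset of the training data.

```python
import numpy as np

# Toy linear-regression objective; gradient of 0.5*||X w - y||^2 / n
# evaluated on a sampled mini-batch of the training data.
def minibatch_grad(w, X, y, idx):
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

def sgd(w, X, y, batch_size=128, lr=0.1, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        # Sample a small fraction of the training set (e.g. 32-512 points)
        idx = rng.choice(len(X), size=batch_size, replace=False)
        # Take a step against the approximate (stochastic) gradient
        w = w - lr * minibatch_grad(w, X, y, idx)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(10_000, 20))
    w_true = rng.normal(size=20)
    y = X @ w_true + 0.01 * rng.normal(size=10_000)
    w_hat = sgd(np.zeros(20), X, y)
    print("parameter error:", np.linalg.norm(w_hat - w_true))
```

Increasing `batch_size` toward the full dataset reduces the noise in each gradient estimate, which is the large-batch regime whose generalization behavior the paper investigates.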