On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

By