We provide a novel perspective on the forward pass through a block of layers in a deep network. In particular, we show that a forward pass through a standard dropout layer followed by a linear layer and a non-linear activation is equivalent to optimizing a convex objective with a single iteration of a τ-nice Proximal Stochastic Gradient method. We further show that replacing standard Bernoulli dropout with additive dropout is equivalent to optimizing the same convex objective with a variance-reduced proximal method...
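The core equivalence can be checked numerically: a dropout-linear-ReLU block produces the same output as one proximal stochastic gradient step, once ReLU is read as the proximal operator of the indicator of the nonnegative orthant. The sketch below uses an illustrative quadratic surrogate objective (an assumption for demonstration, not necessarily the exact objective used in the paper) and standard inverted Bernoulli dropout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative block dimensions and parameters (assumed values).
d_in, d_out, keep_prob = 8, 4, 0.5
W = rng.standard_normal((d_out, d_in))
b = rng.standard_normal(d_out)
x = rng.standard_normal(d_in)

# Bernoulli dropout mask on the input coordinates (inverted dropout scaling).
mask = (rng.random(d_in) < keep_prob) / keep_prob

# 1) Ordinary forward pass: dropout -> linear -> ReLU.
forward = np.maximum(W @ (mask * x) + b, 0.0)

# 2) One proximal stochastic gradient step on a convex surrogate:
#      min_y  0.5 * ||y||^2 - y^T (W D x + b) + g(y),
#    where D is the dropout mask and g is the indicator of the
#    nonnegative orthant, whose proximal operator is exactly ReLU.
#    (This surrogate is an assumption made for the sketch.)
y0 = np.zeros(d_out)
step = 1.0
stoch_grad = y0 - (W @ (mask * x) + b)               # gradient of smooth part at y0
prox_step = np.maximum(y0 - step * stoch_grad, 0.0)  # prox of g == ReLU

print(np.allclose(forward, prox_step))  # True: the two computations coincide
```

With the same dropout mask, the two code paths return identical vectors, which is the single-iteration equivalence the abstract describes.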