Dynamic Parameter Reallocation Improves Trainability of Deep Convolutional Networks



Network pruning has emerged as a powerful technique for reducing the size of deep neural networks. Pruning uncovers high-performance subnetworks by taking a trained dense network and gradually removing unimportant connections. Recently, alternative techniques have emerged for training sparse networks directly without having to train a large dense model beforehand, thereby achieving small memory footprints during both training and inference. These techniques are based on dynamic reallocation of non-zero parameters during training. Thus, they are in effect executing a training-time search for the optimal subnetwork. We investigate a most recent one of these techniques and conduct additional experiments to elucidate its behavior in training sparse deep convolutional networks. Dynamic parameter reallocation converges early during training to a highly trainable subnetwork. We show that neither the structure, nor the initialization of the discovered high-performance subnetwork is sufficient to explain its good performance. Rather, it is the dynamics of parameter reallocation that are responsible for successful learning. Dynamic parameter reallocation thus improves the trainability of deep convolutional networks, playing a similar role as overparameterization, without incurring the memory and computational cost of the latter...