Intel AI Research at ICLR 2018

Intel AI Research will be showcasing six accepted papers/posters, one oral session, and one workshop at the 6th International Conference on Learning Representations in Vancouver, Canada.

Papers & Poster Sessions

Mon, Apr 30th 4:30pm–6:30pm: Mixed Precision Training of Convolutional Neural Networks using Integer Operations
Dipankar Das, Naveen Mellempudi, Dheevatsa Mudigere, Dhiraj Kalamkar, Sasikanth Avancha, Kunal Banerjee, Srinivas Sridharan, Karthik Vaidyanathan, Bharat Kaul, Evangelos Georganas, Alexander Heinecke, Pradeep Dubey, Jesus Corbal, Nikita Shustrov, Roma Dubtsov, Evarist Fomenko, Vadim Pirogov

Abstract: The state-of-the-art (SOTA) for mixed precision training is dominated by variants of low precision floating point operations, and in particular FP16 accumulating into FP32 (Micikevicius et al., 2017). On the other hand, while a lot of research has also happened in the domain of low and mixed-precision integer training, these works either present results for non-SOTA networks (for instance only AlexNet for ImageNet-1K) or for relatively small datasets (like CIFAR-10). In this work, we train state-of-the-art visual understanding neural networks on the ImageNet-1K dataset, with integer operations on General Purpose (GP) hardware. In particular, we focus on Integer Fused-Multiply-and-Accumulate (FMA) operations which take two pairs of INT16 operands and accumulate results into an INT32 output. We propose a shared exponent representation of tensors and develop a Dynamic Fixed Point (DFP) scheme suitable for common neural network operations. The nuances of developing an efficient integer convolution kernel are examined, including methods to handle overflow of the INT32 accumulator. We implement CNN training for ResNet-50, GoogLeNet-v1, VGG-16 and AlexNet, and these networks achieve or exceed SOTA accuracy within the same number of iterations as their FP32 counterparts, without any change in hyper-parameters and with a 1.8X improvement in end-to-end training throughput. To the best of our knowledge, these results represent the first INT16 training results on GP hardware for the ImageNet-1K dataset using SOTA CNNs, and achieve the highest reported accuracy using half precision.

Read the paper: https://openreview.net/forum?id=H135uzZ0-
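
For readers who want a concrete feel for the shared-exponent idea, the sketch below quantizes a tensor to INT16 values that share a single power-of-two exponent and accumulates a dot product in a wider integer type. It is a minimal NumPy illustration of the general dynamic fixed point concept, not the paper's exact scheme; the exponent-selection rule and function names are our own simplifications, and the paper additionally addresses handling overflow of a true INT32 accumulator.

    import numpy as np

    def to_dfp(x, bits=16):
        """Quantize a float tensor to integers sharing one power-of-two exponent,
        chosen so the largest magnitude fits in the signed range."""
        max_abs = float(np.max(np.abs(x))) + 1e-12
        exp = int(np.ceil(np.log2(max_abs / (2 ** (bits - 1) - 1))))
        q = np.clip(np.round(x / 2.0 ** exp), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
        return q.astype(np.int16), exp

    def dfp_dot(a, ea, b, eb):
        """Multiply INT16 operands, accumulate the products in a wider integer, and
        rescale by the two shared exponents. (The paper examines how to handle
        overflow when the accumulator is a true INT32.)"""
        acc = int(np.dot(a.astype(np.int64), b.astype(np.int64)))
        return acc * 2.0 ** (ea + eb)

    x = np.random.randn(64).astype(np.float32)
    w = np.random.randn(64).astype(np.float32)
    qx, ex = to_dfp(x)
    qw, ew = to_dfp(w)
    print(np.dot(x, w), dfp_dot(qx, ex, qw, ew))  # FP32 result vs. DFP approximation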

Mon, Apr 30th 4:30pm–6:30pm: Semi-Parametric Topological Memory for Navigation
Nikolay Savinov, Alexey Dosovitskiy, Vladlen Koltun

Abstract: We introduce a new memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals. The proposed semi-parametric topological memory (SPTM) consists of a (non-parametric) graph with nodes corresponding to locations in the environment and a (parametric) deep network capable of retrieving nodes from the graph based on observations. The graph stores no metric information, only connectivity of locations corresponding to the nodes. We use SPTM as a planning module in a navigation system. Given only 5 minutes of footage of a previously unseen maze, an SPTM-based navigation agent can build a topological map of the environment and use it to confidently navigate towards goals. The average success rate of the SPTM agent in goal-directed navigation across test environments is higher than the best-performing baseline by a factor of three.

Read the paper: https://openreview.net/forum?id=SygwwGbRW
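
A toy sketch of the semi-parametric split described above: a non-parametric graph whose nodes are observations linked by temporal adjacency, plus a retrieval function and a planner over the graph. Here the learned retrieval network is replaced by a simple dot-product stand-in and the planner is plain breadth-first search; all names and sizes are illustrative, not the authors' implementation.

    import numpy as np
    from collections import deque

    rng = np.random.default_rng(0)
    frames = rng.normal(size=(100, 32))          # stand-in features from exploration footage

    # Non-parametric memory: one node per frame, edges between temporally adjacent frames.
    edges = {i: set() for i in range(len(frames))}
    for i in range(len(frames) - 1):
        edges[i].add(i + 1)
        edges[i + 1].add(i)

    def retrieve(obs):
        """Stand-in for the learned retrieval network: return the most similar stored node."""
        return int(np.argmax(frames @ obs))

    def plan(start, goal):
        """Breadth-first search over the topological graph (no metric information used)."""
        prev, frontier = {start: None}, deque([start])
        while frontier:
            node = frontier.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = prev[node]
                return path[::-1]
            for nxt in edges[node]:
                if nxt not in prev:
                    prev[nxt] = node
                    frontier.append(nxt)
        return None

    current, goal_obs = frames[3], frames[90]
    print(plan(retrieve(current), retrieve(goal_obs))[:10])  # first few waypoints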

Tue, May 1st 4:30pm–6:30pm: WRPN: Wide Reduced-Precision Networks
Asit Mishra, Eriko Nurvitadhi, Jeffrey J Cook, Debbie Marr

Abstract: For computer vision applications, prior works have shown the efficacy of reducing the numeric precision of model parameters (network weights) in deep neural networks. Activation maps, however, occupy a large memory footprint during both the training and inference steps when using mini-batches of inputs. One way to reduce this large memory footprint is to reduce the precision of activations. However, past works have shown that reducing the precision of activations hurts model accuracy. We study schemes to train networks from scratch using reduced-precision activations without hurting accuracy. We reduce the precision of activation maps (along with model parameters) and increase the number of filter maps in a layer, and find that this scheme matches or surpasses the accuracy of the baseline full-precision network. As a result, one can significantly improve execution efficiency (e.g. reduce dynamic memory footprint, memory bandwidth and computational energy) and speed up the training and inference process with appropriate hardware support. We call our scheme WRPN -- wide reduced-precision networks. We report results showing that the WRPN scheme achieves better accuracy on the ILSVRC-12 dataset than previously reported reduced-precision networks, while being computationally less expensive.

Read the paper: https://openreview.net/forum?id=B1ZvaaeAZ&noteId=rk7t20mWG
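
The two ingredients the abstract combines, reduced-precision activations and wider filter maps, can be sketched in a few lines of PyTorch. The straight-through quantizer, layer sizes, bit width, and width multiplier below are illustrative assumptions rather than the paper's exact recipe.

    import torch
    import torch.nn as nn

    def quantize(x, bits):
        """Clamp activations to [0, 1] and quantize uniformly to 2**bits levels, with a
        straight-through gradient so the network can still be trained from scratch."""
        scale = 2 ** bits - 1
        xq = torch.round(torch.clamp(x, 0, 1) * scale) / scale
        return x + (xq - x).detach()

    class WideReducedPrecisionBlock(nn.Module):
        def __init__(self, in_ch, out_ch, width_mult=2, act_bits=4):
            super().__init__()
            # Widen the number of filter maps to compensate for the reduced precision.
            self.conv = nn.Conv2d(in_ch, out_ch * width_mult, 3, padding=1)
            self.act_bits = act_bits

        def forward(self, x):
            return quantize(torch.relu(self.conv(x)), self.act_bits)

    block = WideReducedPrecisionBlock(3, 16, width_mult=2, act_bits=4)
    out = block(torch.randn(1, 3, 32, 32))
    print(out.shape)  # torch.Size([1, 32, 32, 32])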

Tue, May 1st 11:00am–1:00pm: Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy
Asit Mishra, Debbie Marr

Abstract: Deep learning networks have achieved state-of-the-art accuracies on computer vision workloads like image classification and object detection. The performant systems, however, typically involve big models with numerous parameters. Once trained, a challenging aspect for such top-performing models is deployment on resource-constrained inference systems -- the models (often deep networks or wide networks or both) are compute and memory intensive. Low precision numerics and model compression using knowledge distillation are popular techniques to lower both the compute requirements and memory footprint of these deployed models. In this paper, we study the combination of these two techniques and show that the performance of low precision networks can be significantly improved by using knowledge distillation techniques. We call our approach Apprentice and show state-of-the-art accuracies using ternary precision and 4-bit precision for many variants of the ResNet architecture on the ImageNet dataset. We study three schemes in which one can apply knowledge distillation techniques to various stages of the train-and-deploy pipeline.

Read the paper: https://openreview.net/forum?id=B1ae1lZRb&noteId=B1e0XDKXf
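
As a rough illustration of how knowledge distillation is typically combined with a low-precision student, the snippet below computes the usual distillation objective: cross-entropy on the true labels plus a temperature-softened KL term against a full-precision teacher. The temperature, weighting, and tensor shapes are illustrative assumptions, and the paper studies several variants of where distillation is applied in the train-and-deploy pipeline.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Cross-entropy on the true labels plus a softened KL term that transfers the
        full-precision teacher's knowledge to the low-precision student."""
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * hard + (1 - alpha) * soft

    student_logits = torch.randn(8, 1000, requires_grad=True)  # low-precision student output
    teacher_logits = torch.randn(8, 1000)                       # full-precision teacher output
    labels = torch.randint(0, 1000, (8,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(loss.item())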

Wed, May 2nd 11:00am–1:00pm: Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions
Nadav Cohen, Ronen Tamari, Amnon Shashua

Abstract: The driving force behind deep networks is their ability to compactly represent rich classes of functions. The primary notion for formally reasoning about this phenomenon is expressive efficiency, which refers to a situation where one network must grow unfeasibly large in order to realize (or approximate) functions of another. To date, expressive efficiency analyses focused on the architectural feature of depth, showing that deep networks are representationally superior to shallow ones. In this paper we study the expressive efficiency brought forth by connectivity, motivated by the observation that modern networks interconnect their layers in elaborate ways. We focus on dilated convolutional networks, a family of deep models delivering state of the art performance in sequence processing tasks. By introducing and analyzing the concept of mixed tensor decompositions, we prove that interconnecting dilated convolutional networks can lead to expressive efficiency. In particular, we show that even a single connection between intermediate layers can already lead to an almost quadratic gap, which in large-scale settings typically makes the difference between a model that is practical and one that is not. Empirical evaluation demonstrates how the expressive efficiency of connectivity, similarly to that of depth, translates into gains in accuracy. This leads us to believe that expressive efficiency may serve a key role in the development of new tools for deep network design.

Read the paper: https://arxiv.org/abs/1703.06846
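
The architectural object the paper analyzes, dilated convolutional networks with a connection between intermediate layers, can be sketched as follows. The PyTorch model below builds two stacks of dilated 1D convolutions and mixes them through a single intermediate connection; the widths, depth, and placement of the connection are illustrative, not the construction used in the paper's proofs.

    import torch
    import torch.nn as nn

    class MixedDilatedNets(nn.Module):
        """Two stacks of dilated 1D convolutions with a single connection between
        their intermediate layers, the kind of interconnection the paper analyzes."""
        def __init__(self, ch=16, depth=3):
            super().__init__()
            self.stack_a = nn.ModuleList(
                nn.Conv1d(ch, ch, 3, dilation=2 ** i, padding=2 ** i) for i in range(depth))
            self.stack_b = nn.ModuleList(
                nn.Conv1d(ch, ch, 3, dilation=2 ** i, padding=2 ** i) for i in range(depth))

        def forward(self, x):
            a, b = x, x
            for i, (layer_a, layer_b) in enumerate(zip(self.stack_a, self.stack_b)):
                a, b = torch.relu(layer_a(a)), torch.relu(layer_b(b))
                if i == 1:  # a single connection between intermediate layers
                    a = a + b
            return a + b

    net = MixedDilatedNets()
    print(net(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 16, 128])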

Wed, May 2nd 4:30pm–6:30pm: TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning
Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox

Abstract: Our understanding of reinforcement learning (RL) has been shaped by theoretical and empirical results that were obtained decades ago using tabular representations and linear function approximators. These results suggest that RL methods that use temporal differencing (TD) are superior to direct Monte Carlo estimation (MC). How do these results hold up in deep RL, which deals with perceptually complex environments and deep nonlinear models? In this paper, we re-examine the role of TD in modern deep RL, using specially designed environments that control for specific factors that affect performance, such as reward sparsity, reward delay, and the perceptual complexity of the task. When comparing TD with infinite-horizon MC, we are able to reproduce classic results in modern settings. Yet we also find that finite-horizon MC is not inferior to TD, even when rewards are sparse or delayed. This makes MC a viable alternative to TD in deep RL.

Read the paper: https://openreview.net/forum?id=HyiAuyb0b
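
The distinction the paper revisits is easiest to see in the value targets each estimator uses. The NumPy sketch below contrasts a one-step TD target with a finite-horizon Monte Carlo return on a toy reward sequence; the horizon, discount factor, and stand-in value estimates are illustrative assumptions.

    import numpy as np

    def td_target(rewards, values, gamma=0.99):
        """One-step temporal-difference targets: r_t + gamma * V(s_{t+1})."""
        return rewards[:-1] + gamma * values[1:]

    def finite_horizon_mc(rewards, horizon=20, gamma=0.99):
        """Finite-horizon Monte Carlo returns: discounted reward sums over `horizon` steps."""
        returns = np.zeros(len(rewards))
        for t in range(len(rewards)):
            window = rewards[t:t + horizon]
            returns[t] = np.sum(window * gamma ** np.arange(len(window)))
        return returns

    rewards = np.zeros(100)
    rewards[::25] = 1.0                    # sparse, delayed rewards
    values = np.random.rand(100)           # stand-in value estimates
    print(td_target(rewards, values)[:5])
    print(finite_horizon_mc(rewards)[:5])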

Workshops & Sessions

Tue, May 1st 3:15pm–3:30pm @ Exhibition Hall A: Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions (Oral)

Abstract: See the poster listing above (Wed, May 2nd 11:00am–1:00pm).

Wed, May 2nd 4:30pm–6:30pm @ East Meeting Level 8 + 15 #14: 3D-Scene-GAN: Three-dimensional Scene Reconstruction with Generative Adversarial Networks (Workshop)

Abstract: Three-dimensional (3D) reconstruction is a vital and challenging research topic in advanced computer graphics and computer vision due to its intrinsic complexity and computation cost. Existing methods often produce holes, distortions and obscure parts in the reconstructed 3D models, which are not adequate for real usage. The focus of this paper is to achieve high-quality 3D reconstruction of complicated scenes by adopting a Generative Adversarial Network (GAN). We propose a novel workflow, namely 3D-Scene-GAN, which can iteratively improve any raw 3D reconstructed model consisting of meshes and textures. 3D-Scene-GAN is a weakly semi-supervised model. It takes only real-time 2D observation images as supervision, and does not rely on prior knowledge of shape models or any reference observations. Finally, through qualitative and quantitative experiments, 3D-Scene-GAN shows compelling advantages over state-of-the-art methods: balanced rank estimation (BRE) scores are improved by 30%-100% on the ICL-NUIM dataset and 36%-190% on the SUN3D dataset, and the mean distance error (MDE) also outperforms other state-of-the-art methods on these benchmarks.
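
The core training signal, an adversarial game between views rendered from the refined model and real 2D observations, can be sketched very roughly as follows. Everything below is a stand-in: the "model" is a flat vector, the renderer is a fixed linear map, and the refiner and critic are tiny MLPs. The sketch only illustrates the weakly supervised adversarial loop, not the authors' pipeline.

    import torch
    import torch.nn as nn

    model_3d = torch.randn(256)                  # stand-in for a raw reconstructed 3D model
    render = nn.Linear(256, 64)                  # stand-in for rendering a 2D view of the model
    real_views = torch.randn(32, 64)             # stand-in 2D observation features (the only supervision)

    refiner = nn.Sequential(nn.Linear(256, 256), nn.Tanh(), nn.Linear(256, 256))  # "generator"
    critic = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))        # "discriminator"
    opt_g = torch.optim.Adam(refiner.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(critic.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(100):
        # Critic step: tell real observations apart from views rendered from the refined model.
        fake_views = render(refiner(model_3d)).expand(32, -1)
        d_loss = bce(critic(real_views), torch.ones(32, 1)) + \
                 bce(critic(fake_views.detach()), torch.zeros(32, 1))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Refiner step: make the rendered views indistinguishable from the real observations.
        fake_views = render(refiner(model_3d)).expand(32, -1)
        g_loss = bce(critic(fake_views), torch.ones(32, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()

    print(d_loss.item(), g_loss.item())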

Intel AI Residency

In 2018, Intel AI Research is inviting a select group of applicants to join its research labs in California for a one-year residency to push the boundaries of artificial intelligence. Our residency program offers a unique combination of cutting-edge research with a commercial understanding of technology development. We apply research across the whole AI solution space to solve real-world customer problems that require us to build new paradigms of compute.

Residents will get to work on a range of projects that align with their interests, from silicon to machine learning algorithms to large-scale AI system deployments and everything in between. We want them to embed in our organization and make meaningful contributions to research (including publications), open source projects, and technology built by our global engineering teams. We are entrepreneurial by nature and hope our residents will share the same initiative and passion for building, innovating and challenging themselves.

By the end of this program, residents will gain a systems-level perspective on AI and will have interacted closely with world-class research teams across Intel, in addition to our academic and corporate partners.

While our residency is one year, it creates a long-term path to collaborating with Intel. Residents will make connections, acquire knowledge and gain experience that elevates their career in both research and technology.

Program dates: September 2018 to August 2019

Contact: airesidency@intel.com for more information

Our hiring managers and recruiters will be present at booth #305. Come meet us to learn more about our residency program and AI job openings. Be sure to join our talent network for the latest updates on Intel AI research careers.

Stay Connected

Keep tabs on all the latest news with our monthly newsletter.