Intel Presents Latest Deep Learning Research at International Conference on Learning Representations (ICLR) May 2021


  • The virtual International Conference on Learning Representations (ICLR) is coming up, May 3-7, 2021.

  • Intel researchers and collaborators will present innovative findings and insights related to a wide variety of deep learning/representation learning topics.



Intel researchers and collaborators will present eight papers and an all-day workshop at the virtual International Conference on Learning Representations (ICLR) on May 3-7. ICLR is an annual conference where researchers present and publish cutting-edge research on all aspects of deep learning, as used in artificial intelligence (AI), statistics, and data science, and in important application areas such as machine vision, computational biology, speech recognition, natural language processing (NLP), gaming, and robotics. The Intel papers span a wide range of deep learning topics, illustrating the depth and breadth of Intel's expertise in the field.

Intel and Research Collaborators Workshops and Papers at ICLR 2021


How Can Findings About The Brain Improve AI Systems? (Event #2135)

In this intensive, all-day workshop, we’ll explore the extent to which insights about the brain can lead to better AI. The workshop will be held on May 7th and will address questions such as the following:

  • What areas of AI can most benefit from neuroscientific insight?
  • What are the current bottlenecks in integrating neuroscientific data with AI systems?
  • Which granularities of neuroscientific data are best suited to integrate with AI systems?
  • What are the benefits and limits of data-driven approaches that integrate neuroscientific data into AI system training?
  • How can neuroscientific data benefit AI systems that perform tasks at which humans excel, such as natural language processing (NLP) and vision?

The workshop is an opportunity to participate in a close-knit specialized research community dedicated to applying neuroscience to AI. There will be several speakers as well as panel discussions.



We present an original approach towards defining a unified energy-based solution for the semi-supervised visual anomaly detection and localization problem, where we have access to only anomaly-free training data and want to detect and identify anomalies of an arbitrary nature on test data.
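To make the problem setting concrete, here is a minimal Python sketch of detecting and localizing anomalies when only anomaly-free training images are available. It is not the paper's energy-based model: it assumes a simple independent Gaussian per pixel, treats the negative log-likelihood as the "energy", sums it for image-level detection, and keeps the per-pixel values for localization. All function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_pixel_gaussians(train_images, eps=1e-6):
    """Fit an independent Gaussian to each pixel position of anomaly-free images.

    Each image is a flat list of floats of the same length. Returns one
    (mean, variance) pair per pixel; eps keeps every variance positive.
    """
    params = []
    for pixel_values in zip(*train_images):
        mu = fmean(pixel_values)
        var = pvariance(pixel_values, mu) + eps
        params.append((mu, var))
    return params


def energy_map(image, params):
    """Per-pixel energy = negative Gaussian log-likelihood (used for localization)."""
    return [
        (x - mu) ** 2 / (2 * var) + 0.5 * math.log(2 * math.pi * var)
        for x, (mu, var) in zip(image, params)
    ]


def image_energy(image, params):
    """Image-level anomaly score: total energy over all pixels (used for detection)."""
    return sum(energy_map(image, params))
```

In practice, a detection threshold on the image energy would be chosen from held-out anomaly-free images; the paper's learned energy function takes the place of the per-pixel Gaussian assumed here.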


We propose a novel massively parallel Monte-Carlo Tree Search (MP-MCTS) algorithm. It works efficiently at a scale of 1,000 workers in a distributed-memory environment using multiple compute nodes, and it can be applied to molecular design. This paper is the first work that applies distributed MCTS to a real-world, non-game problem. Existing work on large-scale parallel MCTS scales efficiently, in terms of the number of rollouts, up to 100 workers, but suffers from degraded solution quality. MP-MCTS maintains search quality at a larger scale. By running MP-MCTS on 256 CPU cores for only 10 minutes, we obtained candidate molecules with scores similar to those found by non-parallel MCTS running for 42 hours. Our method is generic and is expected to speed up other applications of MCTS.
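One standard ingredient of parallel MCTS is "virtual loss": while a worker's rollout is still in flight, the node it chose is temporarily scored as if it had lost, which pushes other workers toward different branches. The Python sketch below shows UCB1 (UCT) selection with virtual loss as an illustration of that general technique only; the abstract does not spell out MP-MCTS's scheduling, so this is not presented as the paper's algorithm, and `Node`, `select_child`, and `backpropagate` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    visits: int = 0          # completed rollouts through this node
    value_sum: float = 0.0   # sum of rollout rewards
    virtual_loss: int = 0    # rollouts currently in flight through this node
    children: dict = field(default_factory=dict)


def uct_score(child: Node, parent_visits: int, c: float = 1.0) -> float:
    """UCB1 score in which pending (virtual) visits count as reward-0 visits."""
    n = child.visits + child.virtual_loss
    if n == 0:
        return math.inf  # always try unvisited children first
    mean = child.value_sum / n  # virtual losses add visits but no reward
    return mean + c * math.sqrt(math.log(parent_visits) / n)


def select_child(parent: Node, c: float = 1.0):
    """Pick the best child and mark it with a virtual loss so that workers
    selecting concurrently are steered toward other branches."""
    total = max(parent.visits + parent.virtual_loss, 1)
    key = max(parent.children, key=lambda k: uct_score(parent.children[k], total, c))
    parent.children[key].virtual_loss += 1
    return key


def backpropagate(path, reward: float) -> None:
    """For each node that select_child marked, swap the virtual loss for the
    real rollout result."""
    for node in path:
        node.virtual_loss -= 1
        node.visits += 1
        node.value_sum += reward
```

With two equally good children, the first worker picks one, and the virtual loss it leaves behind makes the second worker pick the other instead of duplicating the same rollout.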


We use deep reinforcement learning to optimize memory management on hardware. Benchmarked on the Intel® Nervana™ Neural Network Processor for Inference (NNP-I), our approach achieved a speedup of up to 78% on BERT through better memory management alone. Essentially, we replaced the compiler's own memory management logic with a deep neural network planner.


Much of the potential for accelerating training lies in reducing the time needed to propagate neural gradients back through the model, yet most previous research has focused on quantizing and pruning weights and activations. These methods are often not applicable to neural gradients, which have very different statistical properties: we find that, unlike weights and activations, neural gradients are approximately log-normally distributed. We suggest two closed-form analytical methods to reduce the computational and memory burdens of neural gradients. Both methods achieve state-of-the-art results on ImageNet. To the best of our knowledge, this paper is the first to (1) quantize gradients to 6-bit floating-point formats, or (2) achieve up to 85% gradient sparsity, in each case without accuracy degradation.
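A log-normal assumption makes a pruning threshold computable in closed form. If log|g| ~ N(mu, sigma^2), then P(|g| < t) = Phi((log t - mu) / sigma), so zeroing a fraction s of the gradients needs t = exp(mu + sigma * Phi^-1(s)). The Python sketch below illustrates this idea only; it is an assumption-based sketch, not the paper's exact method, and it does not cover the 6-bit floating-point quantization. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def lognormal_prune_threshold(grads, target_sparsity):
    """Closed-form magnitude threshold that zeroes `target_sparsity` of the gradients.

    Fits log|g| ~ N(mu, sigma^2) from the nonzero gradients, then inverts the
    normal CDF: t = exp(mu + sigma * Phi^{-1}(target_sparsity)).
    """
    logs = [math.log(abs(g)) for g in grads if g != 0.0]
    mu, sigma = fmean(logs), pstdev(logs)
    return math.exp(mu + sigma * NormalDist().inv_cdf(target_sparsity))


def prune(grads, threshold):
    """Zero out gradients whose magnitude falls below the threshold."""
    return [g if abs(g) >= threshold else 0.0 for g in grads]
```

Because the threshold comes from two summary statistics, no sorting or top-k search over the gradient tensor is needed, which is what makes a closed-form approach attractive.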


Understanding how large neural networks avoid memorizing training data is key to explaining their strong generalization performance. We take a deep dive into how neural networks learn, based on their internal feature representations, and present insights into how they learn to generalize to new data versus memorize patterns they have seen.


Recent research has shown remarkable success in revealing "steering" directions in the latent spaces of pre-trained Generative Adversarial Networks (GANs). These directions correspond to semantically meaningful image transformations (such as shift, zoom, and color manipulations), and have the same interpretable effect across all categories that the GAN can generate. Some methods focus on user-specified transformations, while others discover transformations in an unsupervised manner. However, all existing techniques rely on an optimization procedure to expose those directions, and offer no control over the degree of allowed interaction between different transformations. In this paper, we show that "steering" trajectories can be computed in closed form directly from the generator's weights without any form of training or optimization.
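As a rough illustration of reading latent directions straight from generator weights, the Python sketch below finds the unit latent direction z that the first (linear) generator layer W amplifies most, i.e. the top right-singular vector of W, by power iteration on W^T W. This is a related, well-known closed-form heuristic (used, for example, by the SeFa method), not the derivation in this paper; W is a hypothetical toy weight matrix and the function names are illustrative.

```python
import math
import random


def top_right_singular_vector(W, iters=200, seed=0):
    """Power iteration on W^T W.

    W is a list of rows (out_dim x latent_dim). Returns the unit latent
    direction v that maximizes ||W v||, with no training or optimization.
    """
    rng = random.Random(seed)
    d = len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        Wv = [sum(w * x for w, x in zip(row, v)) for row in W]                # W v
        v = [sum(W[i][j] * Wv[i] for i in range(len(W))) for j in range(d)]  # W^T (W v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def steer(z, direction, alpha):
    """Move a latent code z along a direction with step size alpha."""
    return [zi + alpha * di for zi, di in zip(z, direction)]
```

Applying `steer` with increasing `alpha` traces a straight "steering" trajectory in latent space; the paper's contribution is deriving such trajectories for specific transformations in closed form, which this toy does not attempt.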


We present a theory of how the memory gating mechanism in long short-term memory (LSTM) networks can capture the temporal decay statistics of natural language by fixing each unit to process information at a single temporal scale. We designed an LSTM model to approximate these statistics, giving the model an inductive bias that improves its ability to model language. We find that our multi-timescale LSTM improves language modeling performance, especially for words with long-range temporal dependencies.
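The link between a gate and a timescale can be made explicit. If a unit's forget gate is held at f and nothing new is written, its cell state decays as c_t = f^t * c_0 = exp(-t / T) * c_0 with T = -1 / ln f. Solving sigmoid(b) = exp(-1/T) gives the forget-gate bias b = -ln(exp(1/T) - 1). The Python sketch below computes this mapping; the example timescales are illustrative assumptions, not the distribution used in the paper, and the function names are hypothetical.

```python
import math


def forget_bias_for_timescale(T: float) -> float:
    """Forget-gate bias b such that sigmoid(b) = exp(-1/T).

    With the forget gate fixed at f = sigmoid(b) and no new input written,
    the cell state decays as exp(-t / T), i.e. timescale T (in tokens).
    expm1 keeps precision for large T, where exp(1/T) - 1 is tiny.
    """
    return -math.log(math.expm1(1.0 / T))


def timescale_from_forget(f: float) -> float:
    """Inverse relation: characteristic timescale of a constant forget gate f."""
    return -1.0 / math.log(f)


# Illustrative only: assign a spread of timescales to four hypothetical units.
example_biases = [forget_bias_for_timescale(T) for T in (1.0, 5.0, 20.0, 100.0)]
```

Longer timescales map to larger positive biases, pushing the forget gate toward 1 so those units retain information across more tokens.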


We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of up to 72,000 frames per second on a single eight-GPU machine. The key idea of our approach is to design a 3D renderer and embodied navigation simulator around the principle of “batch simulation”: accepting and executing large batches of requests simultaneously.  By combining batch simulation and deep neural network (DNN) performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system using a 64-GPU cluster over three days.
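The core interface idea of batch simulation, one call that advances many environments at once, can be sketched in a few lines of Python. The toy below is a hypothetical 1-D "navigation" task, not the paper's renderer or simulator; its per-environment loop stands in for what a real batch simulator would execute as a single vectorized or GPU pass, so that per-request overhead is paid once per batch rather than once per environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepResult:
    observations: List[int]
    rewards: List[float]
    dones: List[bool]


class BatchGridSim:
    """Toy simulator that steps N independent 1-D navigation episodes per call.

    Each agent sits at an integer position and must reach its goal position;
    actions are -1, 0, or +1.
    """

    def __init__(self, starts, goals):
        assert len(starts) == len(goals)
        self.pos = list(starts)
        self.goals = list(goals)

    def step(self, actions):
        """Accept one action per environment and execute the whole batch."""
        assert len(actions) == len(self.pos)
        rewards, dones = [], []
        for i, a in enumerate(actions):
            self.pos[i] += a
            done = self.pos[i] == self.goals[i]
            rewards.append(1.0 if done else -0.01)
            dones.append(done)
        return StepResult(list(self.pos), rewards, dones)


def greedy_actions(sim):
    """Batched policy: move each agent one step toward its goal."""
    return [(g > p) - (g < p) for p, g in zip(sim.pos, sim.goals)]
```

A learner built around this interface runs its policy network on the whole batch of observations, then issues one `step` call, which is the shape that lets both simulation and DNN inference stay saturated.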