The photons which constitute a ray of light behave intelligently: out of all possible curves they always select the one which will take them most quickly to their goal.
Two of the biggest challenges facing the field of deep learning are power consumption and latency. On the first challenge, Max Welling’s keynote at ICML 2018 beautifully describes the concept of “Intelligence per Kilowatt Hour,” highlighting the imperative of power-efficient computation for AI. On latency, AI practitioners are acutely aware that in safety-critical, real-time applications like transportation, faster reaction times translate directly into greater safety. Two years ago, ground-breaking research by Shen et al. at MIT proposed an intriguing path toward both lower latency and higher energy efficiency: optical neural networks (ONNs). At last week’s CLEO conference, and in a longer-form paper in Optics Express, we and our collaborators at UC Berkeley presented new findings on ONNs, including a proposal for extending that original work to cope with real-world manufacturing constraints, bringing nanophotonic neural network circuits one step closer to practical reality.
Optical neural networks
Photons have long been attractive to hardware designers because of how quickly and easily they can move through matter. Silicon can be used as an optical medium, which means that we can harness decades of chip fabrication technology to build circuits for light. This technology, known as silicon photonics, opens up enormous possibilities in both communication and computation. One key contribution of Shen’s paper was the experimental demonstration of ideas from Reck and Miller: a common component of photonic circuits, the Mach-Zehnder interferometer (MZI), can be configured to perform a 2x2 matrix multiplication on the complex amplitudes (which carry the phases) of two light beams. Moreover, Shen et al. implemented Reck’s recipe for arranging these small matrix multiplications in a triangular mesh to create larger matrices. The end result is a photonic circuit that implements a matrix-vector multiplication, a core computation in deep learning.
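To make the MZI building block concrete, here is a minimal pure-Python sketch. It assumes one common idealized, lossless convention (an external phase shifter phi, a 50:50 beamsplitter, an internal phase shifter theta, and a second 50:50 beamsplitter); the exact parameterization used by Shen et al. or by Reck may differ. It checks that each MZI applies a 2x2 unitary matrix to the two beams' complex amplitudes, then composes MZIs on adjacent waveguides in a triangular layout in the spirit of Reck's recipe. The phases are random placeholders, and the helper names (`mzi`, `embed`, `triangular_mesh`, `apply`) are hypothetical; the sketch does not implement the decomposition that finds the phases realizing a given target matrix.

```python
import cmath
import math
import random


def matmul(a, b):
    """Multiply two square complex matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mzi(theta, phi):
    """2x2 transfer matrix of an idealized, lossless MZI.

    Assumed convention: external phase shifter phi on the upper arm, a 50:50
    beamsplitter, internal phase shifter theta on the upper arm, then a
    second 50:50 beamsplitter.
    """
    s = 1 / math.sqrt(2)
    splitter = [[s, 1j * s], [1j * s, s]]

    def shifter(a):
        return [[cmath.exp(1j * a), 0], [0, 1]]

    return matmul(splitter, matmul(shifter(theta), matmul(splitter, shifter(phi))))


def embed(u2, i, j, n):
    """Place a 2x2 block on waveguides (i, j) of an n x n identity matrix."""
    m = [[complex(r == c) for c in range(n)] for r in range(n)]
    m[i][i], m[i][j] = u2[0][0], u2[0][1]
    m[j][i], m[j][j] = u2[1][0], u2[1][1]
    return m


def triangular_mesh(n, seed=0):
    """Compose MZIs on adjacent waveguides in a triangular layout.

    Phases are random placeholders (hypothetical); this mirrors only the
    shape of Reck's arrangement. Returns the n x n matrix and the MZI count.
    """
    rng = random.Random(seed)
    u = [[complex(r == c) for c in range(n)] for r in range(n)]
    count = 0
    for k in range(n - 1, 0, -1):  # rows of the triangle hold n-1, n-2, ..., 1 MZIs
        for i in range(k):
            block = mzi(rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi))
            u = matmul(embed(block, i, i + 1, n), u)
            count += 1
    return u, count


def is_unitary(m, tol=1e-9):
    """Check that the columns of m are orthonormal (M^dagger M = I)."""
    n = len(m)
    return all(
        abs(sum(m[k][i].conjugate() * m[k][j] for k in range(n)) - (i == j)) <= tol
        for i in range(n) for j in range(n))


def apply(m, x):
    """Matrix-vector product y = M x, the operation the photonic circuit performs."""
    return [sum(m[i][k] * x[k] for k in range(len(x))) for i in range(len(m))]


if __name__ == "__main__":
    mesh, count = triangular_mesh(4)
    print(count, is_unitary(mesh))  # prints: 6 True
    y = apply(mesh, [1, 0, 0, 0])
    print(round(sum(abs(v) ** 2 for v in y), 6))  # optical power is conserved: 1.0
```

Because every MZI in this idealized model is lossless, any mesh built from them is unitary, and a triangle on n waveguides uses n(n-1)/2 MZIs.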
As with any manufacturing process, chip fabrication is imperfect: there will be small variations within and across chips, and these will affect the accuracy of the computations. In order to move ONNs closer to production, we wanted to understand how sensitive they were to typical process variations, especially as they scaled up to more realistic problem sizes. We also wanted to know whether we could make them more robust to these variations by considering different circuit architectures.
In a newly published paper, we considered two architectures for building an optical neural network engine out of MZIs. The first, which we called GridNet, arranges the MZIs in a grid; the second, which we called FFTNet, arranges them in a butterfly-like pattern modelled after architectures for computing Fast Fourier Transforms (in our case the weights are learned from data, so the computation will not, in general, be an actual FFT). We then trained both architectures in a software simulation on a benchmark deep learning task, handwritten digit recognition (MNIST). With ideal components simulated in double-precision floating point, GridNet achieved higher accuracy than FFTNet (~98% vs. ~95%). However, FFTNet was significantly more robust to manufacturing imprecision, which we simulated by adding noise to the phase shift and transmittance of each MZI. With the noise set to realistic levels, GridNet’s accuracy fell below 50%, while FFTNet’s remained nearly constant.
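The two layouts and the noise model can be sketched in the same spirit. This minimal pure-Python example assumes a simplified reading of the architectures: GridNet as n layers that alternately couple waveguide pairs (0,1), (2,3), and so on, then (1,2), (3,4), and so on; FFTNet as log2(n) butterfly stages with strides 1, 2, 4, and so on, each coupling waveguides i and i + stride. Random phases stand in for trained weights, and fabrication error is modelled, hypothetically, as Gaussian noise on every phase shifter and on each MZI's beamsplitter transmittance (both splitters of one MZI share a single perturbed value, a simplifying assumption). The hypothetical `fidelity` function reports how closely a noisy chip's output matches the ideal output (1.0 means identical). This is not the MNIST experiment from the paper, and the noise levels are illustrative, not the paper's.

```python
import cmath
import math
import random


def gridnet_layers(n):
    """Assumed GridNet-style layout: n layers alternately coupling waveguide
    pairs (0,1),(2,3),... and (1,2),(3,4),...; n(n-1)/2 MZIs in total."""
    return [[(i, i + 1) for i in range(layer % 2, n - 1, 2)] for layer in range(n)]


def fftnet_layers(n):
    """Assumed FFTNet-style layout for n a power of two: log2(n) butterfly
    stages, where the stage with stride k couples waveguides i and i + k."""
    stages, k = [], 1
    while k < n:
        stages.append([(i, i + k) for i in range(n) if (i & k) == 0])
        k *= 2
    return stages


def mzi(theta, phi, split=0.5):
    """Closed form of BS . P(theta) . BS . P(phi), with P(a) = diag(exp(ia), 1)
    and BS = [[t, i r], [i r, t]], t = sqrt(split). Assumed convention."""
    t, r = math.sqrt(split), math.sqrt(1.0 - split)
    e, f = cmath.exp(1j * theta), cmath.exp(1j * phi)
    return ((f * (e * t * t - r * r), 1j * t * r * (e + 1)),
            (1j * f * t * r * (e + 1), t * t - e * r * r))


def random_phases(layers, rng):
    """One (theta, phi) pair per MZI, standing in for trained weights."""
    return [[(rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)) for _ in layer]
            for layer in layers]


def propagate(x, layers, phases, rng=None, sigma_phase=0.0, sigma_split=0.0):
    """Push the field vector x through the mesh. If rng is given, each MZI's
    phases and split are perturbed by Gaussian noise (hypothetical error model)."""
    x = list(x)
    for layer, layer_phases in zip(layers, phases):
        for (i, j), (theta, phi) in zip(layer, layer_phases):
            split = 0.5
            if rng is not None:
                theta += rng.gauss(0, sigma_phase)
                phi += rng.gauss(0, sigma_phase)
                split = min(1.0, max(0.0, 0.5 + rng.gauss(0, sigma_split)))
            (a, b), (c, d) = mzi(theta, phi, split)
            x[i], x[j] = a * x[i] + b * x[j], c * x[i] + d * x[j]
    return x


def fidelity(layers, n, trials=200, sigma_phase=0.05, sigma_split=0.02, seed=0):
    """Mean |<y_ideal, y_noisy>|^2 over random unit-norm inputs, drawing a
    fresh noisy chip for every trial (1.0 = noise has no effect)."""
    rng = random.Random(seed)
    phases = random_phases(layers, rng)
    total = 0.0
    for _ in range(trials):
        x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        norm = math.sqrt(sum(abs(v) ** 2 for v in x))
        x = [v / norm for v in x]
        ideal = propagate(x, layers, phases)
        noisy = propagate(x, layers, phases, rng, sigma_phase, sigma_split)
        total += abs(sum(a.conjugate() * b for a, b in zip(ideal, noisy))) ** 2
    return total / trials


if __name__ == "__main__":
    n = 8
    for name, layers in (("GridNet", gridnet_layers(n)), ("FFTNet", fftnet_layers(n))):
        mzis = sum(len(layer) for layer in layers)
        print(name, mzis, "MZIs, mean fidelity", round(fidelity(layers, n), 4))
```

In these assumed layouts, each signal passes through log2(n) MZIs in FFTNet but up to n in GridNet, so fewer noisy components sit on every path. Whether that accounts for the robustness gap reported above is a hypothesis the sketch lets you probe, not a conclusion drawn from it.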
The Power of Scale
If ONNs are to become a viable piece of the AI hardware ecosystem, they will need to scale up to larger circuits and industrial manufacturing techniques. Our finding addresses both of these issues. Larger circuits will require more devices, such as MZIs, per chip. Therefore, attempting to “fine tune” each device on a chip post-manufacturing will be a growing challenge. A more scalable strategy will be to train ONNs in software, then mass produce circuits based on those parameters. Our results suggest that choosing the right architecture in advance can greatly increase the probability that the resulting circuits will achieve their desired performance even in the face of manufacturing variations.
We look forward to working further in this exciting area at the intersection of physics and AI. With decades of manufacturing experience and deep collaborations across industry and academia, Intel is dedicated to supporting this kind of scalable research in order to push the industry forward.
To read the research team’s full report, see “Design of optical neural networks with component imprecisions” from the Optical Society of America.
Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. © Intel Corporation.