In this paper, we explore FPGA minifloat implementations (floating-point representations with non-standard exponent and mantissa sizes) and show the use of a block floating-point implementation that shares the exponent across many numbers, reducing the logic required to perform floating-point operations.
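The shared-exponent idea can be illustrated with a minimal software sketch. This is not the paper's hardware design: the function names, the 8-bit default mantissa width, and the rounding and clipping policy are all assumptions chosen for illustration. Each block of values is stored as signed integer mantissas plus one common exponent, so a dot product needs only integer multiply-accumulates followed by a single exponent addition.

```python
import math


def to_block_float(values, mantissa_bits=8):
    """Quantize floats to signed integer mantissas that share one exponent.

    Hypothetical helper: mantissa_bits (sign bit included) is an assumed parameter.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    # Choose the shared exponent so the largest value uses the full mantissa range.
    shared_exp = e - (mantissa_bits - 1)
    scale = 2.0 ** shared_exp
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to nearest and clip to the symmetric signed range.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return mantissas, shared_exp


def from_block_float(mantissas, shared_exp):
    """Reconstruct approximate float values from a block."""
    return [m * 2.0 ** shared_exp for m in mantissas]


def block_dot(a_mant, a_exp, b_mant, b_exp):
    """Dot product of two blocks: integer MACs, then one exponent addition."""
    acc = sum(x * y for x, y in zip(a_mant, b_mant))
    return acc * 2.0 ** (a_exp + b_exp)


if __name__ == "__main__":
    mant, exp = to_block_float([1.0, 0.5, -0.25, 0.125])
    print(mant, exp)
    print(block_dot(mant, exp, mant, exp))
```

In this sketch, values much smaller than the largest element of a block lose precision, because they all share that element's exponent. That loss is the usual cost of sharing an exponent, and it is why block size and mantissa width are tuning parameters.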
In this paper, we introduce a domain-specific approach to overlays that leverages both software and hardware optimizations to achieve state-of-the-art performance on the FPGA for neural network acceleration.
This paper examines flexibility and its impact on FPGA design methodology, physical design tools, and computer-aided design (CAD). We describe the degrees of flexibility required to create efficient deep learning accelerators.
This white paper examines the future of deep neural networks, including sparse networks, low precision, and ultra-low precision, and compares the performance of Intel® Arria® 10 and Intel Stratix® 10 FPGAs against NVIDIA graphics processing units (GPUs).