Comparative Analysis of High-Level Synthesis Design Tools

Academic computing clusters and cloud-based systems, such as Amazon Web Services* (AWS*) and Google Cloud*, have been integrating high-end FPGAs for high-performance computing (HPC) into their ecosystems, making FPGAs available to a broader community. These platforms feature high-level synthesis (HLS) tools that enable developers to describe FPGA designs in familiar, high-level languages such as C/C++. As HLS tools continue to mature, it is critical to understand their ability to produce efficient FPGA designs.
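As a rough illustration of that workflow (not code from this work), the sketch below shows an ordinary C++ function of the kind an HLS compiler can map to pipelined hardware; the function name and data sizes are hypothetical.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// A plain C++ dot product. Under an HLS flow, the compiler analyzes this
// loop and generates pipelined (and optionally unrolled) hardware for it,
// so the developer never writes RTL by hand.
float dot_product(const std::vector<float>& a, const std::vector<float>& b) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += a[i] * b[i];
    return acc;
}

int main() {
    std::vector<float> a(8, 1.5f), b(8, 2.0f);
    std::cout << dot_product(a, b) << '\n';  // prints 24
    return 0;
}
```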

One key domain of interest is state-of-the-art algorithms for machine learning, such as convolutional neural networks (CNNs), which are expensive in terms of the memory and compute resources required for high-accuracy classification. By contrast, neuromorphic object-classification algorithms have lower memory and compute complexities than CNNs at similar accuracies, which can improve the scalability of machine-learning applications.

This research explores and evaluates the efficacy of HLS design tools from Intel (the Intel® SDK for OpenCL™ applications and the Intel® oneAPI DPC++/C++ Compiler). Specifically, it examines design latency and hardware resource usage in a case study featuring a novel, neuromorphic, machine-learning algorithm. Evaluated on both the Intel® Stratix® 10 FPGA and the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, the oneAPI-based designs exhibited, on average, 10% lower latency while consuming significantly fewer of the available FPGA-board resources, thereby enabling faster, more scalable designs.
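For context on how the oneAPI flow expresses an FPGA kernel, a minimal DPC++ (SYCL) single-task sketch is shown below. It is a generic vector-add, not the neuromorphic algorithm evaluated in the study, and the device selection and problem size are assumptions made purely for illustration.

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    // Default device selection; an FPGA build would typically use the FPGA
    // selector from the Intel FPGA extensions and an ahead-of-time compile.
    sycl::queue q{sycl::default_selector_v};

    constexpr size_t N = 1024;  // problem size chosen arbitrarily for this sketch
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

    {
        sycl::buffer buf_a{a}, buf_b{b}, buf_c{c};
        q.submit([&](sycl::handler& h) {
            sycl::accessor A{buf_a, h, sycl::read_only};
            sycl::accessor B{buf_b, h, sycl::read_only};
            sycl::accessor C{buf_c, h, sycl::write_only};
            // single_task kernels are a common FPGA style: one sequential
            // loop that the compiler pipelines into hardware.
            h.single_task([=]() {
                for (size_t i = 0; i < N; ++i)
                    C[i] = A[i] + B[i];
            });
        });
    }  // buffer destruction copies results back into the host vectors

    // c now holds 3.0f in every element
    return 0;
}
```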


Speaker

Luke Kljucaric is a PhD student (predoctoral fellow) in computer and electrical engineering at the University of Pittsburgh and a lead student in the HPC group at the National Science Foundation (NSF) Center for Space, High-Performance, and Resilient Computing (SHREC). His focus has been on FPGA and HPC research to better understand the capabilities of current FPGA design tools, with a specific emphasis on high-level design. The target application of his research is accelerated machine learning, including algorithms such as CNNs and neuromorphic classification algorithms, studied on CPUs, GPUs, TPUs, VPUs, and FPGAs.