Inference Engine for Custom Neural Networks with oneAPI

Introduction

This project focused on developing an inference engine for custom network architectures that could be integrated into the software stack of the CMS experiment at the Large Hadron Collider. Built on oneAPI concepts, such an inference engine would enable deployment on different kinds of hardware, in line with the requirements of the experiment's future heterogeneous computing environment, both online and offline.

I was responsible for developing a oneAPI-based inference engine for the hls4ml framework, a package for machine learning inference originally designed for FPGAs. Implemented as an hls4ml back end, this engine can accelerate inference on Intel® x86 architecture, significantly broadening the scope of hls4ml and enabling it to run on common x86 servers, such as those used in the high-level trigger (HLT) of the CMS detector for high-energy physics analysis.

About oneAPI and Intel® DevCloud

I used a variety of Intel technologies in this project, including the Intel® oneAPI Deep Neural Network Library (oneDNN) from the Intel® oneAPI Toolkits, and Intel® DevCloud, to develop, test, and analyze my algorithms. The oneAPI programming model brings clear benefits, but the work also posed some challenges, such as handling memory abstraction, mapping mathematical operations to oneDNN primitives, and separating network compilation from execution.
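
To give an idea of what the memory abstraction looks like in practice, here is a minimal sketch using the oneDNN C++ API (assuming a oneDNN 3.x release; all sizes and names are illustrative and not taken from the actual back-end code). An engine represents the target device, a stream represents an execution context on that device, and every tensor is described by a memory descriptor and wrapped in a memory object:

    #include <vector>
    #include "dnnl.hpp"

    int main() {
        using namespace dnnl;

        // An engine abstracts the target device (a CPU here); a stream
        // encapsulates an execution context on that engine.
        engine eng(engine::kind::cpu, 0);
        stream s(eng);

        // oneDNN does not operate on raw pointers directly: every tensor is
        // described by a memory descriptor (shape, data type, layout) and
        // wrapped in a memory object bound to an engine.
        const int batch = 1, features = 16;  // illustrative sizes
        memory::desc md({batch, features}, memory::data_type::f32,
                        memory::format_tag::nc);

        std::vector<float> host_data(batch * features, 1.0f);
        memory mem(md, eng, host_data.data());

        s.wait();
        return 0;
    }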

Results So Far

After my short eight-week journey, I was able to create a oneAPI back end and fully integrate it into the hls4ml framework. I used oneDNN primitives to implement support for common neural network layers. The primitives supported in the hls4ml oneAPI back end include the following (a sketch of how they combine into a single layer follows the list):

  1. Dense layers (inner product primitive)
  2. Activation functions such as Rectified Linear Unit (ReLU), tanh, logarithm (log), linear, exponential (exp), square root (sqrt), and many others (element-wise primitives)
  3. Softmax (softmax primitive)
  4. Convolution layers (convolution primitive)
  5. Pooling layers (max pooling and average pooling primitives)
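
As a rough illustration of how these primitives combine, the following sketch builds a single dense layer with a fused ReLU activation using the oneDNN 3.x C++ API. All sizes and variable names here are hypothetical and simplified; the real back end generates this setup from the hls4ml model description, and activations can equally be run as standalone element-wise primitives:

    #include <vector>
    #include "dnnl.hpp"

    int main() {
        using namespace dnnl;
        engine eng(engine::kind::cpu, 0);
        stream s(eng);

        const int N = 1, IC = 64, OC = 32;  // batch, input features, output features

        // Tensor descriptors for a dense (fully connected) layer. For brevity,
        // plain layouts are requested instead of format_tag::any plus reorders.
        memory::desc src_md({N, IC}, memory::data_type::f32, memory::format_tag::nc);
        memory::desc wei_md({OC, IC}, memory::data_type::f32, memory::format_tag::oi);
        memory::desc bia_md({OC}, memory::data_type::f32, memory::format_tag::x);
        memory::desc dst_md({N, OC}, memory::data_type::f32, memory::format_tag::nc);

        // Fuse the ReLU activation into the inner product as a post-op.
        post_ops ops;
        ops.append_eltwise(algorithm::eltwise_relu, 0.f, 0.f);
        primitive_attr attr;
        attr.set_post_ops(ops);

        // "Compile" the layer once: the primitive descriptor selects an
        // implementation, and the resulting primitive is reused for every
        // execution (the compilation/execution separation mentioned above).
        inner_product_forward::primitive_desc pd(
            eng, prop_kind::forward_inference, src_md, wei_md, bia_md, dst_md, attr);
        inner_product_forward fc(pd);

        // Bind user buffers to oneDNN memory objects and run the layer.
        std::vector<float> src(N * IC, 1.f), wei(OC * IC, 0.01f), bia(OC, 0.f), dst(N * OC);
        memory src_m(src_md, eng, src.data()), wei_m(wei_md, eng, wei.data()),
               bia_m(bia_md, eng, bia.data()), dst_m(dst_md, eng, dst.data());

        fc.execute(s, {{DNNL_ARG_SRC, src_m}, {DNNL_ARG_WEIGHTS, wei_m},
                       {DNNL_ARG_BIAS, bia_m}, {DNNL_ARG_DST, dst_m}});
        s.wait();
        return 0;
    }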

I also added support for processing data in batches with the oneAPI back end, a capability that is unavailable in hls4ml when using the other back ends.
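
Batching requires no special handling in oneDNN: the batch size is simply the leading (N) dimension of the tensor descriptors, so a single primitive execution processes the whole batch. The minimal fragment below, with hypothetical layer widths, shows how the descriptors from the previous sketch change for batched inference:

    #include "dnnl.hpp"

    int main() {
        using namespace dnnl;

        // Only the leading dimension differs from single-sample inference;
        // the rest of the network setup stays identical.
        const int batch_size = 256;                      // batch size from our measurements
        const int in_features = 64, out_features = 32;   // hypothetical layer widths

        memory::desc src_md({batch_size, in_features},
                            memory::data_type::f32, memory::format_tag::nc);
        memory::desc dst_md({batch_size, out_features},
                            memory::data_type::f32, memory::format_tag::nc);

        // These descriptors would then be passed to the primitive descriptor
        // exactly as in the dense-layer sketch above.
        (void)src_md; (void)dst_md;
        return 0;
    }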

So far, our measurements have shown that the oneAPI back end offers a 165x performance increase over the existing hls4ml CPU implementation on systems based on Intel® Xeon® Gold 6128 processors (3.4 GHz). In our best case, at a batch size of 256, the oneAPI back end offers a 3,763x performance increase.

Conclusion

Overall, the oneAPI programming model can speed up inference on CPUs by hundreds of times. oneDNN primitives can be used to design custom neural networks, such as graph neural networks (GNNs). Although I focused on developing an inference engine for x86 CPUs, this back end is not limited to them. oneAPI provides abstraction and accelerated libraries, which means the software can be ported to other architectures, such as Intel® Xe architecture and Intel® FPGAs, as well as deep learning accelerators, without significant changes.

Additional Resources

Project Details and Final Presentation
Technical Details, Project Results, and Supported Primitives
Intel® DevCloud 

About Marcin Świniarski

Marcin Świniarski participated in the CERN openlab 2020 online summer internship program and is now a deep learning software engineer at Intel. For the past two years, Marcin has been passionate about deepening his knowledge of deep learning. He is currently involved in reinforcement learning and computer vision projects in the Gradient Science Club. In his free time, he practices various sports, including street workout. Connect with him on LinkedIn*.