For developers focused on deep learning use cases—predictive modeling, recommendation systems, natural language processing, object detection, and more—it is essential to extract maximum workload performance using newer technologies such as bfloat16, graph-level optimizations, and custom kernels.
This session covers the performance and ease-of-use benefits that Intel® Extension for PyTorch* and Intel® oneAPI Deep Neural Network Library (oneDNN) bring to training and inference of large deep learning models such as the deep learning recommendation model (DLRM).
Join senior deep learning engineer Eikan Wang to learn more about the following topics:
- Using oneDNN to deliver optimal training and inference workload performance for the PyTorch* framework on Intel hardware
- Applying oneDNN-based graph optimizations and custom kernel implementations to boost the performance of DLRM modules in PyTorch
- How the extension library for PyTorch can be dynamically loaded as a Python module, offering a more modular design for the custom compound operations that are critical to accelerating key deep learning modules, such as DLRM's interaction module
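The dynamic-loading pattern in the last bullet can be sketched in plain Python: because the extension ships as an ordinary module, a script can probe for it at runtime with `importlib` and fall back to stock PyTorch when it is absent. The `ipex.optimize(model)` call shown in the comment follows the extension's documented entry point, but treat the whole snippet as a hedged sketch rather than the library's exact API surface.

```python
import importlib.util


def load_ipex():
    """Return the Intel Extension for PyTorch module if installed, else None.

    The extension is packaged as a regular Python module, so it can be
    discovered and imported at runtime rather than being compiled into
    PyTorch itself.
    """
    if importlib.util.find_spec("intel_extension_for_pytorch") is None:
        return None  # extension not installed: stock PyTorch path
    import intel_extension_for_pytorch as ipex  # loaded only when present
    return ipex


ipex = load_ipex()
if ipex is not None:
    # Sketch of typical usage (details may vary by version):
    # model = ipex.optimize(model)  # apply graph/kernel optimizations
    pass
else:
    print("intel_extension_for_pytorch not installed; using stock PyTorch")
```

This keeps the optimization step optional: the same training or inference script runs unchanged on machines without the extension, and picks up the oneDNN-backed optimizations wherever it is available.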