Optimize Distributed Training and Inference for Intel® Data Centers
Overview
The complexity of deep learning models is surging, demanding enhanced training and inference in distributed compute environments. This session focuses on essential techniques for using Intel® Data Center GPUs and CPUs to balance distributed AI workloads, meeting data center challenges while improving efficiency and performance.
Within the session, explore the Intel® Extension for PyTorch*, which optimizes neural network operations on Intel hardware, and learn how Microsoft DeepSpeed* can be integrated to perform training operations at scale.
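As a preview, here is a minimal sketch of how Intel Extension for PyTorch is typically applied. It assumes the package is installed as intel_extension_for_pytorch; the toy model and hyperparameters are hypothetical placeholders, not the session's sample code:

```python
# Minimal sketch: applying Intel Extension for PyTorch (IPEX) optimizations.
# Assumes intel_extension_for_pytorch is installed; the toy model is a
# hypothetical placeholder.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# Inference path: IPEX fuses and reorders operations for Intel hardware.
model.eval()
infer_model = ipex.optimize(model, dtype=torch.bfloat16)

# Training path: the model and optimizer are optimized together.
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_model, train_optimizer = ipex.optimize(model, optimizer=optimizer)
```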
The topics covered include:
- Tackle model scalability in distributed environments, handling workloads efficiently across Intel Data Center GPUs and CPUs.
- Gain familiarity with essential tools from Intel that simplify operations, including PyTorch Distributed Data Parallel (DDP), the Intel® oneAPI Collective Communications Library (oneCCL), and the DeepSpeed library, which streamlines network training at scale (a minimal DDP-over-oneCCL sketch follows this list).
- Deploy practical solutions that maximize hardware efficiency, and refine strategies that sustain top performance for AI development.
- Walk through sample code and benchmarking milestones, using tools such as IPEX-LLM, that illustrate the performance achievable.
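The sketch below shows how these tools fit together: PyTorch DDP running its collectives over the oneCCL backend. It assumes oneccl_bindings_for_pytorch is installed and that ranks are launched with an MPI- or torchrun-style launcher that sets RANK and WORLD_SIZE; the placeholder model is an assumption for illustration:

```python
# Minimal sketch: PyTorch DDP over the oneCCL ("ccl") backend.
# Assumes oneccl_bindings_for_pytorch is installed and the launcher
# sets RANK and WORLD_SIZE for each process.
import os
import torch
import torch.distributed as dist
import oneccl_bindings_for_pytorch  # noqa: F401  (registers the "ccl" backend)
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="ccl")  # collectives run through oneCCL
model = torch.nn.Linear(128, 128)       # hypothetical placeholder model
ddp_model = DDP(model)                   # gradients all-reduced via oneCCL

# ... the standard forward/backward/step loop runs unchanged ...
dist.destroy_process_group()
```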
Skill level: All skill levels
Get the Software
- Intel® oneAPI Collective Communications Library
- Intel® Extension for PyTorch* from GitHub or AI Frameworks and Tools
- Intel® Extension for DeepSpeed*
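With the software above installed, getting a model under DeepSpeed typically comes down to a single initialize call. The sketch below is illustrative only; the config values and placeholder model are assumptions, not the session's reference code:

```python
# Minimal sketch: wrapping a model with DeepSpeed for training at scale.
# Assumes deepspeed (and, for Intel GPUs, intel_extension_for_deepspeed)
# is installed; the model and config values are hypothetical placeholders.
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {"stage": 2},  # shard optimizer state across ranks
    "bf16": {"enabled": True},          # bfloat16 mixed precision
}

model = torch.nn.Linear(1024, 1024)     # placeholder model

# deepspeed.initialize returns an engine that handles the distributed
# backward pass and optimizer stepping.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
# Training then proceeds via engine(...), engine.backward(loss), engine.step().
```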
Download Code Samples