Prototype and Deploy LLM Applications on Intel® NPUs
Overview
Large model sizes and the limited hardware resources of client devices (for example, disk, RAM, or CPU) make it increasingly challenging to deploy large language models (LLMs) on laptops compared with cloud-based solutions. The AI PC from Intel addresses this challenge by combining a CPU, GPU, and NPU on a single device.
This session focuses on the NPU and shows how to prototype and deploy LLM applications locally. It covers:
- How the NPU architecture works, including its features, advantages, and capabilities for accelerating neural network computations on Intel® Core™ Ultra processors (the backbone of AI PCs from Intel).
- Practical aspects of deploying performant LLM apps on Intel NPUs, from initial setup to optimization and system partitioning, using the OpenVINO™ toolkit and its NPU plug-in (a minimal deployment sketch follows the examples paragraph below).
- What LLMs are, and the advantages and challenges of local inference.
- Fast LLM prototyping on Intel Core Ultra processors using the Intel® NPU Acceleration Library (see the prototyping sketch after this list).
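To give a concrete feel for the prototyping flow, here is a minimal sketch that loads a Hugging Face causal LLM and offloads it to the NPU with the Intel NPU Acceleration Library. The model ID, dtype, and generation settings are illustrative assumptions, not material from the session.

```python
# Minimal prototyping sketch (assumptions: the intel-npu-acceleration-library
# and transformers packages are installed; the model ID is a hypothetical
# small chat model chosen to fit laptop memory).
import torch
import intel_npu_acceleration_library
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Compile the model for the NPU; int8 weights keep the memory footprint small.
model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

prompt = "Explain what an NPU is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```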
Get real-world examples and case studies (such as chatbots and retrieval augmented generation [RAG]) that show how LLM applications integrate with NPUs and the performance and efficiency gains this combination can unlock.
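On the deployment side, a minimal sketch of running a chat-style model through the OpenVINO NPU plug-in might look like the following. It assumes the openvino-genai package and a model already exported to OpenVINO IR (for example, with optimum-cli); the directory name and generation parameters are placeholders.

```python
# Minimal deployment sketch (assumptions: openvino-genai is installed and the
# model has already been exported to OpenVINO IR, e.g. with
# `optimum-cli export openvino`; the local path below is a placeholder).
import openvino_genai as ov_genai

model_dir = "TinyLlama-1.1B-Chat-v1.0-ov-int4"

# "NPU" routes inference through the NPU plug-in; "CPU" or "GPU" are
# drop-in alternatives on the same AI PC.
pipe = ov_genai.LLMPipeline(model_dir, "NPU")

print(pipe.generate("What is retrieval augmented generation?", max_new_tokens=100))
```

Because the target device is just a string, the same pipeline can be benchmarked on CPU, GPU, and NPU when deciding how to partition an application across the devices of an AI PC.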
Skill level: All