Prototype and Deploy LLM Applications on Intel® NPUs
Overview
Large model sizes and the limited hardware resources of client devices (for example, disk, RAM, or CPU) make it increasingly challenging to deploy large language models (LLMs) on laptops compared with cloud-based solutions. The AI PC from Intel addresses this challenge by combining a CPU, GPU, and NPU on a single device.
This session focuses on the NPU and shows how to prototype and deploy LLM applications locally. It covers:
- How the NPU architecture works, including its features, advantages, and capabilities for accelerating neural network computations on Intel® Core™ Ultra processors (the backbone of AI PCs from Intel).
- Practical aspects of deploying performant LLM apps on Intel NPUs, from initial setup to optimization and system partitioning, using the OpenVINO™ toolkit and its NPU plug-in (a minimal deployment sketch follows the examples paragraph below).
- What LLMs are, and the advantages and challenges of local inference.
- Fast LLM prototyping on Intel Core Ultra processors using the Intel® NPU Acceleration Library (see the prototyping sketch after this list).
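To give a concrete feel for the prototyping flow, here is a minimal sketch that loads a Hugging Face causal LLM and offloads it to the NPU with the Intel NPU Acceleration Library. The model ID, dtype, and generation settings are illustrative assumptions, not material from the session.

```python
# Minimal prototyping sketch (assumptions: the intel-npu-acceleration-library
# and transformers packages are installed; the model ID is a hypothetical
# small chat model chosen to fit laptop memory).
import torch
import intel_npu_acceleration_library
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Compile the model for the NPU; int8 weights keep the memory footprint small.
model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

prompt = "Explain what an NPU is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```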
Get real-world examples and case studies (such as chatbots and retrieval augmented generation [RAG]) that show how LLM applications integrate with NPUs and the performance and efficiency gains this combination can unlock.
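On the deployment side, a minimal sketch of running a chat-style model through the OpenVINO NPU plug-in might look like the following. It assumes the openvino-genai package and a model already exported to OpenVINO IR (for example, with optimum-cli); the directory name and generation parameters are placeholders.

```python
# Minimal deployment sketch (assumptions: openvino-genai is installed and the
# model has already been exported to OpenVINO IR, e.g. with
# `optimum-cli export openvino`; the local path below is a placeholder).
import openvino_genai as ov_genai

model_dir = "TinyLlama-1.1B-Chat-v1.0-ov-int4"

# "NPU" routes inference through the NPU plug-in; "CPU" or "GPU" are
# drop-in alternatives on the same AI PC.
pipe = ov_genai.LLMPipeline(model_dir, "NPU")

print(pipe.generate("What is retrieval augmented generation?", max_new_tokens=100))
```

Because the target device is just a string, the same pipeline can be benchmarked on CPU, GPU, and NPU when deciding how to partition an application across the devices of an AI PC.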
Skill level: All