Deploy the New LFM2-24B-A2B Model on Intel Hardware with OpenVINO™ Toolkit

Today, we are excited to announce preview support for LFM2-24B-A2B, the latest high-performance model from Liquid AI, across Intel® AI PCs, edge devices, and data center platforms.

LFM2-24B-A2B scales the LFM2 hybrid architecture to its largest Mixture of Experts (MoE) model yet. Although the model contains 24B total parameters, its sparse architecture activates only 2.3B parameters at a time. This sparsity keeps the model fast and hardware-aware, and well suited to the low-memory, high-throughput, and low-latency requirements of agentic applications. Liquid AI shares more details about the architecture, training approach, and performance benchmarks in its launch announcements: the technical overview blog and the post on its Intel partnership.
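To put the parameter counts in perspective, here is a rough back-of-envelope sketch. It uses only the 24B and 2.3B figures quoted above and the INT4 precision used for the benchmark in this post. The helper names are illustrative, and the estimate deliberately ignores quantization scales, runtime buffers, and activations, so treat the results as lower bounds rather than measured numbers.

```python
# Back-of-envelope numbers; parameter counts are taken from the text above.
TOTAL_PARAMS = 24e9      # total parameters
ACTIVE_PARAMS = 2.3e9    # parameters active at a time


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in one forward pass."""
    return active / total


def weight_footprint_gb(params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10**9 bytes).

    Ignores quantization scales/zero-points, runtime buffers, and
    activations, so the real footprint is somewhat higher.
    """
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    print(f"INT4 weights: ~{weight_footprint_gb(TOTAL_PARAMS, 4):.0f} GB")
    print(f"FP16 weights: ~{weight_footprint_gb(TOTAL_PARAMS, 16):.0f} GB")
```

Under these assumptions, all 24B weights still have to be stored, but fewer than one in ten of them is used for any given step, which is why compute cost tracks the smaller number.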

Where does this model shine? LFM2-24B-A2B delivers fast prefill and decode speeds at low memory cost. It excels at instruction following, agentic workflows, and complex multi-step mathematical reasoning. That makes it a strong engine for agentic tool use in on-device assistants, offline document summarization and Q&A, multi-turn support conversations with tool access (database lookups, ticket creation), and RAG pipelines, all without data leaving the network.

OpenVINO™ support matters for MoE models because only a subset of parameters (the “experts”) is active at any time. This sparsity is efficient but can be hard to optimize on general-purpose hardware. OpenVINO™ optimizes execution across CPU and GPU resources to raise throughput and reduce latency, and enables these models to run efficiently on AI PCs with Intel® Core™ Ultra Series 3 processors. Combined with Liquid AI’s MoE architecture, this lets developers deploy capable models while keeping memory usage low, even for long-context tasks.
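The idea that only some experts run for a given input can be illustrated with a generic top-k routing sketch. This is a toy example in plain Python, not Liquid AI's router or OpenVINO's implementation: the number of experts, the value of k, and the router scores are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route_top_k(router_logits, k):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are renormalized over the selected experts so they sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


if __name__ == "__main__":
    # Four toy "experts"; real experts are feed-forward sub-networks.
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.1, 2.0, -1.0, 1.5]   # hypothetical router scores for one token
    print(route_top_k(logits, k=2))  # experts 1 and 3 are selected
    print(moe_forward(10.0, experts, logits, k=2))
```

Because the set of selected experts changes from input to input, the work is irregular, which is one reason this kind of sparsity can be hard to optimize on general-purpose hardware.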

We benchmarked the inference performance of LFM2-24B-A2B on an AI PC powered by an Intel® Core™ Ultra X7 358H processor with built-in Intel® Arc™ graphics, using the OpenVINO™ toolkit for performance optimization. OpenVINO helps accelerate AI inference while enabling improved throughput and accuracy.

 

Figure 1. LFM2-24B-A2B throughput on the GPU of an Intel® Core™ Ultra X7 358H processor (OpenVINO 2026.1.0-21125, precision: INT4).

Demo

While LFM2-24B-A2B is designed for high-performance instruction-following tasks, this demo shows how it handles a simple question, highlighting the model’s efficiency and responsiveness when running on Intel® Core™ Ultra Series 3 processors with OpenVINO™ optimizations.

Summary 

With the optimized LFM2-24B-A2B model for Intel architecture, developers can build powerful on-device AI experiences that are not only fast and responsive but also personalized and private. 

To get started with the model, please visit the Hugging Face model card. To learn more about OpenVINO and to download the toolkit, please visit: openvino.ai.
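As a starting point, the following is a minimal sketch of running a single prompt through the OpenVINO GenAI `LLMPipeline` API. It assumes the model has already been exported to OpenVINO format in a local directory (the directory name below is hypothetical) and that the installed OpenVINO build includes the preview support described in this post, such as the development builds listed under System Configuration Information. The `generate_reply` helper is illustrative and not part of any library.

```python
from pathlib import Path


def generate_reply(model_dir: str, prompt: str, device: str = "GPU",
                   max_new_tokens: int = 128) -> str:
    """Run one prompt through an OpenVINO GenAI LLM pipeline."""
    # Fail early with a clear message if the exported model is missing.
    if not Path(model_dir).is_dir():
        raise FileNotFoundError(f"model directory not found: {model_dir}")

    # Imported lazily so this module loads even without OpenVINO installed.
    import openvino_genai as ov_genai

    pipe = ov_genai.LLMPipeline(model_dir, device)
    return str(pipe.generate(prompt, max_new_tokens=max_new_tokens))


if __name__ == "__main__":
    # Hypothetical local path to an OpenVINO-format export of the model.
    print(generate_reply("./LFM2-24B-A2B-int4-ov", "What is OpenVINO?"))
```

Switching `device` to `"CPU"` runs the same pipeline on the processor cores instead of the integrated GPU.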

System Configuration Information

Intel® Core™ Ultra X7 368H Processor Configuration:

  • Laptop Model: Intel CSRD (Reference Design)
  • CPU: Intel® Core™ Ultra 7 368H processor @ 2 GHz, Memory: 64 GB LPDDR5-8535 MT/s, Storage: 1 TB
  • OS: Ubuntu 25.10, Kernel Version: 6.18.1-061801-generic
  • Graphics: Intel Arc 140T GPU
  • Power Plan: Balanced, Power Mode: Best Performance
  • Package Power Limit (W): 65, Short Duration Power Limit (W): 70
  • Key Software Versions: openvino 2026.1.0-dev20260224 (lfm2-moe-support), openvino-genai 2026.1.0.0-dev20260224, openvino-tokenizers 2026.1.0.0-dev20260224

Benchmarked on February 24th, 2026.  
