Description
This paper is the outcome of a research collaboration between Intel and the Technical University of Munich (TUM) evaluating the performance of small language models, such as Meta's Llama-2 7B, on Intel® Xeon® processors using Intel AMX AI acceleration technology. By applying post-training quantization methods specifically designed to leverage Intel AMX's computational patterns, the study demonstrates substantial improvements in both inference speed and energy efficiency compared to standard CPU execution.
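To make the core idea concrete, the sketch below shows symmetric per-tensor post-training INT8 quantization in plain NumPy. This is an illustration of the general technique only, not the paper's specific method: the function names and the per-tensor scaling scheme are assumptions for the example, and a real AMX deployment would route the resulting INT8 weights through tiled integer matrix units rather than dequantizing in Python.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor post-training quantization to INT8.

    Maps float weights onto the int8 range so that matrix multiplies
    can run on integer hardware such as Intel AMX's INT8 tile units.
    (Illustrative sketch; not the paper's exact quantization scheme.)
    """
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for accuracy checks."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight matrix and bound the rounding error.
w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
max_err = float(np.max(np.abs(w - dequantize(q, s))))
print(q.dtype, max_err <= s / 2 + 1e-6)
```

Because rounding to the nearest quantization step can be off by at most half a step, the reconstruction error stays bounded by `scale / 2`, which is why post-training quantization can preserve accuracy while shrinking weights to a quarter of their FP32 size.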