Description
This paper is the outcome of a research collaboration between Intel and the Technical University of Munich (TUM) evaluating the performance of small language models, such as Meta's Llama-2 7B, on Intel® Xeon® processors using Intel AMX AI acceleration technology. By applying post-training quantization methods specifically designed to leverage Intel AMX's computational patterns, the study demonstrates substantial improvements in both inference speed and energy efficiency compared to standard CPU execution.
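To make the core idea concrete, the sketch below shows symmetric per-tensor post-training INT8 quantization in plain NumPy. This is an illustration of the general technique only, not the paper's specific method: the function names and the per-tensor scaling scheme are assumptions for the example, and a real AMX deployment would route the resulting INT8 weights through tiled integer matrix units rather than dequantizing in Python.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor post-training quantization to INT8.

    Maps float weights onto the int8 range so that matrix multiplies
    can run on integer hardware such as Intel AMX's INT8 tile units.
    (Illustrative sketch; not the paper's exact quantization scheme.)
    """
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for accuracy checks."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight matrix and bound the rounding error.
w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
max_err = float(np.max(np.abs(w - dequantize(q, s))))
print(q.dtype, max_err <= s / 2 + 1e-6)
```

Because rounding to the nearest quantization step can be off by at most half a step, the reconstruction error stays bounded by `scale / 2`, which is why post-training quantization can preserve accuracy while shrinking weights to a quarter of their FP32 size.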