
Enabling and Optimizing Intel Advanced Matrix Extensions (Intel® AMX) on Intel® Xeon® Processors

Content Type: Product Information & Documentation   |   Article ID: 000101065   |   Last Reviewed: 04/22/2025

Environment

4th or 5th Gen Intel® Xeon® Scalable or Intel® Xeon® 6 processors

Intel® Advanced Matrix Extensions (Intel® AMX) is a built-in accelerator that significantly enhances AI capabilities on Intel® Xeon® processors. Available on 4th and 5th Gen Intel® Xeon® Scalable processors and Intel® Xeon® 6 processors, Intel AMX is designed to improve the performance of deep learning training and inference on the CPU. It is ideal for workloads such as natural language processing (NLP), recommendation systems, image recognition, and other AI applications.

Overview of Intel AMX:

  • Purpose: Intel® AMX improves deep learning training and inference performance on CPUs, ideal for AI workloads.
  • Functionality: Introduces new instructions for matrix multiplication operations, crucial for AI and ML tasks, using dedicated tiles and matrix multiply instructions (TMUL).
  • Performance Gains: Delivers up to 10x better performance compared to previous generations, with significant improvements in AI workload performance per watt.
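The tile-and-TMUL model described above can be illustrated with a plain-Python sketch. This is conceptual only: real AMX tile registers hold up to 16 rows of 64 bytes each, and TMUL performs the multiply-accumulate in hardware; the function name `tmul_tile` is illustrative, not an actual instruction mnemonic.

```python
# Conceptual sketch of what AMX's TMUL instructions compute on tile
# registers: C += A @ B on small sub-blocks ("tiles") of a matrix.
# Real AMX tiles hold up to 16 rows x 64 bytes; this pure-Python
# version only illustrates the accumulate-into-a-tile idea.

def tmul_tile(C, A, B):
    """Accumulate the product of tiles A (MxK) and B (KxN) into C (MxN)."""
    M, K, N = len(A), len(B), len(B[0])
    for i in range(M):
        for k in range(K):
            a = A[i][k]
            for j in range(N):
                C[i][j] += a * B[k][j]
    return C

# Tiny 2x2 example:
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 0], [0, 0]]
tmul_tile(C, A, B)
print(C)  # [[19, 22], [43, 50]]
```

In hardware, AMX performs this inner multiply-accumulate across an entire tile pair in far fewer cycles than scalar or vector code, which is where the deep learning speedups come from.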

Enabling Intel® AMX on Intel® Xeon Processors:

System Requirements: Ensure your system includes 4th or 5th Gen Intel® Xeon® Scalable or Intel® Xeon® 6 processors.

BIOS Settings:

  • Enter the BIOS setup menu.
  • Locate the settings related to Intel® AMX or processor features.
  • Enable Intel® AMX; the option may be labeled Intel Advanced Matrix Extensions.
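After enabling Intel® AMX in the BIOS, you can confirm that the operating system actually exposes it. On Linux, the kernel reports the flags `amx_tile`, `amx_bf16`, and `amx_int8` in `/proc/cpuinfo` when both the CPU and BIOS enable AMX. A minimal sketch:

```python
# Check whether the Linux kernel reports Intel AMX support by looking
# for the amx_tile, amx_bf16, and amx_int8 flags in /proc/cpuinfo.

def amx_flags(cpuinfo_text):
    """Return the set of AMX-related flags present in cpuinfo text."""
    wanted = {"amx_tile", "amx_bf16", "amx_int8"}
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return wanted & set(line.split(":", 1)[1].split())
    return set()

# Usage on a live Linux system:
# with open("/proc/cpuinfo") as f:
#     print(amx_flags(f.read()) or "AMX not reported")

sample = "flags\t: fpu sse2 avx512f amx_bf16 amx_tile amx_int8"
print(sorted(amx_flags(sample)))  # ['amx_bf16', 'amx_int8', 'amx_tile']
```

If the flags are missing on supported hardware, re-check the BIOS setting and the kernel version (see the next section).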

Operating System and Driver Support:

  • Update your operating system to a version that supports Intel® AMX (on Linux, kernel 5.16 or later).
  • Install any Intel® AMX-specific drivers or libraries from Intel's website or your system manufacturer.
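Linux support for AMX tile state landed in kernel 5.16, so a quick version check confirms the running kernel is new enough. A simplified sketch (version-string parsing here is intentionally minimal):

```python
import platform

# Intel AMX requires kernel support for XSAVE tile state, which landed
# in Linux kernel 5.16. This sketch compares the running kernel release
# against that minimum; parsing is simplified for illustration.

def kernel_at_least(release, minimum=(5, 16)):
    """True if a release string like '5.15.0-91-generic' is >= minimum."""
    parts = release.split("-")[0].split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= minimum

print(kernel_at_least(platform.release()))
# Examples:
# kernel_at_least("5.15.0-91-generic") -> False
# kernel_at_least("6.5.0-14-generic")  -> True
```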

Software Support:

  • Utilize Intel's oneAPI Deep Neural Network Library (oneDNN), optimized for Intel® AMX.
  • Integrate Intel® AMX-specific libraries and frameworks from Intel’s developer resources.
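oneDNN selects the best available instruction set at runtime, and its standard environment variables `ONEDNN_VERBOSE` and `ONEDNN_MAX_CPU_ISA` let you log which kernels execute and cap the ISA for A/B comparisons. A sketch of setting them from Python; they must be set before the framework (and thus oneDNN) is imported:

```python
import os

# oneDNN (the CPU backend used by PyTorch and TensorFlow) picks the
# highest supported ISA at runtime. These standard oneDNN environment
# variables log executed primitives and cap the allowed ISA.
# Set them BEFORE importing the framework so oneDNN reads them at load.
os.environ["ONEDNN_VERBOSE"] = "1"                    # log executed primitives
os.environ["ONDNN_ISA_PLACEHOLDER" if False else "ONEDNN_MAX_CPU_ISA"] = "AVX512_CORE_AMX"

# With verbose logging enabled, AMX-backed kernels show up in stderr
# lines containing "avx512_core_amx" once a matmul or convolution runs.
print(os.environ["ONEDNN_MAX_CPU_ISA"])  # AVX512_CORE_AMX
```

Setting `ONEDNN_MAX_CPU_ISA` to a lower value (for example `AVX512_CORE_BF16`) and re-running the same workload is a simple way to measure how much of a speedup AMX contributes.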

Development Tools:

  • Access Intel's AI tuning guide for 4th or 5th Gen Intel® Xeon® Scalable or Intel® Xeon® 6 processors.
  • Use the Intel® AMX quick-start guide for information and links to Intel-optimized AI libraries and frameworks.

Example Use Cases:

  • NLP: Up to 9.9x higher real-time NLP inference performance using BF16 versus 3rd Gen Intel® Xeon® Scalable processors using FP32 for BERT-Large workloads on PyTorch.
  • Image Classification: Up to 4.5x higher training performance using BF16 versus 3rd Gen Intel® Xeon® Scalable processors using FP32 for ResNet50v1.5 workloads.
  • Recommendation Systems: Up to 8.7x higher batch recommendation system inference performance using BF16 versus 3rd Gen Intel® Xeon® Scalable processors using FP32 for DLRM workloads on PyTorch.
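The BF16 speedups above rely on bfloat16 keeping FP32's sign bit and full 8-bit exponent while truncating the mantissa to 7 bits, so it preserves FP32's dynamic range at roughly three decimal digits of precision. A stdlib-only sketch of the conversion (simple truncation; hardware typically rounds to nearest):

```python
import struct

# bfloat16 keeps FP32's sign and 8-bit exponent but only the top 7
# mantissa bits, preserving dynamic range at reduced precision. This
# sketch converts by zeroing the low 16 bits of the FP32 encoding
# (truncation; round-to-nearest is omitted for simplicity).

def fp32_to_bf16(x):
    """Return x truncated to the nearest-below bfloat16-representable value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(fp32_to_bf16(3.141592653589793))  # 3.140625: small precision loss
print(fp32_to_bf16(1e30))               # magnitude survives: range is preserved
```

Because the exponent width matches FP32, models can usually run in BF16 without the loss-scaling tricks FP16 training requires, which is why the workloads above compare BF16 on AMX directly against FP32 baselines.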

Additional Resources:

  • Intel® AMX Tuning Guide: Detailed instructions for system tuning to maximize Intel® AMX benefits.
  • Intel® AMX QuickStart Guide: Basic steps to start using Intel® AMX.
  • Intel Developer Zone: Access tools, libraries, and support for optimizing AI workloads using Intel® AMX.

By following these steps and utilizing the available resources, you can enable and optimize Intel® AMX on your Intel® Xeon® processors to harness its full potential for AI workloads.
