Accelerator Engines for All
The January 2023 launch of 4th gen Intel® Xeon® Scalable and Intel® Max Series processors not only delivered the most built-in accelerators of any CPU in the world,1 but also delivered leadership performance for the most important computing challenges across AI, analytics, networking, security, storage, and high-performance computing (HPC).
The latest Intel® Accelerator Engines include:
- Intel® Advanced Matrix Extensions (Intel® AMX), which improves the performance of deep learning training and inference. It is ideal for workloads like natural language processing (NLP), recommendation systems, and image recognition.
- Intel® QuickAssist Technology (Intel® QAT), which helps free up processor cores by offloading encryption, decryption, and compression so systems can serve a larger number of clients or use less power. With Intel QAT, 4th gen Intel Xeon Scalable processors are the highest performance CPUs that can compress and encrypt in a single data flow.
- Intel® Data Streaming Accelerator (Intel® DSA), which drives high performance for storage, networking, and data-intensive workloads by improving streaming data movement and transformation operations. Designed to offload the most common data movement tasks that cause overhead in data center-scale deployments, Intel DSA helps speed up data movement across the CPU, memory, caches, all attached memory, storage, and network devices.
- Intel® In-Memory Analytics Accelerator (Intel® IAA), which helps run database and analytics workloads faster, with potentially greater power efficiency. This built-in accelerator increases query throughput and decreases the memory footprint for in-memory database and big data analytics workloads. Intel IAA is ideal for in-memory databases, open source databases, and data stores like RocksDB and ClickHouse*.
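Software can check which of these engines a given processor exposes. On Linux, for example, the kernel reports Intel AMX support via the CPU feature flags `amx_tile`, `amx_bf16`, and `amx_int8`. A minimal sketch (the sample flags line below is illustrative, not read from a live system):

```python
# Sketch: detect Intel AMX support by parsing a CPU feature-flags line.
# On Linux this line comes from /proc/cpuinfo; a sample string is used
# here so the snippet is self-contained.

AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}

def amx_features(flags_line: str) -> set:
    """Return the Intel AMX feature flags present in a cpuinfo flags line."""
    return AMX_FLAGS & set(flags_line.split())

# Illustrative flags line from a 4th gen Intel Xeon Scalable processor.
sample = "fpu vme avx512f amx_bf16 avx512_vnni amx_tile amx_int8"
print(sorted(amx_features(sample)))  # ['amx_bf16', 'amx_int8', 'amx_tile']
```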
Developers can realize the full potential and value of these new built-in accelerator features and instruction sets with Intel oneAPI and AI tools—a complete set of advanced compilers, libraries, analysis and debug tools, and optimized frameworks that simplify the development and deployment of accelerated solutions.
Let’s look at how, breaking it down by workload type.
Accelerate machine learning and data science pipelines using the Intel® oneAPI Base Toolkit and Intel® AI Analytics Toolkit. Drive order-of-magnitude optimizations in industry-leading deep learning frameworks, including TensorFlow* and PyTorch*.
Deploy high-performance deep learning inference using the Intel® Distribution of OpenVINO™ toolkit powered by oneAPI for inference acceleration.
Developers can take full advantage of cutting-edge features of 4th gen Intel Xeon Scalable processors, such as:
Intel AMX introduces new extensions to the x86 Instruction Set Architecture (ISA) for operating on matrices, accelerating the matrix multiplication at the heart of AI workloads. It consists of two components:
- A set of two-dimensional registers (tiles), which can hold submatrices from larger matrices in memory.
- An accelerator called Tile Matrix Multiply (TMUL), which contains instructions that operate on tiles.
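The tile/TMUL split can be pictured with an ordinary blocked matrix multiplication: submatrices (tiles) are loaded, multiplied, and accumulated. A pure-NumPy sketch of the idea (the 16x16 tile size is illustrative, not the actual AMX tile geometry for a given data type):

```python
import numpy as np

def tiled_matmul(a, b, tile=16):
    """Blocked matrix multiply: accumulate products of (tile x tile) blocks,
    mimicking how AMX loads submatrices into tile registers and multiplies
    them with TMUL instructions."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # One "TMUL" step: multiply two tiles, accumulate into C's tile.
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return c

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 48))
b = rng.standard_normal((48, 32))
assert np.allclose(tiled_matmul(a, b), a @ b)
```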
Support for int8 and bfloat16 data types provides significant performance gains for AI machine learning workloads.
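Why bfloat16 helps is easy to see at the bit level: it keeps float32's full 8-bit exponent (so the same dynamic range) but only 7 mantissa bits, halving storage and bandwidth. A sketch using truncation (hardware typically rounds to nearest even rather than toward zero as done here):

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to bfloat16: keep the sign, the full
    8-bit exponent, and the top 7 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def from_bfloat16_bits(b: int) -> float:
    """Re-expand bfloat16 bits to float32 by zero-filling the low mantissa bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# bfloat16 keeps float32's dynamic range but only ~3 decimal digits of precision.
y = from_bfloat16_bits(to_bfloat16_bits(3.14159265))  # y == 3.140625
```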
Activate Intel AMX and support for int8 and bfloat16 data types using Intel oneAPI performance libraries:
- Intel® oneAPI Deep Neural Network Library (oneDNN) is a highly flexible and scalable deep learning library that provides high performance on a variety of hardware platforms.
- Intel® oneAPI Data Analytics Library (oneDAL) helps speed up big data analysis in batch, online, and distributed processing modes of computation.
- Intel® oneAPI Collective Communications Library (oneCCL) is a library for collective communication primitives, such as allreduce and broadcast, that are widely used in deep learning and other high-performance computing domains.
- Intel® oneAPI Threading Building Blocks (oneTBB) is a widely used C++ library for parallel programming that provides a higher-level interface for parallel algorithms and data structures.
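As a concrete picture of what a collective like allreduce does, each rank contributes a buffer and every rank ends up with the elementwise sum of all of them. A toy single-process sketch (oneCCL performs this efficiently across processes, devices, and nodes; here all "ranks" live in one list):

```python
def allreduce_sum(rank_buffers):
    """Toy allreduce: every rank receives the elementwise sum of all
    ranks' buffers, as in data-parallel gradient aggregation."""
    total = [sum(vals) for vals in zip(*rank_buffers)]
    return [list(total) for _ in rank_buffers]

# Gradients from three "ranks" in data-parallel training.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(allreduce_sum(grads))  # [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
```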
Deliver fast HPC applications that scale with techniques in vectorization, multithreading, multi-node parallelization, and memory optimization using Intel oneAPI Base and Intel® oneAPI HPC Toolkit.
Drive productivity with advanced compilers, libraries, and analysis tools: the Intel® Fortran Compiler, Intel® oneAPI Math Kernel Library (oneMKL), and Intel® VTune™ Profiler. These are available as stand-alone products or as part of the Intel® oneAPI Base & HPC Toolkit.
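The vectorization these toolkits target can be illustrated even at the Python level: expressing a kernel as a single array operation lets NumPy dispatch it to vectorized native loops instead of an interpreted scalar loop. A conceptual sketch of the same idea the HPC toolkit applies via compiler auto-vectorization and OpenMP:

```python
import numpy as np

def saxpy_loop(a, x, y):
    """Scalar loop: one multiply-add per iteration."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

def saxpy_vectorized(a, x, y):
    """Vectorized form: a single expression the backend can run with SIMD."""
    return a * x + y

x = np.arange(4, dtype=np.float64)
y = np.ones(4)
# saxpy_vectorized(2.0, x, y) -> array([1., 3., 5., 7.])
assert np.allclose(saxpy_vectorized(2.0, x, y), saxpy_loop(2.0, x, y))
```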
When building converged AI applications for HPC, 4th gen Intel Xeon Scalable processors, together with Intel oneAPI and AI tools, enable developers to deliver high-performance applications. These tools, open-standards-based software stacks, and AI frameworks are compatible with existing languages and programming models including C, C++, Python*, SYCL*, OpenMP*, Fortran, and MPI.
Accelerate end-to-end data science and analytics pipelines with the Intel oneAPI Base Toolkit and Intel AI Analytics Toolkit; the latter also helps you achieve drop-in acceleration with optimized versions of compute-intensive Python packages such as Modin*, scikit-learn*, and XGBoost.
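"Drop-in" here means the accelerated package mirrors the stock API, so the import is usually the only line that changes. A sketch of the pattern with Modin, falling back to stock pandas when Modin is not installed (the sample data is illustrative):

```python
# Drop-in acceleration pattern: Modin mirrors the pandas API, so only
# the import changes; fall back to stock pandas if Modin is absent.
try:
    import modin.pandas as pd  # accelerated, pandas-compatible API
except ImportError:
    import pandas as pd

df = pd.DataFrame({"clicks": [3, 1, 4, 1, 5], "user": list("aabba")})
per_user = df.groupby("user")["clicks"].sum()
print(per_user.to_dict())  # {'a': 9, 'b': 5}
```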
Simplify the task of writing performant code for data science workloads using Intel compilers, libraries, analysis tools, and application frameworks together.
Additionally, developers can leverage Intel® Virtualization Technology (Intel® VT), supported by Intel Xeon processors, which provides hardware assistance to virtualization software, reducing its size, cost, and complexity. Intel VT also helps reduce the virtualization overheads occurring in cache, I/O, and memory.
Database Management Systems (DBMS)
Intel IAA and Intel DSA offer significant performance benefits to developers, in addition to lowering total cost of ownership (TCO) in data centers.
- Intel® Query Processing Library (Intel® QPL) is targeted for developers who want to activate the capabilities of Intel IAA.
- Intel® Data Mover Library (Intel® DML) is targeted for developers who want to activate the power of Intel DSA.
Intel QPL's high-performance compression and decompression capabilities are designed to help run database and analytics workloads faster. In addition, Intel QPL enables the Intel IAA accelerator to reduce the cost of computing, save memory bandwidth, and achieve higher query throughput for applications in big data and in-memory analytic databases, memory page compression, data integrity operations, and more.
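The compress/decompress flow in question is DEFLATE-style. A pure-software illustration with Python's stdlib `zlib` of the round trip that Intel QPL can route to Intel IAA hardware instead of burning CPU cores (the sample "page" is illustrative):

```python
import zlib

# Software illustration of the DEFLATE-style compress/decompress path
# that Intel QPL can offload to the Intel IAA accelerator.
page = b"in-memory column data " * 200  # repetitive data compresses well

compressed = zlib.compress(page, level=6)
restored = zlib.decompress(compressed)

assert restored == page
print(f"{len(page)} -> {len(compressed)} bytes")
```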
Intel DML, which enables Intel DSA, helps reduce latency and increase memory-transfer performance by optimizing streaming data movement and transformation operations commonly used in storage, networking, and various data processing applications. It also reduces overhead by offloading the most common data movement tasks; for these, developers can take advantage of the library’s system solution capabilities to protect the communication path between a host and storage device, ensuring end-to-end integrity.
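The move-plus-integrity pattern looks like this in plain Python: copy a buffer and carry a CRC alongside it so the receiver can verify the transfer. This is only a conceptual sketch of the memory-move and CRC operations Intel DML can offload to Intel DSA, done here entirely in software:

```python
import zlib

def move_with_integrity(src: bytes, dst: bytearray) -> int:
    """Copy a buffer and return its CRC-32 so the receiver can verify
    the move end to end (a software stand-in for the data-move and
    CRC-generation operations Intel DSA performs in hardware)."""
    dst[:len(src)] = src
    return zlib.crc32(src)

src = b"block of streaming data"
dst = bytearray(len(src))
crc = move_with_integrity(src, dst)

# Receiver-side check: recompute the CRC over the destination buffer.
assert zlib.crc32(bytes(dst)) == crc
```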
Storage & Cloud Computing
Optimize these workloads with high-performance libraries and tools such as oneDAL for data analytics and machine learning, and oneDNN for deep neural network acceleration and cloud service integration.
Using these tools and libraries, developers can harness the full power of 4th gen Intel Xeon processors for their cloud-based services workloads, resulting in improved performance and efficiency. On top of that, they can tap into the hardware-assist capability of Intel VT, reducing the size, cost, and complexity of the virtualization software, as well as reducing the virtualization overheads occurring in cache, I/O, and memory.
Accelerate compute and deliver high-fidelity visualization applications for scientific and medical research, cosmology, motion picture production, and more using advanced libraries in the Intel® oneAPI Rendering Toolkit.
Get cost-efficient interactive rendering. Access all system memory space for even the largest datasets. Enable deep learning-based denoising. And efficiently deploy across parallel processing architectures and platforms.
For edge computing, unleash AI, analytics, and visual inference with the Intel AI Analytics Toolkit and Intel Distribution of OpenVINO toolkit, powered by oneAPI. For IoT edge innovations, the Intel® oneAPI IoT Toolkit is tailored for developers focused on accelerating development of smart, connected devices.
Capitalize on the full complement of Intel-optimized frameworks, libraries, and tools available in the Intel AI Analytics Toolkit and Intel Distribution of OpenVINO toolkit to enable the full power of 4th gen Intel Xeon Scalable processor features—including Intel AMX, int8, and bfloat16—and speed up deep learning inference and training workloads.
If your focus is on system design, development, and deployment across CPU, GPU, FPGA, and other accelerator architectures—from client and edge to cloud—use the build and analysis tools and libraries found in the Intel® oneAPI IoT Toolkit, which are enhanced for exactly those purposes.
The 4th gen Intel Xeon Scalable processors provide powerful computing capabilities, and Intel oneAPI and AI tools make software development for these processors easier and more efficient. With the ability to take advantage of advanced hardware features, compatibility with a range of programming languages and frameworks, and a comprehensive set of libraries and tools, oneAPI is a valuable tool for businesses and developers looking to optimize their applications for Intel Xeon processors. To learn more, visit Software for 4th Gen Intel Xeon and Intel Max Series Processors.
1 Intel Accelerator Engines Fact Sheet, January 10, 2023
Intel® oneAPI Base Toolkit
Get started with this core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.