New GitHub repository greatly augments the Intel® 64 and IA-32 Architectures Optimization Reference Manual

ID 658315
Updated 6/11/2021
Version Latest



This blog is for hard-core programmers who want to understand, and explore, detailed opportunities for optimizing for a specific Intel CPU architecture and microarchitecture. All the code that we share to support learning these architectural details is written in assembly or with intrinsics, and is wrapped in and called from C/C++.

Optimization Guide

For many years, we have published and periodically updated our Intel® 64 and IA-32 Architectures Optimization Reference Manual to share tips and techniques for optimizing software for Intel CPUs. When you want to learn about the microarchitecture of Intel CPUs and get tips on the best methods to maximize performance on Intel Architecture (IA), the Optimization Manual is the document for you! In this manual, we share tips on how to use the Intel® DL Boost instructions, how to efficiently transpose a matrix, how best to compute a histogram, and much more.

This manual is relied upon heavily by compiler and library writers; most of us enjoy the uplift from this manual because our tools are written by experts who embody these techniques in their software, and we all benefit. Of course, some of us find this manual directly valuable when we optimize our applications for ultimate performance. Call us crazy if you must, but we do love it.


We start with introductory chapters that provide an overview of the microarchitecture of modern Intel processors and general optimization guidelines. The remaining chapters each focus on one specific aspect of the Intel Instruction Set Architecture (ISA), such as Intel® Advanced Vector Extensions (Intel® AVX) and Intel® Transactional Synchronization Extensions (Intel® TSX). These chapters are very useful, and we are confident you will find them approachable and quite readable. Each chapter is divided into several sections, with each section presenting a problem and then describing one or more ways the problem can best be solved.

What really makes the Optimization Manual special is that it provides code samples for all the solutions it proposes. These code samples make the optimization techniques described in the manual easier for developers to understand and adopt. The document contains hundreds of these code samples, many written in assembly (the repository extends these with connections into a C program) and a growing number written with intrinsics in C/C++.

Full Code Examples on GitHub

Because these code samples are so useful, we have recently invested in moving them to a GitHub repository. Each code sample in the repository is now wrapped in its own function, has its input and output constraints documented, and has its own dedicated unit test to verify its behavior. This represents a big upgrade in our samples: previously the guide included only snippets, which left it up to the reader to complete them enough to use, and which often needed subtle, non-obvious changes to meet the precise needs of various tool chains.
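To make the "own function, documented constraints, dedicated unit test" structure concrete, here is a minimal sketch of what such a wrapped sample could look like. It is not taken from the repository: the function name transpose4x4_f32, the test name, and the scalar fallback are all assumptions made for illustration. The only real APIs it relies on are the SSE intrinsics from <xmmintrin.h>.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

/*
 * transpose4x4_f32 (hypothetical name, for illustration only)
 *
 * Input:       src points to 16 floats, a 4x4 matrix in row-major order.
 * Output:      dst receives the transposed matrix, also row-major.
 * Constraints: src and dst must be non-NULL and must not overlap.
 *              No particular alignment is required.
 */
void transpose4x4_f32(const float *src, float *dst)
{
    assert(src != NULL && dst != NULL);
#ifdef HAVE_SSE
    /* Load the four rows with unaligned loads. */
    __m128 r0 = _mm_loadu_ps(src + 0);
    __m128 r1 = _mm_loadu_ps(src + 4);
    __m128 r2 = _mm_loadu_ps(src + 8);
    __m128 r3 = _mm_loadu_ps(src + 12);
    /* Standard SSE macro: transposes the four rows in place. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    _mm_storeu_ps(dst + 0, r0);
    _mm_storeu_ps(dst + 4, r1);
    _mm_storeu_ps(dst + 8, r2);
    _mm_storeu_ps(dst + 12, r3);
#else
    /* Portable fallback so the sketch also builds without SSE. */
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            dst[c * 4 + r] = src[r * 4 + c];
#endif
}

/* Dedicated unit test: returns 1 on success, 0 on failure. */
int test_transpose4x4_f32(void)
{
    float in[16], out[16];
    for (int i = 0; i < 16; ++i)
        in[i] = (float)i;
    transpose4x4_f32(in, out);
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            if (out[c * 4 + r] != in[r * 4 + c])
                return 0;
    return 1;
}
```

Keeping the kernel behind a plain C function signature means the same test can exercise the intrinsic path and the fallback path, which is one plausible way to keep a sample buildable across several compilers.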

In our new GitHub repository, all the code samples compile with multiple compilers and on multiple operating systems. They are all released under the 0-Clause BSD open-source license, which makes it clear that they are openly available for you to use, and that they come without any warranty or assumption of liability.

More to Come - Feedback Welcome

This is a work in progress, and we have a growing list of code that has been updated and made available on our new GitHub repository. The majority of the code samples from Chapters 8 (Intel® DL Boost), 15 (Intel® AVX), and 18 (Intel® AVX-512) have been added already. We will add new examples as they are updated and verified.

We hope that the next time you're looking for an efficient way to deal with sparse vectors, or are wondering how best to compute reciprocal square roots on the latest Intel CPUs, you take a look at our open repository and check out the code samples. We welcome your feedback and suggestions!
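As a taste of the reciprocal square root question, the sketch below shows one widely used approach: take the fast hardware estimate from the SSE intrinsic _mm_rsqrt_ps and refine it with a single Newton-Raphson step, y1 = y0 * (1.5 - 0.5 * x * y0 * y0). This is an illustrative sketch rather than code from the manual or the repository. The function name rsqrt4_f32 is hypothetical, and the non-SSE fallback (a bit-level initial guess followed by two refinement steps) is an assumption added only so the example builds everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

#ifndef HAVE_SSE
/* Fallback only: rough initial guess from the float's bit pattern. */
static float rsqrt_estimate(float x)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step for f(y) = 1/(y*y) - x. */
static float rsqrt_refine(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}
#endif

/*
 * rsqrt4_f32 (hypothetical name, for illustration only)
 *
 * Input:       x points to 4 positive, finite, normal floats.
 * Output:      y[i] receives an approximation of 1/sqrt(x[i]).
 * Constraints: x and y must be non-NULL; no alignment is required.
 */
void rsqrt4_f32(const float *x, float *y)
{
    assert(x != NULL && y != NULL);
#ifdef HAVE_SSE
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 v = _mm_loadu_ps(x);
    __m128 y0 = _mm_rsqrt_ps(v);              /* fast, low-precision estimate */
    __m128 hx = _mm_mul_ps(half, v);          /* 0.5 * x */
    __m128 t = _mm_mul_ps(hx, _mm_mul_ps(y0, y0));
    __m128 y1 = _mm_mul_ps(y0, _mm_sub_ps(three_halves, t));
    _mm_storeu_ps(y, y1);                     /* refined result */
#else
    for (int i = 0; i < 4; ++i) {
        float e = rsqrt_refine(x[i], rsqrt_estimate(x[i]));
        y[i] = rsqrt_refine(x[i], e);         /* second step: coarser guess */
    }
#endif
}
```

Whether the estimate-plus-refinement sequence beats a plain square root followed by a divide depends on the microarchitecture and on the accuracy you need, which is exactly the kind of trade-off the manual and the repository samples are meant to help you explore.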

The project was a collaboration of several Intel engineers including Nicole S. O'Donnell, Mark D. Ryan, Stanislav Shwartsman, Laxman Sole, Gideon Stupp, and Bob Valentine.