Understanding CPU Dispatching in the Intel® IPP Libraries

Introduction

Intel® Integrated Performance Primitives (Intel® IPP) is a cross-architecture software library that provides a broad range of library functions for image processing, signal processing, data compression, cryptography, and computer vision, as well as math support routines for such processing capabilities. Intel IPP is optimized for the wide range of Intel microprocessors.

One of the key advantages of Intel IPP is performance. This advantage comes from functions that are optimized for each processor architecture and compiled into a single library. Intel IPP functions are “dispatched” at run-time: the “dispatcher” chooses which of these processor-specific optimized libraries to use when the application makes a call into the Intel IPP library. This is done to maximize each function’s use of the underlying vector instructions and other architecture-specific features.

This article covers CPU dispatching in the Intel IPP library in more detail. After reading it, you will understand how CPU dispatching works and which libraries are needed for which processor architecture. Further documentation on Intel IPP can be found at Intel® Integrated Performance Primitives – Documentation.

Dispatcher

Dispatching refers to the process of detecting CPU features at run-time and then selecting the Intel IPP optimized library set that corresponds to your CPU. For example, in the <ipp directory>\ia32\ipp directory, the ippip8.dll library file contains the 32-bit optimized image processing libraries for processors with Intel® SSE4.2. In that file name, ‘ippi’ refers to the image processing library and ‘p8’ refers to the 32-bit SSE4.2 architecture.
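The naming pattern described above can be sketched in a few lines of C. The helper ipp_dll_name() below is hypothetical and not part of Intel IPP; it only assumes that a processor-specific DLL name is the domain prefix followed by the architecture code, as in ippip8.dll.

```c
#include <stdio.h>

/* Hypothetical helper (not an Intel IPP function): builds the name of a
 * processor-specific DLL from a domain prefix such as "ippi" and an
 * architecture code such as "p8", following the pattern of "ippip8.dll".
 * Returns the number of characters written, or -1 if buf is too small. */
int ipp_dll_name(char *buf, size_t size,
                 const char *domain, const char *cpu_code)
{
    int n = snprintf(buf, size, "%s%s.dll", domain, cpu_code);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Calling ipp_dll_name(buf, sizeof buf, "ippi", "p8") with a sufficiently large buffer fills buf with "ippip8.dll". Whether every domain and architecture code combination exists as a file depends on the installed product version.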

Note: You can build custom processor-specific libraries that do not require the dispatcher, but that is outside the scope of this article. Please read the IPP linkage models article for information on how to build custom versions of the Intel IPP library.

In the general case, the “dispatcher” identifies the run-time processor only once, at library initialization time. It sets an internal table or variable that directs your calls to the internal functions that match your architecture. For example, ippsCopy_8u() may have multiple implementations stored in the library, each optimized for a specific Intel® processor architecture. Thus, the dispatcher calls the p8_ippsCopy_8u() version of ippsCopy_8u() when running on an IA-32 processor with Intel® SSE4.2, because that version is optimized for this processor architecture.
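The idea of selecting an implementation once and routing every later call through it can be illustrated with a function pointer. This is a minimal sketch, not the Intel IPP implementation: the names dispatch_init(), dispatch_active(), copy_8u(), px_copy_8u() and p8_copy_8u() are hypothetical, and the SSE4.2 check is passed in as a plain flag instead of being read from the CPU.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned char u8;
typedef int (*copy_fn)(const u8 *src, u8 *dst, size_t len);

/* Stand-in for a generic ("px") implementation: plain byte loop. */
int px_copy_8u(const u8 *src, u8 *dst, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        dst[i] = src[i];
    return 0;
}

/* Stand-in for an SSE4.2 ("p8") implementation. A real one would use
 * vector instructions; memcpy is used here only as a placeholder. */
int p8_copy_8u(const u8 *src, u8 *dst, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Internal variable that directs calls; generic until initialized. */
static copy_fn g_copy_8u = px_copy_8u;

/* Runs once at initialization: pick the implementation for this CPU. */
void dispatch_init(int has_sse42)
{
    g_copy_8u = has_sse42 ? p8_copy_8u : px_copy_8u;
}

/* Reports which architecture code is currently selected. */
const char *dispatch_active(void)
{
    return g_copy_8u == p8_copy_8u ? "p8" : "px";
}

/* Public entry point: every call goes through the selected pointer. */
int copy_8u(const u8 *src, u8 *dst, size_t len)
{
    return g_copy_8u(src, dst, len);
}
```

After dispatch_init(1), calls to copy_8u() reach p8_copy_8u() with one indirect call and no further processor checks, which is why identifying the processor only once keeps the per-call overhead small.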

Note: Intel IPP architectures generally correspond to SIMD (MMX, SSE, AES, etc.) instruction sets.

Initializing the Intel® IPP Dispatcher

Identifying the specific processor and initializing the dispatcher should be done before making any calls into the Intel IPP library. If you are using a dynamic link library, this is handled automatically when the dynamic link library is initialized. However, if you are using a static library, you must perform this step manually. See the article on the ipp*Init*() functions for more information on how to do this.

The following table lists all the architecture codes defined by the Intel IPP library through version 8.2 of the product. Note that some of these Intel IPP architectures have been deprecated and are no longer supported in the current version of the product. Deprecated architectures are identified in the “Notes” column of the table.

IA-32 Intel® architecture	Intel® 64 architecture	Meaning
px	mx	Generic code optimized for processors with Intel® Streaming SIMD Extensions (Intel® SSE)
w7	-	Optimized for processors with Intel SSE2
s8	n8	Optimized for processors with Supplemental Streaming SIMD Extensions 3 (SSSE3)
-	m7	Optimized for processors with Intel SSE3
p8	y8	Optimized for processors with Intel SSE4.2
g9	e9	Optimized for processors with Intel® Advanced Vector Extensions (Intel® AVX) and Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI)
h9	l9	Optimized for processors with Intel® Advanced Vector Extensions 2 (Intel® AVX2)
-	k0	Optimized for processors with Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
-	n0	Optimized for processors with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)

Table 1: CPU Identification Codes Associated with Processor-Specific Libraries
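As an illustration of how Table 1 could be read, the following sketch maps a set of feature flags to the Intel® 64 architecture codes listed above. The FEAT_* bits and the function intel64_arch_code() are hypothetical, and the "newest instruction set wins" priority order is an assumption; the actual selection logic inside the Intel IPP dispatcher is not described in this article. The n0 code for Intel MIC Architecture is left out for simplicity.

```c
#include <stddef.h>

/* Hypothetical feature bits; a real program would derive them from CPUID. */
enum {
    FEAT_SSE3   = 1 << 0,
    FEAT_SSSE3  = 1 << 1,
    FEAT_SSE42  = 1 << 2,
    FEAT_AVX    = 1 << 3,
    FEAT_AVX2   = 1 << 4,
    FEAT_AVX512 = 1 << 5
};

/* Returns the Intel 64 architecture code from Table 1 that matches the
 * most advanced feature present in 'features', or "mx" (generic code)
 * if none of the listed features is set. Only the highest set feature
 * is examined; lower ones are not cross-checked. */
const char *intel64_arch_code(unsigned features)
{
    static const struct { unsigned feature; const char *code; } table[] = {
        { FEAT_AVX512, "k0" },
        { FEAT_AVX2,   "l9" },
        { FEAT_AVX,    "e9" },
        { FEAT_SSE42,  "y8" },
        { FEAT_SSSE3,  "n8" },
        { FEAT_SSE3,   "m7" }
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (features & table[i].feature)
            return table[i].code;
    }
    return "mx";
}
```

For a processor reporting SSE3, SSSE3 and SSE4.2 but no AVX, this sketch returns "y8".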

For information on support for non-Intel processors, please see the article titled Use Intel® IPP on Intel or Compatible AMD* Processors.

P8/Y8 Internal Run-Time Dispatcher

Within the 32-bit 'p8' and equivalent 64-bit 'y8' architectures there is an additional "run-time" dispatching mechanism, a kind of mini-dispatcher. The Nehalem (Intel® Core™ i7) and Westmere processor families add additional SIMD instructions beyond those defined by SSE4.1. The Nehalem processor family adds the SSE4.2 SIMD instructions and the Westmere family adds AES-NI.

Creating two additional internal versions of the Intel IPP library for the SSE4.2 and AES-NI instructions would be very space inefficient, so they are bundled as part of the SSE4.1 library. When you call a function that includes, for example, AES-NI optimizations, an additional jump directs your call to the AES-NI version within the p8/y8 library. Because the enhancements affect the optimization of only a small number of Intel IPP functions, this additional overhead occurs infrequently and only when your application is executing on a p8/y8 architecture processor.
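The extra jump described above can be pictured as a second, smaller selection made inside the p8/y8 code path. The sketch below is illustrative only: y8_set_aesni(), y8_aes_encrypt() and the two variant functions are hypothetical names, and the functions return labels instead of performing encryption so that the control flow stays visible.

```c
/* Hypothetical flag, set once when the library identifies the processor. */
static int g_has_aesni = 0;

void y8_set_aesni(int present)
{
    g_has_aesni = present;
}

/* Stand-ins for two variants bundled in the same y8 library. */
const char *y8_aes_baseline(void) { return "y8 baseline"; }
const char *y8_aes_aesni(void)    { return "y8 AES-NI"; }

/* Entry point that the main dispatcher would select on a y8 processor.
 * The additional jump to the AES-NI variant happens here, and only
 * functions that have such a variant pay for it. */
const char *y8_aes_encrypt(void)
{
    return g_has_aesni ? y8_aes_aesni() : y8_aes_baseline();
}
```

Functions without an AES-NI or SSE4.2 variant would not contain this second branch at all, which matches the statement that the overhead occurs infrequently.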

Processor Architecture Table

The following table was copied from the article Intel® Compiler Options for Intel® SSE and Intel® AVX generation (SSE2, SSE3, SSSE3, ATOM_SSSE3, SSE4.1, SSE4.2, ATOM_SSE4.2, AVX, AVX2, AVX-512) and processor-specific optimizations, which describes some compiler architecture options. The table lists which Intel processors support which vector instructions. It is updated on a regular basis, so please refer to the original article for the latest version. Please note that the behavior of the Intel Compiler SIMD dispatcher described in that article does not apply to the Intel IPP library.

Note: The Intel IPP library dispatching mechanism behaves differently from the one in the Intel Compiler products, and may also behave differently from those in other Intel library products.

Additional information regarding dispatching and how it relates to non-Intel processors can be found here. How to identify your specific processor is described here. To correlate a processor family name with an Intel CPU brand name, use the ark.intel.com web site.

COMMON-AVX512	A future Intel® Processor
MIC-AVX512	The Intel® Xeon Phi™ processor x200 product family
CORE-AVX512	A future Intel® Processor
CORE-AVX2	4th Generation Intel® Core™ Processors; 5th Generation Intel® Core™ Processors; 6th Generation Intel® Core™ Processors; Intel® Xeon® Processor E7 v3 Family; Intel® Xeon® Processor E5 v3 Family; Intel® Xeon® Processor E3 v3 Family; Intel® Xeon® Processor E7 v4 Family; Intel® Xeon® Processor E5 v4 Family; Intel® Xeon® Processor E3 v4 Family
CORE-AVX-I	3rd Generation Intel® Core™ i7 Processors; 3rd Generation Intel® Core™ i5 Processors; 3rd Generation Intel® Core™ i3 Processors; Intel® Xeon® Processor E7 v2 Family; Intel® Xeon® Processor E5 v2 Family; Intel® Xeon® Processor E3 v2 Family
AVX	2nd Generation Intel® Core™ i7 Processors; 2nd Generation Intel® Core™ i5 Processors; 2nd Generation Intel® Core™ i3 Processors; Intel® Xeon® Processor E5 Family; Intel® Xeon® Processor E3 Family
SSE4.2	Previous Generation Intel® Core™ i7 Processors; Previous Generation Intel® Core™ i5 Processors; Previous Generation Intel® Core™ i3 Processors; Intel® Xeon® 55XX series; Intel® Xeon® 56XX series; Intel® Xeon® 75XX series; Intel® Xeon® Processor E7 Family
ATOM_SSE4.2	Intel® Atom™ processors that support Intel® SSE4.2 instructions
SSE4.1	Intel® Xeon® 74XX series; Quad-Core Intel® Xeon 54XX, 33XX series; Dual-Core Intel® Xeon 52XX, 31XX series; Intel® Core™ 2 Extreme 9XXX series; Intel® Core™ 2 Quad 9XXX series; Intel® Core™ 2 Duo 8XXX series; Intel® Core™ 2 Duo E7200
SSSE3	Quad-Core Intel® Xeon® 73XX, 53XX, 32XX series; Dual-Core Intel® Xeon® 72XX, 53XX, 51XX, 30XX series; Intel® Core™ 2 Extreme 7XXX, 6XXX series; Intel® Core™ 2 Quad 6XXX series; Intel® Core™ 2 Duo 7XXX (except E7200), 6XXX, 5XXX, 4XXX series; Intel® Core™ 2 Solo 2XXX series; Intel® Pentium® dual-core processor E2XXX, T23XX series
ATOM_SSSE3	Intel® Atom™ processors
SSE3	Dual-Core Intel® Xeon® 70XX, 71XX, 50XX Series; Dual-Core Intel® Xeon® processor (ULV and LV) 1.66, 2.0, 2.16; Dual-Core Intel® Xeon® 2.8; Intel® Xeon® processors with SSE3 instruction set support; Intel® Core™ Duo; Intel® Core™ Solo; Intel® Pentium® dual-core processor T21XX, T20XX series; Intel® Pentium® processor Extreme Edition; Intel® Pentium® D; Intel® Pentium® 4 processors with SSE3 instruction set support
SSE2	Intel® Xeon® processors; Intel® Pentium® 4 processors; Intel® Pentium® M
IA32	Intel® Pentium® III Processor; Intel® Pentium® II Processor; Intel® Pentium® Processor

Table 2: Intel Processors Associated with Specific CPU Vector Instructions

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.
