• 2019 Update 4
  • 03/20/2019
  • Public Content

Vectorization Basics for Intel® Architecture Processors

Intel® Architecture Processors provide performance acceleration using Single Instruction Multiple Data (SIMD) instruction sets, which include:
  • Intel Streaming SIMD Extensions (Intel SSE) instructions
  • Intel Advanced Vector Extensions (Intel AVX) instructions
  • Intel Advanced Vector Extensions 2 (Intel AVX2) instructions
By processing multiple data elements in a single instruction, these ISA extensions enable data parallelism in scientific, engineering, or graphics applications.
When using SIMD instructions, vector registers hold a group of data elements of the same data type, such as float or integer. The number of data elements that fit in one register depends on the microarchitecture and on the data type width. For example, starting with the 2nd Generation Intel Core™ processors, the vector register width is 256 bits. Each vector (YMM) register can store eight 32-bit float numbers, eight 32-bit integer numbers, and so on.
When using the SPMD technique, the OpenCL™ standard implementation can map work-items to the hardware in one of two ways:
  • As scalar code, where work-items execute one by one.
  • As SIMD elements, where several work-items fit in one register and run simultaneously.
The OpenCL Code Builder contains an implicit vectorization module, which implements the SIMD-elements method. Depending on the kernel code, this operation might have some limitations. If the vectorization module optimization is disabled, the SDK uses the scalar-code method.
Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.