Vectorization Basics for Intel® Architecture Processors

OpenCL™ Developer Guide for Intel® Processor Graphics

ID 773088

Date 3/20/2019

Version 2019.4

Intel® Architecture Processors provide performance acceleration using Single Instruction Multiple Data (SIMD) instruction sets, which include:

Intel Streaming SIMD Extensions (Intel SSE)
Intel Advanced Vector Extensions (Intel AVX)
Intel Advanced Vector Extensions 2 (Intel AVX2)

By processing multiple data elements in a single instruction, these ISA extensions enable data parallelism in scientific, engineering, or graphics applications.
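To make "multiple data elements in a single instruction" concrete, here is a minimal C sketch using the SSE intrinsics `_mm_loadu_ps`, `_mm_add_ps`, and `_mm_storeu_ps` from `<immintrin.h>`, where each `_mm_add_ps` call adds four floats at once. The function name `add_floats` is hypothetical, and the sketch assumes an x86 compiler that defines `__SSE__` (or MSVC targeting x64); on other targets it falls back to the plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <immintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
#if HAVE_SSE
    /* Each iteration loads four 32-bit floats per operand into 128-bit
       XMM registers and adds all four pairs with one packed-add instruction. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop for leftover elements (or the whole array without SSE). */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

With five elements, the first four are added by one packed operation and the fifth by the scalar loop.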

When using SIMD instructions, vector registers hold a group of data elements of the same data type, such as float or char. The number of data elements that fit in one register depends on the microarchitecture and on the width of the data type. For example, starting with the 2nd Generation Intel Core™ Processors, the vector register width is 256 bits, so each vector (YMM) register can store eight float numbers, eight 32-bit integer numbers, and so on.
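The capacity rule above is a simple division: register width in bits over element width in bits. A minimal C sketch (the helper name `lanes_per_register` is hypothetical):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: how many elements of a given size fit in one
   vector register of register_bits bits. */
unsigned lanes_per_register(unsigned register_bits, size_t element_bytes)
{
    unsigned element_bits = (unsigned)(element_bytes * CHAR_BIT);
    /* The element must be non-empty and divide the register evenly. */
    assert(element_bits > 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}
```

For a 256-bit YMM register this gives 256 / 32 = 8 float lanes, 256 / 8 = 32 char lanes, and 256 / 64 = 4 double lanes; a 128-bit XMM register (SSE) holds 4 float lanes.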

When using the Single Program Multiple Data (SPMD) technique, the OpenCL™ implementation can map work-items to the hardware as either:

Scalar code, when work-items execute one by one.
SIMD elements, when several work-items fit in one register and run simultaneously.
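The two mappings can be modeled in plain C. This is a conceptual sketch, not the Code Builder's actual code generation: the kernel body `kernel_body`, the driver functions `run_scalar` and `run_simd_emulated`, and `VEC_WIDTH` = 8 (eight 32-bit lanes of a 256-bit register) are hypothetical, the vector register is emulated with a small array, and running leftover work-items as scalar code is an assumption about how a partial group might be handled.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_WIDTH 8 /* assumed: eight 32-bit lanes, as in a 256-bit register */

/* Hypothetical kernel body for one work-item. */
static float kernel_body(float x) { return x * 2.0f + 1.0f; }

/* Scalar mapping: work-items execute one by one. */
void run_scalar(const float *in, float *out, size_t global_size)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        out[gid] = kernel_body(in[gid]);
}

/* SIMD-style mapping: VEC_WIDTH consecutive work-items share one
   "register" (emulated by the lanes array) and advance together. */
void run_simd_emulated(const float *in, float *out, size_t global_size)
{
    assert(global_size == 0 || (in != NULL && out != NULL));
    size_t gid = 0;
    for (; gid + VEC_WIDTH <= global_size; gid += VEC_WIDTH) {
        float lanes[VEC_WIDTH];
        for (int l = 0; l < VEC_WIDTH; ++l) lanes[l] = in[gid + l];             /* vector load  */
        for (int l = 0; l < VEC_WIDTH; ++l) lanes[l] = kernel_body(lanes[l]);   /* same op, all lanes */
        for (int l = 0; l < VEC_WIDTH; ++l) out[gid + l] = lanes[l];            /* vector store */
    }
    /* Work-items that do not fill a whole register run as scalar code. */
    for (; gid < global_size; ++gid)
        out[gid] = kernel_body(in[gid]);
}
```

Both drivers produce the same results; with 11 work-items, the SIMD-style version handles 8 in one group and the remaining 3 one by one.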

The OpenCL Code Builder contains an implicit vectorization module, which implements the SIMD-elements method. Depending on the kernel code, this vectorization might have some limitations. If the vectorization module is disabled, the SDK uses the scalar-code method.
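This page does not list the limitations. As a general illustration only (not a description of the Code Builder's internals), a branch whose direction depends on each work-item's data is one case a SIMD mapping must handle specially: all lanes execute the same instruction stream, so a vectorizer commonly evaluates both sides of the branch for every lane and uses a per-lane mask to select each result. The function `branch_masked` and `VEC_WIDTH` = 8 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_WIDTH 8 /* assumed lane count */

/* Hypothetical kernel with a per-work-item branch:
   out = (x > 0) ? x * 2 : -x
   evaluated for one group of VEC_WIDTH work-items using masking. */
void branch_masked(const float *in, float *out)
{
    assert(in != NULL && out != NULL);
    int mask[VEC_WIDTH];
    float then_val[VEC_WIDTH], else_val[VEC_WIDTH];
    for (int l = 0; l < VEC_WIDTH; ++l) mask[l] = in[l] > 0.0f;      /* compare: per-lane mask */
    for (int l = 0; l < VEC_WIDTH; ++l) then_val[l] = in[l] * 2.0f;  /* "then" side, every lane */
    for (int l = 0; l < VEC_WIDTH; ++l) else_val[l] = -in[l];        /* "else" side, every lane */
    for (int l = 0; l < VEC_WIDTH; ++l)                              /* blend by mask */
        out[l] = mask[l] ? then_val[l] : else_val[l];
}
```

The cost of this approach is that every lane pays for both branch bodies, even though each lane keeps only one result.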
