Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253
Date 11/07/2023
Public

C++ Classes and SIMD Operations

Using C++ classes for SIMD operations lets you operate on arrays or vectors of data in a single operation. Consider the addition of two vectors, A and B, each containing four elements. The typical method adds the elements A[i] and B[i] one pair at a time in a loop, as in the following snippet.

int a[4], b[4], c[4];
for (int i = 0; i < 4; i++)   /* needs four iterations */
    c[i] = a[i] + b[i];       /* computes c[0], c[1], c[2], c[3] */

The following example performs the same addition with a single operation, the SIMD method, using the Ivec class Is16vec4 (four signed 16-bit elements).

#include <ivec.h>
Is16vec4 ivecA, ivecB, ivecC;
ivecC = ivecA + ivecB;   /* one operation computes ivecC0, ivecC1, ivecC2, ivecC3 */

Available Classes

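The C++ SIMD classes provide parallelism that is not easily expressed with the ordinary mechanisms of C++. The following table lists the available classes, grouped by the instruction set each class targets, together with the element type and width, the number of elements, and the header file that declares the class.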

SIMD Vector Classes

Class      Signedness   Data Type   Element Size (bits)   Elements   Header File

MMX™ Technology
I64vec1    unspecified  __m64       64                    1          ivec.h
I32vec2    unspecified  int         32                    2          ivec.h
Is32vec2   signed       int         32                    2          ivec.h
Iu32vec2   unsigned     int         32                    2          ivec.h
I16vec4    unspecified  short       16                    4          ivec.h
Is16vec4   signed       short       16                    4          ivec.h
Iu16vec4   unsigned     short       16                    4          ivec.h
I8vec8     unspecified  char        8                     8          ivec.h
Is8vec8    signed       char        8                     8          ivec.h
Iu8vec8    unsigned     char        8                     8          ivec.h

Intel® Streaming SIMD Extensions (Intel® SSE)
F32vec4    unspecified  float       32                    4          fvec.h
F32vec1    unspecified  float       32                    1          fvec.h

Intel® Streaming SIMD Extensions 2 (Intel® SSE2)
F64vec2    unspecified  double      64                    2          dvec.h
I128vec1   unspecified  __m128i     128                   1          dvec.h
I64vec2    unspecified  long int    64                    2          dvec.h
I32vec4    unspecified  int         32                    4          dvec.h
Is32vec4   signed       int         32                    4          dvec.h
Iu32vec4   unsigned     int         32                    4          dvec.h
I16vec8    unspecified  short       16                    8          dvec.h
Is16vec8   signed       short       16                    8          dvec.h
Iu16vec8   unsigned     short       16                    8          dvec.h
I8vec16    unspecified  char        8                     16         dvec.h
Is8vec16   signed       char        8                     16         dvec.h
Iu8vec16   unsigned     char        8                     16         dvec.h

Intel® Advanced Vector Extensions (Intel® AVX)
F32vec8    unspecified  float       32                    8          dvec.h
F64vec4    unspecified  double      64                    4          dvec.h

Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation
F32vec16   unspecified  float       32                    16         dvec.h
F64vec8    unspecified  double      64                    8          dvec.h
M512vec    unspecified  __m512i     512                   1          dvec.h
I32vec16   unspecified  int         32                    16         dvec.h
Is32vec16  signed       int         32                    16         dvec.h
Iu32vec16  unsigned     int         32                    16         dvec.h
I64vec8    unspecified  long int    64                    8          dvec.h
Is64vec8   signed       long int    64                    8          dvec.h
Iu64vec8   unsigned     long int    64                    8          dvec.h

Intel® AVX-512 Byte and Word Instructions (BWI)
I16vec32   unspecified  short       16                    32         dvec.h
Is16vec32  signed       short       16                    32         dvec.h
Iu16vec32  unsigned     short       16                    32         dvec.h
I8vec64    unspecified  char        8                     64         dvec.h
Is8vec64   signed       char        8                     64         dvec.h
Iu8vec64   unsigned     char        8                     64         dvec.h

Most classes provide similar functionality across all data types and map onto the corresponding intrinsics. However, some operations cannot be carried over from one data type to another without poor performance, so they are omitted from the individual classes where that would occur.

NOTE:
Intrinsics that take immediate values and cannot easily be expressed as class operations are not implemented in the classes. For example:
  • _mm_shuffle_ps
  • _mm_shuffle_pi16
  • _mm_extract_pi16
  • _mm_insert_pi16

Access to Classes Using Header Files

The required class header files are installed in the include directory with the Intel® oneAPI DPC++/C++ Compiler. To enable the classes, use the #include directive in your program file as shown in the table that follows.

Include Directives for Enabling Classes

Instruction Set Extension                          Include Directive
MMX™ Technology                                    #include <ivec.h>
Intel® SSE                                         #include <fvec.h>
Intel® SSE2                                        #include <dvec.h>
Intel® Streaming SIMD Extensions 3 (Intel® SSE3)   #include <dvec.h>
Intel® Streaming SIMD Extensions 4 (Intel® SSE4)   #include <dvec.h>
Intel® AVX                                         #include <dvec.h>

Each header file in the table includes the header files listed above it. To use both the Ivec and Fvec classes, you only need to include fvec.h. Similarly, to use all the classes, including those for Intel® SSE2 and later, you only need to include dvec.h.

Usage Precautions

When using the C++ classes, follow the general guidelines below. More detailed usage rules for each class are listed in Integer Vector Classes and Floating-point Vector Classes.

Clear MMX™ Technology Registers

If you use both the Ivec and Fvec classes at the same time, your program could mix MMX™ Technology instructions, called by Ivec classes, with x87 floating-point instructions, called by Fvec classes. x87 floating-point instructions are used in the following Fvec functions:

  • fvec constructors

  • debug functions (cout and element access)

  • rsqrt_nr

NOTE:
MMX™ Technology registers are aliased onto the x87 floating-point registers, so clear the MMX™ Technology state with the EMMS instruction (the _mm_empty intrinsic, or the Ivec empty() function) before issuing any x87 floating-point instruction.

Example Usage

ivecA = ivecA & ivecB;    /* Ivec logical operation that uses MMX™ Technology instructions */
empty();                  /* clears the MMX™ Technology state */
cout << f32vec4a;         /* F32vec4 operation that uses x87 floating-point instructions */

CAUTION:
Failure to clear the MMX™ Technology registers can result in incorrect execution or poor performance due to an incorrect register state.