Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 12/16/2022
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

simd

Enforces vectorization of loops.

Syntax

#pragma simd [clause[ [,] clause]...]

Arguments

clause

Can be any of the following:

vectorlength (n1[, n2]...)

Where n is a vector length (VL). It must be an integer that is a power of 2; the value must be 2, 4, 8, or 16. If you specify more than one n, the vectorizor will choose the VL from the values specified.

Causes each iteration in the vector loop to execute the computation equivalent to n iterations of scalar loop execution. Multiple vectorlength clauses are merged as a union.

vectorlengthfor (data type)

Where data type must be one of built-in integer types (8-, 16-, 32-, or 64-bit), pointer types (treated as pointer-sized integer), floating point types (32- or 64-bit), or complex types (64- or 128-bit). Otherwise, behavior is undefined.

Causes each iteration in the vector loop to execute the computation equivalent to n iterations of scalar loop execution where n is computed from size_of_vector_register/sizeof(data type).

For example, vectorlengthfor(float) results in n=4 for Intel® Streaming SIMD Extensions (Intel® SSE2) to Intel SSE4.2 targets (packed float operations available on 128bit XMM registers) and n=8 for an Intel® Advanced Vector Extensions (Intel® AVX) target (packed float operations available on 256bit YMM registers). vectorlengthfor(int) results in n=4 for Intel SSE2 to Intel AVX targets.

vectorlength() and vectorlengthfor() clauses are mutually exclusive. In other words, the vectorlengthfor() clause may not be used with the vectorlength() clause, and vice versa.

Behavior for multiple vectorlengthfor clauses is undefined.

private (var1[, var2]...)

Where var is a scalar variable.

Causes each variable to be private to each iteration of a loop. Unless the variable appears in firstprivate clause, the initial value of the variable for the particular iteration is undefined. Unless the variable appears in lastprivate clause, the value of the variable upon exit of the loop is undefined. Multiple private clauses are merged as a union.

NOTE:

Execution of the SIMD loop with firtsprivate/lastprivate clauses may be different from serial execution of the same code even if the loop fails to vectorize.

A variable in a private clause cannot appear in a linear, reduction, firstprivate, or lastprivate clause.

firstprivate (var1[, var2]...)

Provides a superset of the functionality provided by the private clause. Variables that appear in a firstprivate list are subject to private clause semantics. In addition, its initial value is broadcast to all private instances for each iteration upon entering the SIMD loop.

A variable in a firstprivate clause can appear in a lastprivate clause.

A variable in a firstprivate clause cannot appear in a linear, reduction, or private clause.

lastprivate (var1[, var2]...)

Provides a superset of the functionality provided by the private clause. Variables that appear in a lastprivate list are subject to private clause semantics. In addition, when the SIMD loop is exited, each variable has the value that resulted from the sequentially last iteration of the SIMD loop (which may be undefined if the last iteration does not assign to the variable).

A variable in a lastprivate clause can appear in a firstprivate clause.

A variable in a lastprivate clause cannot appear in a linear, reduction, or private clause.

linear (var1:step1 [,var2:step2]...)

Where var is a scalar variable and step is a compile-time positive, integer constant expression.

For each iteration of a scalar loop, var1 is incremented by step1, var2 is incremented by step2, and so on. Therefore, every iteration of the vector loop increments the variables by VL*step1, VL*step2, …, to VL*stepN, respectively. If more than one step is specified for a var, a compile-time error occurs. Multiple linear clauses are merged as a union.

A variable in a linear clause cannot appear in a reduction, private, firstprivate, or lastprivate clause.

reduction (oper:var1 [,var2]…)

Where oper is a reduction operator and var is a scalar variable.

Applies the vector reduction indicated by oper to var1, var2, …, varN. The simd pragma may have multiple reduction clauses with the same or different operators. If more than one reduction operator is associated with a var, a compile-time error occurs.

A variable in a reduction clause cannot appear in a linear, private, firstprivate, or lastprivate clause.

[no]assert

Directs the compiler to assert or not to assert when the vectorization fails. The default is noassert. If this clause is specified more than once, a compile-time error occurs.

[no]vecremainder

Instructs the compiler to vectorize or not to vectorize the remainder loop when the original loop is vectorized. See the description of the vector pragma for more information.

Description

The simd pragma is used to guide the compiler to vectorize more loops. Vectorization using the simd pragma complements (but does not replace) the fully automatic approach.

Without explicit vectorlength() and vectorlengthfor() clauses, the compiler will choose a vectorlength using its own cost model. Misclassification of variables into private, firstprivate, lastprivate, linear, and reduction, or lack of appropriate classification of variables may cause unintended consequences such as runtime failures and/or incorrect result.

You can only specify a particular variable in at most one instance of a private, linear, or reduction clause.

If the compiler is unable to vectorize a loop, a warning will be emitted (use the assert clause to make it an error).

If the vectorizer has to stop vectorizing a loop for some reason, the fast floating-point model is used for the SIMD loop.

The vectorization performed on this loop by the simd pragma overrides any setting you may specify for options -fp-model (Linux* and macOS) and /fp (Windows*) for this loop.

Note that the simd pragma may not affect all auto-vectorizable loops. Some of these loops do not have a way to describe the SIMD vector semantics.

The following restrictions apply to the simd pragma:

  • The countable loop for the simd pragma has to conform to the for-loop style of an OpenMP worksharing loop construct. Additionally, the loop control variable must be a signed integer type.

  • The vector values must be signed 8-, 16-, 32-, or 64-bit integers, single or double-precision floating point numbers, or single or double-precision complex numbers.

  • A SIMD loop may contain another loop (for, while, do-while) in it. Goto out of such inner loops are not supported. Break and continue are supported.

    NOTE:
    Inlining can create such an inner loop, which may not be obvious at the source level.

  • A SIMD loop performs memory references unconditionally. Therefore, all address computations must result in valid memory addresses, even though such locations may not be accessed if the loop is executed sequentially.

To disable transformations that enables more vectorization, specify the -vec -no-simd (Linux* and macOS) or /Qvec /Qno-simd (Windows*) options.

User-mandated vectorization, also called SIMD vectorization can assert or not assert an error if a #pragma simd annotated loop fails to vectorize. By default, the simd pragma is set to noassert, and the compiler will issue a warning if the loop fails to vectorize. To direct the compiler to assert an error when the #pragma simd annotated loop fails to vectorize, add the assert clause to the simd pragma. If a simd pragma annotated loop is not vectorized by the compiler, the loop holds its serial semantics.

Examples

This example shows how to use the simd pragma:

 void add_floats(float *a, float *b, float *c, float *d, float *e, int n){
  int i; 
#pragma simd
  for (i=0; i<n; i++){
    a[i] = a[i] + b[i] + c[i] + d[i] + e[i];
  } 
}

In the example, the function add_floats() uses too many unknown pointers for the compiler's automatic runtime independence check optimization to kick-in. The programmer can enforce the vectorization of this loop by using the simd pragma to avoid the overhead of runtime check.