Intel® Fortran Compiler Classic and Intel® Fortran Compiler Developer Guide and Reference

ID 767251
Date 3/22/2024
Public

VECTOR and NOVECTOR

General Compiler Directive: Overrides default heuristics for vectorization of DO loops. It can also affect certain optimizations.

Syntax

!DIR$ VECTOR [clause[[,] clause]...]

!DIR$ NOVECTOR

clause

Is an optional vectorization or optimizer clause. It can be one or more of the following:

  • ALIGNED | UNALIGNED

    Specifies that all data is aligned or no data is aligned in a DO loop. These clauses override efficiency heuristics in the optimizer.

    The ALIGNED clause instructs the compiler to assume all array references are aligned. As a result, misalignment properties analyzed from the program context, if any, will be discarded. The UNALIGNED clause is a hint to the compiler to avoid alignment optimizations; the compiler can still use available alignment information. These clauses disable advanced alignment optimizations of the compiler, such as dynamic or static loop peeling to make references aligned.

    Be careful when using the ALIGNED clause. The compiler may choose to use aligned data movement instructions, instructions with memory operands that require alignment, and/or nontemporal stores. Such instructions will cause a runtime exception if some of the access patterns are actually unaligned.

  • ALWAYS [ASSERT]

    Requests vectorization of a DO loop, overriding the efficiency heuristics of the vectorizer.

    The ALWAYS clause only takes effect if the loop can actually be vectorized. If the ASSERT keyword is added, the compiler generates an error-level assertion message when the loop cannot be vectorized. ALWAYS does not override assumed data dependences; use the IVDEP directive to make the compiler ignore them.

  • DYNAMIC_ALIGN [(var)] | NODYNAMIC_ALIGN

    Enables or disables dynamic alignment optimization. Dynamic alignment is an optimization the compiler attempts to perform by default. It involves peeling iterations from the vector loop into a scalar loop before the vector loop so that the vector loop aligns with a particular memory reference.

    If you specify (var) for DYNAMIC_ALIGN, you can indicate a scalar or array variable name on which to align. Specifying DYNAMIC_ALIGN, with or without (var), does not guarantee that the optimization is performed; the compiler still uses heuristics to determine whether the optimization is feasible. The NODYNAMIC_ALIGN clause disables the optimization for the loop.

  • MASK_READWRITE | NOMASK_READWRITE

    Enables or disables the generation of masked load and store operations within conditional statements.

    The MASK_READWRITE clause directs the compiler to disable memory speculation, causing the generation of masked load and store operations within conditional statements. The NOMASK_READWRITE clause directs the compiler to enable memory speculation, causing the generation of unmasked loads and stores within conditional statements.

  • [NO]MULTIPLE_GATHER_SCATTER_BY_SHUFFLES (can also be specified as [NO]G2S, which is deprecated)

    Enables or disables the optimization for multiple adjacent gathers/scatters.

    The MULTIPLE_GATHER_SCATTER_BY_SHUFFLES clause is an optimization hint to encourage use of unit-strided loads/stores plus a set of shuffles instead of multiple gathers/scatters (or straightforward gather/scatter emulation software sequences). The NOMULTIPLE_GATHER_SCATTER_BY_SHUFFLES clause tells the compiler not to try optimizing multiple gathers/scatters into unit-strided loads/stores plus a set of shuffles.

  • TEMPORAL | NONTEMPORAL [(var1 [, var2]...)]

    var

    Is an optional memory reference in the form of a variable name.

    Controls how stores of register contents to memory are performed (streaming versus non-streaming).

    The TEMPORAL clause directs the compiler to use temporal (that is, non-streaming) stores. The NONTEMPORAL clause directs the compiler to use non-temporal (that is, streaming) stores.

    By default, the compiler automatically determines whether a streaming store should be used for each variable.

    On certain processors, streaming stores can significantly improve performance over non-streaming stores when large amounts of data are stored. However, misuse of streaming stores can significantly degrade performance.

  • [NO]VECREMAINDER

    Enables or disables vectorization of the remainder loop when the main loop is vectorized.

  • VECTORLENGTH (n1 [, n2]…)

    Tells the vectorizer which vector length (vectorization factor) to use when generating the main vector loop. Each n must be an integer power of 2: 2, 4, 8, 16, 32, or 64. If more than one value is specified, the vectorizer chooses one of the specified vector lengths based on a cost model decision.
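
The following sketch shows where several of these clauses go in source code. It is a hypothetical module: the module name, subroutine names, and loops are made up for illustration, and each directive is only a hint whose effect on the generated code depends on the compiler and target. None of the clauses shown changes what the loops compute, with one exception: the IVDEP directive in scatter_add is valid only if idx contains no repeated values.

```fortran
! Hypothetical module: each subroutine illustrates one or two VECTOR clauses.
module vector_clause_demo
  implicit none
contains

  ! DYNAMIC_ALIGN(a): hint to peel leading iterations so that references
  ! to a are aligned in the main vector loop (not guaranteed).
  subroutine scale_dyn_align(a, n, s)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: s
    integer :: i
!DIR$ VECTOR DYNAMIC_ALIGN(a)
    do i = 1, n
      a(i) = s * a(i)
    end do
  end subroutine scale_dyn_align

  ! NONTEMPORAL(a): request streaming stores for a, which the loop only writes.
  subroutine fill_streaming(a, n, v)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    real, intent(in) :: v
    integer :: i
!DIR$ VECTOR NONTEMPORAL(a)
    do i = 1, n
      a(i) = v
    end do
  end subroutine fill_streaming

  ! VECTORLENGTH(8, 16): let the cost model choose 8 or 16 lanes.
  ! NOVECREMAINDER: do not vectorize the leftover (remainder) iterations.
  subroutine axpy_vl(y, x, n, alpha)
    integer, intent(in) :: n
    real, intent(inout) :: y(n)
    real, intent(in) :: x(n), alpha
    integer :: i
!DIR$ VECTOR VECTORLENGTH(8, 16), NOVECREMAINDER
    do i = 1, n
      y(i) = y(i) + alpha * x(i)
    end do
  end subroutine axpy_vl

  ! Three adjacent stride-3 reads of xyz: a candidate for replacing
  ! gathers with unit-stride loads plus shuffles.
  subroutine sq_norm_xyz(xyz, r2, n)
    integer, intent(in) :: n
    real, intent(in) :: xyz(3*n)
    real, intent(out) :: r2(n)
    integer :: i
!DIR$ VECTOR MULTIPLE_GATHER_SCATTER_BY_SHUFFLES
    do i = 1, n
      r2(i) = xyz(3*i-2)**2 + xyz(3*i-1)**2 + xyz(3*i)**2
    end do
  end subroutine sq_norm_xyz

  ! ALWAYS overrides the efficiency heuristics but not assumed dependences,
  ! so IVDEP is added. Valid only if idx contains no repeated values.
  subroutine scatter_add(a, b, idx, n)
    integer, intent(in) :: n
    integer, intent(in) :: idx(n)
    real, intent(inout) :: a(:)
    real, intent(in) :: b(n)
    integer :: i
!DIR$ IVDEP
!DIR$ VECTOR ALWAYS
    do i = 1, n
      a(idx(i)) = a(idx(i)) + b(i)
    end do
  end subroutine scatter_add

end module vector_clause_demo
```

For example, calling scatter_add with idx = [4, 1, 2] respects the IVDEP assumption, whereas idx = [1, 1, 2] would violate it.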

The VECTOR and NOVECTOR directives control vectorization of the DO loop that directly follows the directive.
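
As a sketch of this scope rule, the hypothetical subroutine two_loops below contains two consecutive loops. The NOVECTOR directive applies only to the first loop; the second loop is compiled under the default vectorization heuristics.

```fortran
! Hypothetical example: NOVECTOR affects only the DO loop right after it.
module directive_scope_demo
  implicit none
contains
  subroutine two_loops(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in) :: b(n), c(n)
    real, intent(out) :: a(n)
    integer :: i
!DIR$ NOVECTOR
    do i = 1, n            ! NOVECTOR applies to this loop
      a(i) = b(i) + c(i)
    end do
    do i = 1, n            ! default vectorization heuristics apply again
      a(i) = 2.0 * a(i)
    end do
  end subroutine two_loops
end module directive_scope_demo
```

To disable vectorization of the second loop as well, place another NOVECTOR directive directly before it.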

If the MASK_READWRITE clause is specified, the compiler generates masked loads and stores within all conditional branches in the loop. If the NOMASK_READWRITE clause is specified, the compiler generates unmasked loads and stores for increased performance.
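
A minimal sketch of a conditional update to which MASK_READWRITE applies, using a hypothetical subroutine add_where_positive. With the clause, vector loads and stores of a inside the IF branch are masked, so memory is accessed only for iterations where b(i) > 0. Replacing the clause with NOMASK_READWRITE lets the compiler load and store a unconditionally, which can be faster but assumes that accessing a(i) for every iteration is harmless.

```fortran
! Hypothetical example: a conditional update inside a loop.
module masked_update_demo
  implicit none
contains
  subroutine add_where_positive(a, b, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: b(n)
    integer :: i
    ! MASK_READWRITE: loads and stores of a(i) in the IF branch are masked,
    ! so a is accessed only for iterations where b(i) > 0.
!DIR$ VECTOR MASK_READWRITE
    do i = 1, n
      if (b(i) > 0.0) then
        a(i) = a(i) + b(i)
      end if
    end do
  end subroutine add_where_positive
end module masked_update_demo
```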

CAUTION:

The VECTOR directive should be used with care. Overriding the efficiency heuristics of the compiler should only be done if you are absolutely sure the vectorization will improve performance.

Example

The compiler normally does not vectorize DO loops that have a large number of non-unit stride references (compared to the number of unit stride references).

In the following example, vectorization would be disabled by default, but the directive overrides this behavior:

!DIR$ VECTOR ALWAYS
  do i = 1, 100, 2
    ! two references with stride 2 follow
    a(i) = b(i)
  enddo

There may be cases where you want to explicitly avoid vectorization of a loop; for example, if vectorization would result in a performance regression rather than an improvement. In these cases, you can use the NOVECTOR directive to disable vectorization of the loop.

In the following example, vectorization would be performed by default, but the directive overrides this behavior:

!DIR$ NOVECTOR
  do i = 1, 100
    a(i) = b(i) + c(i)
  enddo