Intel® Advisor User Guide

ID 766448
Date 6/24/2024
Public

Examine Not-Vectorized and Under-Vectorized Loops

Accuracy Level: Low

Enabled Analyses: Survey

Result Interpretation

After you run the Vectorization and Code Insights perspective with Low accuracy, you get a basic vectorization report that shows not-vectorized and under-vectorized loops as well as other performance issues.

In the Survey report:

  1. Sort by the Self Time and/or Total Time column to find the most time-consuming loops.

  2. Check whether your target loop or function is vectorized or scalar. Intel Advisor distinguishes vectorized and scalar code with the following icons:

    • Vectorized function

    • Vectorized loop

    • Scalar function

    • Scalar loop

  3. Use the filters to hide the code sites that you do not want to tune now.

  4. Decide which loops or functions to investigate:

    • If the loop/function is scalar

    • If the loop/function is vectorized

If Loop/Function is Scalar

If the target loop/function is scalar (marked with the scalar loop or scalar function icon), find out why the compiler did not vectorize it.

Several reasons are possible:

NOTE:

See OpenMP* Pragmas Summary in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference for more information about the directives mentioned below.

Possible Reason | To Confirm | To Do

Assumed dependency

To confirm: In the Why No Vectorization? column, look for the Vector dependence prevents vectorization issue.

To do: Run the Dependencies analysis.

  • If no dependencies are found, force vectorization with the omp simd directive or give the compiler other vectorization hints.

  • If dependencies are confirmed, resolve them or move on to the next loop.
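As a minimal sketch, assuming the Dependencies analysis found no real dependency, the C loop below shows where the omp simd directive goes. The function name scale_array is hypothetical. Because dst and src are plain pointers, the compiler must assume they might overlap, which is a common source of an assumed dependency.

```c
#include <stddef.h>

/* Hypothetical example: dst and src are plain pointers, so the compiler
   must assume they might overlap and may refuse to vectorize. If the
   Dependencies analysis confirms there is no overlap, omp simd asserts
   that the iterations can safely run in SIMD lanes. */
void scale_array(float *dst, const float *src, float k, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        dst[i] = k * src[i];
    }
}
```

The directive only takes effect when OpenMP SIMD support is enabled (for example, -qopenmp-simd with the Intel compilers or -fopenmp-simd with GCC and Clang); otherwise it is ignored and the loop still computes the same result. Declaring the pointers restrict is another way to tell the compiler they do not overlap.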

Function call in the loop

To confirm: In the Why No Vectorization? column, look for these issues:

  • Function call present

  • Indirect function call present

  • Serialized user function call present

To do: For the Function call present issue, do one of the following:

  • Inline the function into the loop.

  • Vectorize the function with the omp declare simd directive.

For the Indirect function call present or Serialized user function call present issue, follow the guidelines in the Recommendations tab.
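The following sketch shows the omp declare simd option in C. The functions poly and eval_poly are hypothetical; the directive asks the compiler to also generate a SIMD variant of poly, so the call inside the vectorized loop does not have to be serialized.

```c
#include <stddef.h>

/* Hypothetical helper called from a hot loop. omp declare simd asks the
   compiler to also emit a SIMD variant of the function. */
#pragma omp declare simd
float poly(float x)
{
    return 3.0f * x * x + 2.0f * x + 1.0f;
}

/* Applies poly to every element; with OpenMP SIMD enabled, the loop can
   call the SIMD variant instead of a scalar call per element. */
void eval_poly(float *out, const float *in, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        out[i] = poly(in[i]);
    }
}
```

For a function this small, inlining (for example, making it static inline in the same file) is often the simpler choice; omp declare simd is more useful when the function is too large to inline.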

Compiler-assumed inefficient vectorization

To confirm: In the Why No Vectorization? column, look for the Loop vectorization possible but seems inefficient issue.

To do: Try forcing vectorization with the omp simd directive.

If forcing vectorization does not give a measurable improvement, experiment with other directives.

To better understand the performance implications and potential speedup, run additional analyses:

  • Trip Counts

  • Memory Access Patterns
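Clauses on the omp simd directive are one thing to experiment with. The sketch below uses the hypothetical function sum_every_other, whose stride-2 access is the kind of pattern a compiler may judge inefficient to vectorize; the OpenMP reduction clause lets each SIMD lane keep a partial sum. It is one possible experiment, not a guaranteed fix.

```c
#include <stddef.h>

/* Hypothetical example: sums every second element. The stride-2 access
   is a pattern the compiler may consider inefficient to vectorize.
   reduction(+:sum) tells it that each SIMD lane may keep a partial sum
   that is combined after the loop. */
float sum_every_other(const float *a, size_t n)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i += 2) {
        sum += a[i];
    }
    return sum;
}
```

A vectorized reduction can change the order of floating-point additions, so the result may differ slightly in the last bits from the scalar loop. The Trip Counts and Memory Access Patterns analyses help show whether the forced version actually pays off.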

Other

To confirm: Check the following columns:

  • Why No Vectorization? column

  • Vector Issues column

To do: Study the Compiler Diagnostic Details and the Intel Advisor recommendations to resolve the issues.

If Loop/Function is Vectorized

If the target loop/function is vectorized (marked with the vectorized loop or vectorized function icon), make sure its vector efficiency is above 90%.

If efficiency is below 90%, consider the following:

Possible Reason | To Confirm | To Do

ISA

To confirm: Check the Vectorized Loops/Vector ISA column to see which instruction set architecture (ISA) the loop uses.

To do: Change the target ISA with the corresponding compiler options.
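After changing the target options (for example, -xHost with the Intel compilers or -march=native with GCC and Clang), one quick sanity check is to ask the preprocessor which vector extensions the build targets. The function compiled_isa below is a hypothetical helper; it relies on the predefined macros __AVX512F__, __AVX2__, __AVX__, and __SSE2__, which x86 compilers define when the matching extension is enabled.

```c
/* Reports which vector ISA this translation unit was compiled for,
   based on predefined compiler macros. It describes the build target,
   not the CPU the program happens to run on. */
const char *compiled_isa(void)
{
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "unknown or non-x86";
#endif
}
```

The Vectorized Loops/Vector ISA column remains the authoritative view per loop, since individual loops can end up with a narrower ISA than the build target.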

Inefficient peel/remainder

To confirm: In the Vector Issues column, look for the Inefficient Peel/Remainder issue, or check whether a significant share of the time is spent in the peel or remainder loop.

To do: Resolve the issue as follows:

  • Check the Recommendations tab.

  • Run the Trip Counts analysis.
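The arithmetic behind the remainder loop is simple enough to sketch. The model below assumes no peel loop (aligned data) and a vector length of vl elements; the function names are hypothetical. For example, with 1001 iterations and 16 float lanes (a 512-bit AVX-512 register), 62 full vector iterations cover 992 elements and 9 iterations are left for the scalar or masked remainder.

```c
#include <stddef.h>

/* Simplified model: assumes no peel loop and a vector length of vl
   elements (vl > 0). */
size_t main_loop_iterations(size_t n, size_t vl)
{
    return n / vl;               /* full vector iterations */
}

size_t remainder_iterations(size_t n, size_t vl)
{
    return n % vl;               /* iterations left after full vectors */
}

/* Rounds n up to the next multiple of vl, e.g. for padding arrays so
   the trip count has no remainder. */
size_t round_up_to_vector(size_t n, size_t vl)
{
    return (n + vl - 1) / vl * vl;
}
```

Padding the data to round_up_to_vector(1001, 16) = 1008 elements removes the remainder entirely, but only if the algorithm tolerates the extra elements; otherwise, increasing the trip count per loop is usually the more practical lever.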

Possible inefficient memory access

To confirm: In the Vector Issues column, look for the Possible Inefficient Memory Access issue.

In the Instruction Set Analysis/Traits column, look for the following traits:

  • extracts

  • inserts

  • gather

  • scatter

To do: Run the Memory Access Patterns analysis.
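Gather, scatter, insert, and extract traits often come from non-unit-stride access. A common fix, sketched below with hypothetical type and function names, is converting an array of structures (AoS) into a structure of arrays (SoA) so that the loop reads each field contiguously.

```c
#include <stddef.h>

/* Array of structures: consecutive x values are 3 floats apart, so a
   vectorized loop over x may need gathers or extra shuffles. */
struct point_aos { float x, y, z; };

/* Structure of arrays: each coordinate is contiguous (unit stride). */
struct points_soa { float *x; float *y; float *z; };

void shift_x_aos(struct point_aos *p, size_t n, float dx)
{
    for (size_t i = 0; i < n; ++i)
        p[i].x += dx;                /* stride of 3 floats */
}

void shift_x_soa(struct points_soa p, size_t n, float dx)
{
    for (size_t i = 0; i < n; ++i)
        p.x[i] += dx;                /* unit stride */
}
```

Both functions compute the same result; the Memory Access Patterns analysis would be expected to report a constant stride for the first loop and unit stride for the second.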

Type conversions present

To confirm: In the Instruction Set Analysis/Traits column, look for the Type Conversions metric.

To do: Remove redundant float-to-double conversions, which can shorten the vector length and reduce vectorization efficiency.
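A frequent source of redundant conversions in C is an unsuffixed floating-point literal, which has type double. In the hypothetical functions below, the first loop promotes every float to double, while the second stays in single precision, so each vector register holds twice as many elements.

```c
#include <stddef.h>

/* 0.1 is a double literal: each a[i] is converted to double,
   multiplied, and converted back to float. */
void scale_mixed(float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = a[i] * 0.1;
}

/* 0.1f keeps the computation in single precision. */
void scale_float(float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = a[i] * 0.1f;
}
```

The two versions can differ in the last bit because 0.1f and 0.1 are not the same value, so this change is appropriate only where single precision is sufficient.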

Unaligned vector access in loop

To confirm: In the Advanced/Vectorization Details column, look for the Unaligned access in vector loop metric.

To do: Align the data.
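A minimal sketch of aligning data, assuming a C11 library that provides aligned_alloc and a 64-byte alignment (the width of an AVX-512 register; a smaller value such as 32 may suit other targets). The helper names are hypothetical. The OpenMP aligned clause then tells the compiler it may rely on that alignment.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates n floats on a 64-byte boundary. C11 aligned_alloc expects
   the size to be a multiple of the alignment, so the byte count is
   rounded up. Release the memory with free(). */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + 63) / 64 * 64;
    return aligned_alloc(64, bytes);
}

int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* aligned(a : 64) promises the compiler that a is 64-byte aligned;
   passing an unaligned pointer here would be undefined behavior. */
void add_one(float *a, size_t n)
{
    #pragma omp simd aligned(a : 64)
    for (size_t i = 0; i < n; ++i)
        a[i] += 1.0f;
}
```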

Register pressure

To confirm: In the Vector Issues column, look for the Vector register spilling possible issue.

To do: Resolve the issue in one of the following ways:

  • Decrease the loop unroll factor.

  • Split the loop into smaller loops.
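Splitting a loop (loop fission) can be sketched as below. The two statements here are a toy stand-in and would not spill registers on their own; in a real loop with many more live arrays and temporaries, the same transformation reduces how many values must be held in vector registers at once. The function names are hypothetical, and the split is valid only because each statement depends solely on elements at the same index, assuming the arrays do not overlap.

```c
#include <stddef.h>

/* Fused form: both statements share one loop body, so all four arrays
   are live at the same time. */
void fused(float *a, float *b, const float *c, const float *d, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = a[i] * c[i] + d[i];
        b[i] = b[i] * d[i] - c[i];
    }
}

/* Split form: each loop touches only three arrays, lowering the number
   of values the compiler must keep in registers per iteration. */
void split(float *a, float *b, const float *c, const float *d, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = a[i] * c[i] + d[i];
    for (size_t i = 0; i < n; ++i)
        b[i] = b[i] * d[i] - c[i];
}
```

Splitting trades register pressure for an extra pass over c and d, so it is worth measuring both versions.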

Potential underutilization of FMA instructions

To confirm: In the Vector Issues column, look for the Potential underutilization of FMA instructions issue.

To do: Resolve the issue in one of the following ways:

  • Change the target ISA.

  • Explicitly enable FMA generation and vectorization.
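The loop shape that benefits from FMA is a multiply followed by an add on the same values, as in the hypothetical saxpy function below. Whether the compiler actually emits FMA instructions depends on the target ISA and on whether floating-point contraction is allowed; with GCC and Clang, for example, -mfma (or an -march value that includes FMA) enables the instructions and -ffp-contract controls whether a * x + y may be fused.

```c
#include <stddef.h>

/* y = a*x + y: each element is one multiply and one add, which the
   compiler can fuse into a single FMA instruction when the target ISA
   has FMA and contraction is permitted. */
void saxpy(float *y, const float *x, float a, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

An FMA rounds once instead of twice, so fused results can differ from the unfused ones in the last bit.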

Other

To confirm: Check the Vector Issues column.

To do: Follow the Intel Advisor recommendations to resolve the issues.