Intel® Advisor User Guide

ID 766448
Date 7/13/2023
Public


Examine Not-Vectorized and Under-Vectorized Loops

Accuracy Level: Low

Enabled Analyses: Survey

Result Interpretation

After running the Vectorization and Code Insights perspective with Low accuracy, you get a basic vectorization report, which shows not-vectorized and under-vectorized loops, and other performance issues.

In the Survey report:

  1. Sort by the Self-Time and/or Total-Time column to find the most time-consuming loops.

  2. Check whether your target loop or function is vector or scalar. Intel Advisor marks each entry with an icon that identifies it as one of the following:

    • vectorized function

    • vectorized loop

    • scalar function

    • scalar loop

  3. Use filters to hide the parts of the code that you do not want to tune now.

  4. Decide what loops or functions to investigate:

    • If loop/function is scalar

    • If loop/function is vectorized

If Loop/Function is Scalar

If the target loop/function is scalar, you need to understand why the compiler did not vectorize it.

Several reasons are possible:

NOTE:

See OpenMP* Pragmas Summary in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference for more information about the directives mentioned below.

Each entry below lists a possible reason, how to confirm it, and what to do.

Possible reason: Assumed dependency

To confirm: Refer to the Why No Vectorization? column and look for the Vector dependence prevents vectorization issue.

To do: Run the Dependencies analysis.

  • If no dependencies are found, force vectorization with the omp simd directive or give the compiler other vectorization hints.

  • If dependencies are confirmed, resolve them, or move to the next loop.
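As an illustration of the no-dependency case above, the following is a minimal sketch in C. The function name scale_add and its parameters are hypothetical and not taken from this guide. It assumes the Dependencies analysis has confirmed that the two arrays never overlap; if they can overlap, forcing vectorization this way may produce wrong results.

```c
#include <stddef.h>

/* Hypothetical kernel: without more information the compiler must assume
   that dst and src may overlap, which can prevent vectorization. */
void scale_add(float *dst, const float *src, float factor, size_t n)
{
    /* Assumption: the Dependencies analysis reported no real dependency,
       so iterations are independent and can run in SIMD lanes. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        dst[i] = dst[i] + factor * src[i];
    }
}
```

The pragma takes effect only when OpenMP SIMD support is enabled at compile time, for example with -qopenmp-simd for the Intel compiler or -fopenmp-simd for GCC and Clang. Otherwise it is ignored and the loop keeps its scalar semantics.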

Possible reason: Function call in the loop

To confirm: Refer to the Why No Vectorization? column and look for the following issues:

  • Function call present

  • Indirect function call present

  • Serialized user function call present

To do: For the Function call present issue, do one of the following:

  • Inline the function into the loop.

  • Vectorize the function with the omp declare simd directive.

For the Indirect function call present or Serialized user function call present issues, refer to the guidelines in the Recommendations tab.
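The omp declare simd option for the Function call present issue can be sketched as follows. The names clamp01 and clamp_all are hypothetical, chosen only for illustration. Whether the compiler actually generates and uses a vector variant depends on the compiler and its options, so treat this as a starting point and check the result in the Survey report.

```c
#include <stddef.h>

/* Hypothetical helper. declare simd asks the compiler to also generate a
   vector variant of this function that a vectorized loop can call. */
#pragma omp declare simd
float clamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* The call inside the loop no longer has to be serialized, because a
   vector variant of clamp01 is available to the compiler. */
void clamp_all(float *data, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        data[i] = clamp01(data[i]);
    }
}
```

For a helper this small, inlining is usually the simpler choice. The declare simd form is mainly useful when the function is defined in another translation unit or is too large to inline.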

Possible reason: Compiler-assumed inefficient vectorization

To confirm: Refer to the Why No Vectorization? column and look for the Loop vectorization possible but seems inefficient issue.

To do: Try forcing vectorization with the omp simd directive.

If forcing vectorization does not improve performance measurably, consider experimenting with other directives.

To better understand the performance implications and potential speedup, consider running additional analyses:

  • Trip Counts

  • Memory Access Patterns
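One way to experiment with other directives is to add clauses to omp simd. The sketch below uses a hypothetical strided_sum function; strided reads are a common case in which a compiler may judge vectorization possible but inefficient. The reduction and simdlen clauses are standard OpenMP, but the value 8 is an arbitrary assumption and not a recommendation from this guide.

```c
#include <stddef.h>

/* Hypothetical kernel: sums every stride-th element of a.
   The caller must guarantee that a holds at least (n - 1) * stride + 1
   elements. */
float strided_sum(const float *a, size_t n, size_t stride)
{
    float sum = 0.0f;
    /* reduction(+:sum) tells the compiler how to combine per-lane partial
       sums; simdlen(8) is only a hint for the preferred number of lanes. */
    #pragma omp simd reduction(+:sum) simdlen(8)
    for (size_t i = 0; i < n; ++i) {
        sum += a[i * stride];
    }
    return sum;
}
```

After such a change, rerun the Survey analysis, and optionally Trip Counts and Memory Access Patterns, to check whether the forced vectorization actually pays off for the real trip count and stride.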

Possible reason: Other

To confirm: Refer to the following columns:

  • Why No Vectorization? column

  • Vector Issues column

To do: Study the Compiler Diagnostic Details and Advisor Recommendations to resolve the issues.

If Loop/Function is Vectorized

If the target loop/function is vectorized, ensure that its vector efficiency is above 90%.
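To make the 90% threshold concrete, the sketch below assumes that efficiency is the estimated gain over scalar code divided by the vector length, expressed as a percentage. This formula is an assumption made for illustration and is not stated in this section; the helper name vector_efficiency_percent is hypothetical.

```c
/* Assumption: efficiency (%) = estimated gain / vector length * 100.
   Returns 0 for a non-positive vector length. */
double vector_efficiency_percent(double estimated_gain, int vector_length)
{
    if (vector_length <= 0) {
        return 0.0;
    }
    return 100.0 * estimated_gain / (double)vector_length;
}
```

Under this assumption, a loop with a vector length of 8 and an estimated gain of 6x would have an efficiency of 75%, which is below the threshold and worth investigating.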

If efficiency is below 90%, consider the following:

Possible Reason