Examine Not-Vectorized and Under-Vectorized Loops

Intel® Advisor User Guide

ID 766448

Date 12/16/2022

Accuracy Level

Low

Enabled Analyses

Survey

Result Interpretation

After running the Vectorization and Code Insights perspective with Low accuracy, you get a basic vectorization report, which shows not-vectorized and under-vectorized loops, and other performance issues.

In the Survey report:

- Sort by the Self-Time and/or Total-Time column to find the most time-consuming loops.
- Check whether your target loop or function is vector or scalar. Intel Advisor marks each entry with an icon that identifies it as one of the following:
  - vectorized function
  - vectorized loop
  - scalar function
  - scalar loop
- Use filters to hide the parts of the code that you do not want to tune now.
- Decide which loops or functions to investigate:
  - If the loop/function is scalar
  - If the loop/function is vectorized

If Loop/Function is Scalar

If the target loop/function is scalar, you need to understand why the compiler did not vectorize it.

Several reasons are possible:

NOTE:

See OpenMP* Pragmas Summary in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference for more information about the directives mentioned below.

Possible Reason	To Confirm	To Do
Assumed dependency	Refer to the Why No Vectorization? column. Search for the Vector dependence prevents vectorization issue.	Run the Dependencies analysis. If no dependencies are found, force vectorization with the omp simd directive or give the compiler other vectorization hints. If dependencies are confirmed, resolve them or move to the next loop.
Function call in the loop	Refer to the Why No Vectorization? column. Search for the following issues: Function call present; Indirect function call present; Serialized user function call present.	For the Function call present issue, do one of the following: inline the function into the loop, or vectorize the function with the omp declare simd directive. For the Indirect function call present or Serialized user function call present issue, refer to the guidelines in the Recommendations tab.
Compiler-assumed inefficient vectorization	Refer to the Why No Vectorization? column. Search for the Loop vectorization possible but seems inefficient issue.	Try forcing vectorization with the omp simd directive. If forcing vectorization does not provide tangible results, consider experimenting with other directives. To better understand the performance implications and potential speedup, consider running additional analyses: Trip Counts and Memory Access Patterns.
Other	Refer to the Why No Vectorization? column and the Vector Issues column.	Study the Compiler Diagnostic Details and Advisor Recommendations to resolve the issues.
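
The first two rows of the table can be illustrated with a minimal C sketch. It assumes a compiler with OpenMP SIMD support enabled; without it, the pragmas are ignored and the code still runs correctly as scalar code. The function names `scale_add` and `saxpy_like` are hypothetical and not taken from Intel Advisor. The `omp simd` directive is only safe here on the assumption that the Dependencies analysis found no real dependency, that is, the arrays do not overlap.

```c
#include <stddef.h>

/* Hypothetical helper called from the loop. `omp declare simd` asks the
   compiler to also generate a vector variant of this function, so the
   call does not have to block vectorization of the calling loop. */
#pragma omp declare simd
static float scale_add(float a, float x, float y)
{
    return a * x + y;
}

/* The compiler may assume that `out` overlaps `x` or `y` and leave the
   loop scalar. `omp simd` asserts that iterations are independent; use
   it only after confirming that no real dependency exists. */
void saxpy_like(size_t n, float a, const float *x, const float *y, float *out)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        out[i] = scale_add(a, x[i], y[i]);
    }
}
```

After rebuilding, re-run the Survey analysis to check whether the loop is now reported as vectorized.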

If Loop/Function is Vectorized

If the target loop is vectorized, ensure that its vector efficiency is above 90%.

If efficiency is below 90%, consider the following:

Possible Reason	To Confirm	To Do
ISA	Refer to the Vectorized Loops/Vector ISA column to check the ISA version used in the application.	Change the target ISA by specifying the corresponding compiler flags.
Inefficient peel/remainder	Refer to the Vector Issues column. Search for the Inefficient Peel/Remainder issue, or check whether the time spent in peel/remainder loops is significant.	Resolve the issue: check the Recommendations tab and run the Trip Counts analysis.
Possible inefficient memory access	Refer to the Vector Issues column. Search for the Possible Inefficient Memory Access issue. Refer to the Instruction Set Analysis/Traits column. Search for the following traits: extracts, inserts, gather, scatter.	Run the Memory Access Patterns analysis.
Type conversions present	Refer to the Instruction Set Analysis/Traits column. Search for the Type Conversions metric.	Remove redundant type conversions (for example, from float to double), which can lead to a smaller vector length and reduced vectorization efficiency.
Unaligned vector access in loop	Refer to the Advanced/Vectorization Details column. Search for the Unaligned access in vector loop metric.	Align data.
Register pressure	Refer to the Vector Issues column. Search for the Vector register spilling possible issue.	Resolve the issue by doing one of the following: decrease the loop unroll factor, or split the loop into smaller parts.
Potential underutilization of FMA instructions	Refer to the Vector Issues column. Search for the Potential underutilization of FMA instructions issue.	Resolve the issue by doing one of the following: change the target ISA, or explicitly enable FMA generation and vectorization.
Other	Refer to the Vector Issues column.	Follow the Intel Advisor recommendations to resolve the issues.
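
Two of the fixes in the table, removing float-to-double conversions and aligning data, can be sketched in C as follows. This is an illustrative example under stated assumptions: all function names are hypothetical, a 64-byte alignment is assumed to suit the target vector width, and the C11 `aligned_alloc` function is assumed to be available in your C library.

```c
#include <stddef.h>
#include <stdlib.h>

/* Mixed precision: the literal 0.1 is a double, so each element may be
   converted float -> double -> float. Double-precision vectors hold
   fewer elements, which can reduce vectorization efficiency. */
void scale_mixed(size_t n, const float *in, float *out)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * 0.1;
    }
}

/* Single precision throughout: the 0.1f literal keeps the arithmetic in
   float, so no type conversions are needed. */
void scale_float(size_t n, const float *in, float *out)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * 0.1f;
    }
}

/* Allocate a float array aligned to 64 bytes (an assumed alignment).
   aligned_alloc expects the size to be a multiple of the alignment, so
   the byte count is rounded up. Release the memory with free(). */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    size_t padded = (bytes + 63) / 64 * 64;
    if (padded == 0) {
        padded = 64; /* avoid a zero-size request */
    }
    return aligned_alloc(64, padded);
}
```

The two scaling functions can differ in the last bits of the result, so confirm that single precision is acceptable for your application before removing the conversion.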
