Intel® Advisor User Guide

ID 766448
Date 3/31/2023
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents
Give Feedback

Vectorization Recommendations for C++

Ineffective Peeled/Remainder Loop(s) Present

All or some source loop iterations are not executing in the loop body. Improve performance by moving source loop iterations from peeled/remainder loops to the loop body.

Align Data

One of the memory accesses in the source loop does not start at an optimally aligned address boundary. To fix: Align the data and tell the compiler the data is aligned.

Align dynamic data using a 64-byte boundary and tell the compiler the data is aligned:

float *array;
array = (float *)_mm_malloc(ARRAY_SIZE*sizeof(float), 32);
// Somewhere else
__assume_aligned(array, 32);
// Use array in loop
_mm_free(array);

Align static data using a 64-byte boundary:

__declspec(align(64)) float array[ARRAY_SIZE]

See also:

Parallelize The Loop with Both Threads and SIMD Instructions

The loop is threaded and auto-vectorized; however, the trip count is not a multiple of vector length. To fix: Do all of the following:

  • Use the