Intel® Advisor User Guide

ID 766448
Date 3/31/2023
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents
Give Feedback

Vectorization Recommendations for Fortran

Ineffective Peeled/Remainder Loop(s) Present

All or some source loop iterations are not executing in the loop body. Improve performance by moving source loop iterations from peeled/remainder loops to the loop body.

Align Data

One of the memory accesses in the source loop does not start at an optimally aligned address boundary. To fix: Align the data and tell the compiler the data is aligned. To align data, use __declspec(align()). To tell the compiler the data is aligned, use __assume_aligned() before the source loop.

See also:

Parallelize The Loop with Both Threads and SIMD Instructions

The loop is threaded and auto-vectorized; however, the trip count is not a multiple of vector length. To fix: Do all of the following:

  • Use the !$omp parallel do simd directive to parallelize the loop with both threads and SIMD instructions. Specifically, this directive divides loop iterations into chunks (subsets) and distributes the chunks among threads, then chunk iterations execute concurrently using SIMD instructions.
  • Add the schedule(simd: [kind]) modifier to the directive to guarantee the chunk size (number of iterations per chunk) is a multiple of vector length.

Original code sample:

!$omp parallel do schedule(static)
do i = 1,1000
    c(i) = a(i)*b(i)
end do
!$omp end parallel do

<