Diagnostic 15415: vectorization support: gather was generated for the variable a: indirect access

Published: 03/31/2016  

Last Updated: 03/31/2016

By Devorah Hayman

Product Version: Intel® Fortran Compiler 15.0 and later versions

Cause:

A vectorizable loop contains loads from memory locations that are not contiguous (an access pattern sometimes known as a “gather”). These may be indexed loads, as in the example below, or loads with non-unit stride. The compiler has generated a hardware gather instruction for these loads.

(Note that for compiler versions 16.0.1 and earlier, the compiler may also emit this message when gather operations are emulated in software.)
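As an illustration of the second case mentioned above (loads with non-unit stride), here is a minimal sketch. The module name stride_example, the routine name strided_load, and the kind parameter rk are hypothetical and not part of the original article. Whether the compiler actually generates a hardware gather for this loop, and the exact wording of any remark, depends on the compiler version and the target instruction set.

```fortran
module stride_example
   implicit none
   ! Assumption: default real kind; the article's example selects the kind with -DRT=...
   integer, parameter :: rk = kind(1.0)
contains
   ! Reads every stride-th element of a, starting at a(1).
   ! The caller must ensure size(a) >= 1 + (n-1)*stride.
   subroutine strided_load(n, stride, a, b)
      integer,  intent(in)  :: n, stride
      real(rk), intent(in)  :: a(:)
      real(rk), intent(out) :: b(n)
      integer               :: i

      do i = 1, n
         ! Consecutive iterations read elements that are stride apart,
         ! so the loads are not contiguous in memory when stride /= 1.
         b(i) = 1.0_rk + 0.1_rk*a(1 + (i-1)*stride)
      enddo
   end subroutine strided_load
end module stride_example
```

Compiling this file with the report options shown in this article should indicate how the compiler handled the strided load on a given target.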

 

The vectorization report is generated using the Intel® Fortran Compiler's optimization and vectorization report options:

Windows* OS:  /O2  /Qopt-report:2  /Qopt-report-phase:vec    

Linux OS or OS X:  -O2  -qopt-report=2  -qopt-report-phase=vec

Example:

The example below generates the following remark in the optimization report:

subroutine gathr(n, a, b, index)
   implicit none
   integer,                intent(in)  :: n
   integer,  dimension(n), intent(in)  :: index
   real(RT), dimension(n), intent(in)  :: a
   real(RT), dimension(n), intent(out) :: b
   integer                             :: i

   do i=1,n
       b(i) = 1.0_RT + 0.1_RT*a(index(i))
   enddo

end subroutine gathr

$ ifort -c -xcore-avx2 -qopt-report=4 -qopt-report-file=stdout gathr.F90 -DRT=4 -S | egrep 'gather|VECTORIZED'

   remark #15415: vectorization support: gather was generated for the variable a:  indirect access    [ gathr.F90(10,29) ]

   remark #15300: LOOP WAS VECTORIZED

   remark #15458: masked indexed (or gather) loads: 1

   remark #15301: REMAINDER LOOP WAS VECTORIZED

$ egrep gather gathr.s

        vgatherdps %ymm4, -4(%r8,%ymm3,4), %ymm5                #10.29

        vgatherdps %ymm7, -4(%r8,%ymm6,4), %ymm8                #10.29

        vgatherdps %ymm3, -4(%r8,%ymm2,4), %ymm4                #10.29

$

The compiler has vectorized the loop using a “gather” instruction (vgatherdps in the assembly listing above) from Intel® Advanced Vector Extensions 2 (Intel® AVX2).
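To make the effect of the gather more concrete, the following is a conceptual sketch of the same computation written in chunks of eight elements, matching the eight single-precision lanes of a 256-bit ymm register. It is not the code the compiler generates: the module name gather_sketch, the routine name gathr_chunked, the temporary tmp, and the constant vlen are hypothetical, and the real generated code handles remainder iterations with its own masked or remainder loops.

```fortran
module gather_sketch
   implicit none
   integer, parameter :: rk   = kind(1.0)  ! assumption: single precision, as with -DRT=4
   integer, parameter :: vlen = 8          ! single-precision lanes in a 256-bit register
contains
   subroutine gathr_chunked(n, a, b, index)
      integer,  intent(in)  :: n
      integer,  intent(in)  :: index(n)
      real(rk), intent(in)  :: a(n)
      real(rk), intent(out) :: b(n)
      real(rk)              :: tmp(vlen)
      integer               :: i, m

      do i = 1, n, vlen
         m = min(vlen, n - i + 1)            ! the last chunk may be partial
         ! The "gather": collect up to vlen scattered elements of a
         ! into a contiguous temporary using a vector subscript.
         tmp(1:m) = a(index(i:i+m-1))
         ! The arithmetic then operates on contiguous data.
         b(i:i+m-1) = 1.0_rk + 0.1_rk*tmp(1:m)
      enddo
   end subroutine gathr_chunked
end module gather_sketch
```

The sketch produces the same values of b as the original loop; it only separates the indexed load from the arithmetic so that the role of the gather is visible.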

Compare this with the behavior when compiling with -DRT=8, as described in the article for diagnostic #15328.

 

See also:

Requirements for Vectorizable Loops

Vectorization Essentials

Vectorization and Optimization Reports

