Product Version: Intel® Fortran Compiler 15.0 and later versions
A vectorizable loop contains loads from memory locations that are not contiguous in memory (sometimes known as a “gather”). These may be indexed loads, as in the example below, or loads with non-unit stride. The compiler has emulated a hardware gather instruction by issuing individual loads for the different memory locations in software.
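The article's own example covers the indexed case. As a minimal sketch of the other case, a load with non-unit stride, the routine below reads every stride-th element of a. The module name strided_demo, the routine strided, the argument stride, and the kind parameter dp are hypothetical and chosen only for illustration. Whether a given compiler version reports remark #15328 for this loop depends on the target and options, so treat it as an illustration of the access pattern rather than a guaranteed reproducer.

```fortran
module strided_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed working precision (double)
contains
  ! Hypothetical example: loads from a with a non-unit stride.
  subroutine strided(n, stride, a, b)
    integer, intent(in) :: n, stride
    real(dp), dimension(n*stride), intent(in)  :: a
    real(dp), dimension(n),        intent(out) :: b
    integer :: i

    do i = 1, n
      ! a is read at elements 1, 1+stride, 1+2*stride, ...
      ! These locations are not contiguous when stride > 1.
      b(i) = 1.0_dp + 0.1_dp*a((i-1)*stride + 1)
    enddo
  end subroutine strided
end module strided_demo
```

With stride equal to 1 the loads become contiguous and no gather is needed.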
The vectorization report is generated using the Intel® Fortran Compiler's optimization and vectorization report options:
Windows* OS: /O2 /Qopt-report:2 /Qopt-report-phase:vec
Linux* OS or OS X*: -O2 -qopt-report=2 -qopt-report-phase=vec
The example below generates the following remarks in the optimization report:
subroutine gathr(n, a, b, index)
  implicit none
  integer, intent(in) :: n
  integer, dimension(n), intent(in) :: index
  real(RT), dimension(n), intent(in) :: a
  real(RT), dimension(n), intent(out) :: b
  integer :: i

  do i=1,n
    b(i) = 1.0_RT + 0.1_RT*a(index(i))
  enddo

end subroutine gathr
$ ifort -c -xcore-avx2 -qopt-report=4 -qopt-report-file=stdout gathr.F90 -DRT=8 -S | egrep 'gather|VECTORIZED'
remark #15328: vectorization support: gather was emulated for the variable a: indirect access [ gathr.F90(10,29) ]
remark #15300: LOOP WAS VECTORIZED
remark #15458: masked indexed (or gather) loads: 1
remark #15301: REMAINDER LOOP WAS VECTORIZED
The compiler has vectorized the loop by emulating a “gather” instruction in software.
The assembly code contains no gather instructions.
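As a rough mental model of what "emulated in software" means, the sketch below rewrites the gathr loop by hand: each block of elements is collected with individual scalar loads into a small temporary, and the arithmetic and store then work on contiguous data. This is only a conceptual illustration, not the code the compiler actually generates. The module name gather_emulation_demo, the routine gathr_emulated, the temporary tmp, and the block length vlen = 4 (assuming four 64-bit elements per 256-bit vector, as with RT=8) are hypothetical.

```fortran
module gather_emulation_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! stands in for RT=8
  integer, parameter :: vlen = 4           ! assumed: four 64-bit elements per 256-bit vector
contains
  ! Hypothetical hand-written model of an emulated gather for the gathr loop.
  subroutine gathr_emulated(n, a, b, index)
    integer, intent(in) :: n
    integer,  dimension(n), intent(in)  :: index
    real(dp), dimension(n), intent(in)  :: a
    real(dp), dimension(n), intent(out) :: b
    real(dp) :: tmp(vlen)
    integer :: i, j, m

    do i = 1, n, vlen
      m = min(vlen, n - i + 1)             ! the last block may be partial
      ! "Emulated gather": one scalar load per element fills the temporary.
      do j = 1, m
        tmp(j) = a(index(i + j - 1))
      enddo
      ! The arithmetic and the store then operate on contiguous data.
      b(i:i+m-1) = 1.0_dp + 0.1_dp*tmp(1:m)
    enddo
  end subroutine gathr_emulated
end module gather_emulation_demo
```

The result is the same as in the original loop; only the way the elements of a are fetched differs. A hardware gather instruction would replace the inner scalar-load loop with a single instruction.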
Compare to the behavior when compiling with -DRT=4 as described in the article for diagnostic #15415.