Diagnostic 15543: Loop Was Not Vectorized: Loop With Function Call Not Considered An Optimization Candidate. (Fortran)

ID 732974
Updated 1/7/2015
Version Latest
Public


Cause:

The loop contains a call to a function that the compiler can neither inline nor vectorize on its own, and this prevents auto-vectorization of the loop.

Example:

Program foo 
    implicit none
    integer, parameter  :: nx = 100000000
    real(8)             :: x, xp, sumx
    integer             :: i
    interface
       real(8) function bar(x, xp) 
          real(8), intent(in) :: x, xp
       end
    end interface 
  
    sumx = 0.
    xp   = 1.

    do i = 1,nx
       x = 1.D-8*real(i,8)
       sumx = sumx + bar(x,xp)
    enddo

    print *, 'Sum =',sumx
      
end


real(8) function bar(x, xp) 
  implicit none
  real(8), intent(in) :: x, xp

  bar = 1. - 2.*(x-xp) + 3.*(x-xp)**2 - 1.5*(x-xp)**3  + 0.2*(x-xp)**4
  bar = bar / sqrt(x**2 + xp**2)
  
end  
> ifort -qopt-report-phase=vec -qopt-report-file=stderr  bar.f90 foo.f90
(  ifort /Qopt-report-phase:vec /Qopt-report-file:stderr  bar.f90 foo.f90  on  Windows*)
 
Non-optimizable loops:
 

LOOP BEGIN at foo.f90(18,5)
   remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate.   [ foo.f90(17,22) ]
LOOP END

 

Resolution:

The loop and the function call can both be vectorized using the explicit vector programming capabilities of OpenMP 4.0 or Intel® Cilk™ Plus.
For example, adding an OpenMP DECLARE SIMD directive to the function bar() and compiling with -qopenmp-simd lets the compiler generate a SIMD (vectorized) version of bar() in addition to the scalar version. The same OpenMP directive must also be added to the interface block for bar() inside program foo, so that the compiler knows at the call site that a vector version exists. The UNIFORM clause specifies that xp is a non-varying argument, i.e., it has the same value for every iteration of the loop in the caller that is being vectorized; thus x is the only vector argument. Without UNIFORM, the compiler would have to allow for the possibility that xp is also a vector argument.
 
real(8) function bar(x, xp) 
!$OMP DECLARE SIMD (bar) UNIFORM(xp)
  implicit none
  real(8), intent(in) :: x, xp

  bar = 1. - 2.*(x-xp) + 3.*(x-xp)**2 - 1.5*(x-xp)**3  + 0.2*(x-xp)**4
  bar = bar / sqrt(x**2 + xp**2)
  
end  

> ifort -qopenmp-simd -qopt-report-phase=vec -qopt-report-file=stderr bar.f90 foo.f90
...

remark #15301: FUNCTION WAS VECTORIZED   [ bar.f90(1,18) ]

Begin optimization report for: FOO
...
LOOP BEGIN at foo.f90(16,5)

   remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
   remark #15346: vector dependence: assumed OUTPUT dependence between  line 17 and  line 18
LOOP END

A vectorized version (actually, two) of function bar() has been generated; however, the loop inside foo has still not been vectorized. This is because the compiler sees dependencies between loop iterations carried by both x and sumx. Unaided, the compiler could auto-vectorize a loop containing only these dependencies, or a loop containing only the function call, but not a loop containing both. We can instruct the compiler to vectorize the loop by adding a SIMD directive whose clauses specify the properties of x and sumx:
 
Program foo 
    implicit none
    integer, parameter  :: nx = 100000000
    real(8)             :: x, xp, sumx
    integer             :: i

    interface
       real(8) function bar(x, xp) 
       !$OMP DECLARE SIMD (bar) UNIFORM(xp)
          real(8), intent(in) :: x, xp
       end
    end interface 
  
    sumx = 0.
    xp   = 1. 

    !$OMP SIMD  private(x)  reduction(+:sumx)
    do i = 1,nx
       x = 1.D-8*real(i,8)
       sumx = sumx + bar(x,xp)
    enddo

    print *, 'Sum =',sumx
      
end  
> ifort -qopenmp-simd -qopt-report-phase=vec -qopt-report-file=stderr bar.f90 foo.f90
...
remark #15301: FUNCTION WAS VECTORIZED   [ bar.f90(1,18) ]
...

LOOP BEGIN at foo.f90(17,5)
   remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
LOOP END

The loop is now vectorized successfully; running and timing the program shows a speedup.

Note that if the DECLARE SIMD directive is omitted, the !$OMP SIMD directive will still cause the remaining parts of the loop in foo to be vectorized, but the call to bar() will be serialized, so any performance gain is likely to be small. In either case, the private and reduction clauses of the SIMD directive are mandatory; without them, the compiler will assume that there are no loop-carried dependencies, and the results may be incorrect.
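To make the role of the private and reduction clauses more concrete, the following is a minimal sketch of what they describe conceptually: each vector lane keeps its own copy of x and its own partial sum, and the partial sums are combined after the loop. It is an illustration only, not the code the compiler generates. The module name reduction_sketch, the functions work(), serial_sum() and lanewise_sum(), the scale factor 1.0e-3 and the lane count passed in as vl are all hypothetical; work() merely stands in for bar().

```fortran
module reduction_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains

    ! Hypothetical stand-in for the per-iteration work (bar() in the article).
    pure function work(x) result(y)
        real(dp), intent(in) :: x
        real(dp) :: y
        y = x*x
    end function work

    ! Serial accumulation: one running total, updated in iteration order.
    function serial_sum(n) result(total)
        integer, intent(in) :: n
        real(dp) :: total, x
        integer  :: i
        total = 0.0_dp
        do i = 1, n
            x = 1.0e-3_dp * real(i, dp)
            total = total + work(x)
        end do
    end function serial_sum

    ! Lane-wise accumulation: vl independent partial sums (the idea behind
    ! reduction(+:sumx)), with x recomputed per lane (the idea behind private(x)).
    ! vl is an assumed lane count; a real compiler chooses the vector length.
    function lanewise_sum(n, vl) result(total)
        integer, intent(in) :: n, vl
        real(dp) :: total, x
        real(dp) :: partial(vl)
        integer  :: i, lane, nmain
        partial = 0.0_dp
        nmain = n - mod(n, vl)            ! iterations covered by full vectors
        do i = 1, nmain, vl
            do lane = 1, vl
                x = 1.0e-3_dp * real(i + lane - 1, dp)
                partial(lane) = partial(lane) + work(x)
            end do
        end do
        total = sum(partial)              ! combine the per-lane partial sums
        do i = nmain + 1, n               ! scalar remainder iterations
            x = 1.0e-3_dp * real(i, dp)
            total = total + work(x)
        end do
    end function lanewise_sum

end module reduction_sketch

program reduction_demo
    use reduction_sketch
    implicit none
    print *, 'serial    =', serial_sum(1000)
    print *, 'lane-wise =', lanewise_sum(1000, 4)
end program reduction_demo
```

Because the lane-wise version adds the terms in a different order, its result can differ from the serial sum in the last few bits. The same applies to a loop vectorized with a reduction clause.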

For small functions such as bar(), inlining may be a simpler and more efficient way to achieve vectorization of loops containing function calls. When the caller and callee are in separate source files, as above, the application should be built with interprocedural optimization (-ipo or /Qipo). When caller and callee are in the same source file, inlining of small functions is enabled by default at optimization levels of -O2 and above.
 
> ifort -ipo -qopt-report-phase=vec -qopt-report-file=stderr  bar.f90 foo.f90
...
LOOP BEGIN at foo.f90(17,5)
   remark #15300: LOOP WAS VECTORIZED
LOOP END
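For the single-source-file case mentioned above, one possible arrangement is sketched here: bar() is placed in a module in the same file as the program that calls it, so the compiler sees both at once and the function becomes a candidate for inlining at -O2 without -ipo. The module name bar_mod and the idea of saving everything in one file (say, a hypothetical foo_single.f90) are assumptions for illustration; the function body and the loop are copied from the example above. Whether the call is actually inlined and the loop vectorized depends on the compiler version and options, so the optimization report should be checked.

```fortran
module bar_mod
    implicit none
contains

    ! Same function as in the article, now visible to the caller at compile time.
    real(8) function bar(x, xp)
        real(8), intent(in) :: x, xp

        bar = 1. - 2.*(x-xp) + 3.*(x-xp)**2 - 1.5*(x-xp)**3  + 0.2*(x-xp)**4
        bar = bar / sqrt(x**2 + xp**2)
    end function bar

end module bar_mod

program foo
    use bar_mod, only: bar     ! explicit interface comes from the module
    implicit none
    integer, parameter  :: nx = 100000000
    real(8)             :: x, xp, sumx
    integer             :: i

    sumx = 0.
    xp   = 1.

    ! No OpenMP directives here: this version relies on inlining of bar().
    do i = 1,nx
       x = 1.D-8*real(i,8)
       sumx = sumx + bar(x,xp)
    enddo

    print *, 'Sum =',sumx
end program foo
```

Compiling this single file with the same report options used above (-qopt-report-phase=vec -qopt-report-file=stderr) should show whether the loop was vectorized. A side benefit of the module is that no separate interface block for bar() is needed in foo.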
 
 
