Cause:
A function call inside the loop is preventing auto-vectorization.
Example:
The main program, in foo.f90, calls the function bar through an explicit interface:
program foo
  implicit none
  integer, parameter :: nx = 100000000
  real(8) :: x, xp, sumx
  integer :: i
  interface
    real(8) function bar(x, xp)
      real(8), intent(in) :: x, xp
    end function bar
  end interface
  sumx = 0.d0
  xp = 1.d0
  do i = 1,nx
    x = 1.D-8*real(i,8)
    sumx = sumx + bar(x,xp)
  enddo
  print *, 'Sum =',sumx
end program foo
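The function bar is compiled separately, from bar.f90. Its constants are written as double-precision literals (1.5d0, 0.2d0), because default-real literals such as 0.2 would be rounded to single precision before being used in the real(8) expression:
real(8) function bar(x, xp)
  implicit none
  real(8), intent(in) :: x, xp
  bar = 1.d0 - 2.d0*(x-xp) + 3.d0*(x-xp)**2 - 1.5d0*(x-xp)**3 + 0.2d0*(x-xp)**4
  bar = bar / sqrt(x**2 + xp**2)
end function bar
Compiling foo.f90 with a vectorization report enabled (for example, ifort -c -qopt-report-phase=vec -qopt-report-file=stderr foo.f90) gives: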
LOOP BEGIN at foo.f90(18,5)
remark #15543: loop was not vectorized: loop with function call not considered an optimization candidate. [ foo.f90(17,22) ]
LOOP END
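Resolution:
Add an OpenMP DECLARE SIMD directive to bar so that the compiler also generates a vector (SIMD) version of the function, which a vectorized loop can call. The UNIFORM(xp) clause tells the compiler that xp has the same value in every SIMD lane, as it does in the loop in foo. The directive is placed after IMPLICIT NONE, since OpenMP requires declarative directives to follow any USE, IMPORT and IMPLICIT statements:
real(8) function bar(x, xp)
  implicit none
  real(8), intent(in) :: x, xp
  !$OMP DECLARE SIMD (bar) UNIFORM(xp)
  bar = 1.d0 - 2.d0*(x-xp) + 3.d0*(x-xp)**2 - 1.5d0*(x-xp)**3 + 0.2d0*(x-xp)**4
  bar = bar / sqrt(x**2 + xp**2)
end function bar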
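Compiling bar.f90 with OpenMP SIMD support enabled (for example, ifort -c -qopenmp-simd -qopt-report-phase=vec -qopt-report-file=stderr bar.f90) confirms that a vector version of bar was generated:
...
remark #15301: FUNCTION WAS VECTORIZED [ bar.f90(1,18) ]
The calling code must also know that a SIMD version of bar exists, so the same DECLARE SIMD directive is added to the interface block for bar in foo.f90. The loop in foo is still not vectorized, however, because the compiler assumes a dependence that it cannot rule out:
...
remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
remark #15346: vector dependence: assumed OUTPUT dependence between line 17 and line 18
LOOP END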
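Adding an !$OMP SIMD directive to the loop tells the compiler that it is safe to vectorize it. The private(x) clause gives each SIMD lane its own copy of x, and reduction(+:sumx) has each lane accumulate a partial sum, with the partial sums combined after the loop:
program foo
  implicit none
  integer, parameter :: nx = 100000000
  real(8) :: x, xp, sumx
  integer :: i
  interface
    real(8) function bar(x, xp)
      !$OMP DECLARE SIMD (bar) UNIFORM(xp)
      real(8), intent(in) :: x, xp
    end function bar
  end interface
  sumx = 0.d0
  xp = 1.d0
  !$OMP SIMD private(x) reduction(+:sumx)
  do i = 1,nx
    x = 1.D-8*real(i,8)
    sumx = sumx + bar(x,xp)
  enddo
  print *, 'Sum =',sumx
end program foo
Compiling foo.f90 with OpenMP SIMD support enabled (for example, -qopenmp-simd) now reports:
...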
LOOP BEGIN at foo.f90(17,5)
remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
LOOP END
The loop is now vectorized, and running and timing the program shows a speedup over the unvectorized version.
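One way to measure that speedup is to time the loop with the standard Fortran system_clock intrinsic. The following is a minimal sketch, not part of the original example: the module timing_demo and the subroutine timed_sum are hypothetical names, and the module simply wraps the same bar function and SIMD loop shown above. When OpenMP SIMD support is not enabled, the !$omp lines are treated as comments, so the code also runs unvectorized.

```fortran
module timing_demo
  implicit none
contains
  ! Same function as bar.f90 above; DECLARE SIMD requests a vector version
  ! and UNIFORM(xp) states that xp is identical in every SIMD lane.
  real(8) function bar(x, xp)
    real(8), intent(in) :: x, xp
    !$omp declare simd (bar) uniform(xp)
    bar = 1.d0 - 2.d0*(x-xp) + 3.d0*(x-xp)**2 - 1.5d0*(x-xp)**3 + 0.2d0*(x-xp)**4
    bar = bar / sqrt(x**2 + xp**2)
  end function bar

  ! Hypothetical helper: runs the loop from foo and returns both the sum
  ! and the elapsed wall-clock time in seconds.
  subroutine timed_sum(nx, xp, sumx, seconds)
    integer, intent(in) :: nx
    real(8), intent(in) :: xp
    real(8), intent(out) :: sumx, seconds
    integer(8) :: t0, t1, rate
    real(8) :: x
    integer :: i
    sumx = 0.d0
    call system_clock(t0, rate)      ! start tick and ticks per second
    !$omp simd private(x) reduction(+:sumx)
    do i = 1, nx
      x = 1.d-8*real(i, 8)
      sumx = sumx + bar(x, xp)
    end do
    call system_clock(t1)            ! end tick
    seconds = real(t1 - t0, 8) / real(rate, 8)
  end subroutine timed_sum
end module timing_demo
```

For example, calling timed_sum(100000000, 1.d0, sumx, seconds) from a main program, and again from a copy with the !$omp lines removed, gives a rough before/after comparison; a single wall-clock measurement is noisy, so repeating each run a few times is advisable. Because bar sits in the same module here, the compiler may inline it and vectorize the loop even without the directives; keeping bar in a separate file, as in the original example, preserves the situation the diagnostic describes.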
Note that if the DECLARE SIMD directive is omitted, the !$OMP SIMD directive still causes the rest of the loop in foo to be vectorized, but bar is then called serially, one element at a time, so any performance gain is likely to be small. In either case, the private and reduction clauses on the !$OMP SIMD directive are required: the directive tells the compiler to assume that there are no loop-carried dependencies, so without these clauses the results may be incorrect.
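The effect of the reduction clause can be checked with a sum whose exact value is known. The following is a minimal sketch, not part of the original example: the module reduction_demo and the function simd_sum are hypothetical names. It sums 1, 2, ..., n using the same private and reduction clauses as the loop in foo. Every partial sum is an integer below 2**53, so it is represented exactly in real(8) whatever order the SIMD lanes add in, and the result can be compared for equality with n*(n+1)/2.

```fortran
module reduction_demo
  implicit none
contains
  ! Hypothetical helper: sums 1.0 + 2.0 + ... + n in an OpenMP SIMD loop.
  ! private(x) gives each SIMD lane its own temporary x; reduction(+:total)
  ! gives each lane a private partial sum, and the partial sums are added
  ! together after the loop.
  function simd_sum(n) result(s)
    integer, intent(in) :: n
    real(8) :: s
    real(8) :: x, total
    integer :: i
    total = 0.d0
    !$omp simd private(x) reduction(+:total)
    do i = 1, n
      x = real(i, 8)
      total = total + x
    end do
    s = total
  end function simd_sum
end module reduction_demo
```

For example, simd_sum(1000) returns 500500.0. For sums of general floating-point values, such as the one in foo, a vectorized reduction adds the terms in a different order from the serial loop, so its result can differ from the serial one in the last few digits; it should be compared against a serial reference with a relative tolerance rather than for exact equality.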
remark #15300: LOOP WAS VECTORIZED
LOOP END