Intel® Advisor User Guide

ID 766448
Date 3/31/2023
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

Parallelize Data - OpenMP Loops with Complex Iteration Control

Sometimes the loop control is spread across complex control flow. Using OpenMP in this situation requires more features than the simple loops. The task body must not access any of the auto variables defined within the annotation site, because they may have been destroyed before or while the task is running. Also, note that variables referenced within the task construct default to firstprivate.

Consider this serial C/C++ code:

extern char a[];
int previousEnd = -1;
 ANNOTATE_SITE_BEGIN(sitename);
   for (int i=0; i<=100; i++) {
      if (!a[i] || i==100) {
         ANNOTATE_TASK_BEGIN(do_something);
             DoSomething(previousEnd+1,i);
         ANNOTATE_TASK_END();
         previousEnd=i;
      }
   }
ANNOTATE_SITE_END();

This is done using the OpenMP task pragma. Without using this feature, such loops are extremely difficult to parallelize. One approach to the adding parallelism to the loop is to simply spawn each call to DoSomething():

 
extern char a[]; 
int previousEnd = -1;
#pragma omp parallel
 {
#pragma omp single
  {
... 
   for (int i=0; i<=100; i++) {
   
     if (!a[i] || i==100) 
     {
     #pragma omp task 
          DoSomething(previousEnd+1,i);
     }
   }
 }
}

It is important that the parameters to DoSomething be passed by value, not by reference, because previousEnd and i can change before or while the spawned task runs.

Consider this serial Fortran code:

. . .
logical(1) a(200)
integer(4) i, previousEnd
...
previousEnd=0
call annotate_site_begin(functions)
do i=1,101
  if a(.not. a(i)) .or. (i .eq. 101) then
  call annotate_task_begin(do_something)
    call DoSomething(previousEnd+1, i)
  call annotate_task_end
  endif
end do
call annotate_site_end

This is easily done using the OpenMP task directive. Without using this feature, such loops are extremely difficult to parallelize. One approach to the parallelize the above loop is simply to spawn each call to DoSomething():

 
. . .
logical(1) a(200)
integer(4) i, previousEnd
...
previousEnd=0
!$omp parallel
!$omp single
do i=1,101
  if a(.not. a(i)) .or. (i .eq. 101) then
  !$omp task
     call DoSomething(previousEnd+1, i)
  !$omp end task
  endif
end do
!$omp end parallel
   

There is no requirement that the omp task pragma or directive be within the surrounding parallel directive's static extent.

TIP:
After you rewrite your code to use OpenMP* parallel framework, you can analyze its performance with Intel® Advisor perspectives. Use the Vectorization and Code Insights perspective to analyze how well you OpenMP code is vectorized or use the Offload Modeling perspective to model its performance on a GPU.