Check How Assumed Dependencies Affect Modeling
If a loop has dependencies, it cannot be run in parallel and in most cases cannot be offloaded to the GPU.
Intel Advisor can get information about loop-carried dependencies from the following sources:
- Intel® Compiler diagnostics. For some loops, the compiler finds dependencies at compile time and passes the diagnostics to Intel Advisor through its integration with Intel compilers.
- The application call stack tree. If a loop is parallelized or vectorized on a CPU, or is already offloaded to a GPU but executed on a CPU, Intel Advisor assumes that you resolved the loop-carried dependencies before parallelizing or offloading the loop.
- Dependencies analysis results. This analysis detects dependencies for most loops at run time, but the results might depend on the application workload. It also adds high overhead, making the application run 5 to 100 times slower during the analysis. To reduce the overhead, you can use various techniques, for example, marking up loops of interest.
For the Offload Modeling perspective, the Dependencies analysis is optional, but it can provide important information about loop-carried dependencies that Intel® Advisor uses to decide whether a loop can be profitable to run on a graphics processing unit (GPU).
This topic describes a workflow that you can follow to understand if there are potential loop-carried dependencies in your code that might affect its performance on a target GPU.
Note: In the commands below, replace myApplication with your application executable path and name before executing a command. If your application requires additional command-line options, add them after the executable name.
Verify Assumed Dependencies
If you do not know which dependency types are present in your application, first run the Offload Modeling perspective without the Dependencies analysis to check whether potential dependencies affect the modeling results and to decide whether you need to run the Dependencies analysis:
- Run the Offload Modeling perspective without the Dependencies analysis.
- From GUI: Select the Medium accuracy level and enable the Assume Dependencies option for Performance Modeling in the Analysis Workflow tab. Run the perspective.
- From CLI: Run the following analyses, for example, using the advisor command line interface:
  advisor --collect=survey --project-dir=./advi_results --static-instruction-mix -- ./myApplication
  advisor --collect=tripcounts --project-dir=./advi_results --flop --stacks --enable-cache-simulation --target-device=xehpg_512xve --data-transfer=light -- ./myApplication
  advisor --collect=projection --project-dir=./advi_results
- Open the generated report and go to the Accelerated Regions tab.
- In the Code Regions pane, expand the Measured column group and examine the Dependency Type column.
- You do not need to run the Dependencies analysis for loops with the following dependency types:
- Parallel: Programming Model means that the loop uses the Data Parallel C++, OpenCL™, or OpenMP* target programming model.
- Parallel: Explicit means that the loop is threaded and vectorized on the CPU (for example, with OpenMP parallel for or Intel® oneAPI Threading Building Blocks parallel_for).
- Parallel: Proven means that an Intel compiler found no dependencies at compile time.
- You might need to run the Dependencies analysis for loops with the Dependency: Assumed dependency type. It means that Intel Advisor does not have information about loop-carried dependencies for these loops and does not consider them offload candidates.
- If you see many loops with the Dependency: Assumed type, rerun Performance Modeling with assumed dependencies ignored, as follows:
- From GUI: Select only the Performance Modeling step in the Analysis Workflow tab and disable the Assume Dependencies option. Run the perspective.
- From CLI: Run Performance Modeling with one of the following options:
- Use --no-assume-dependencies to ignore assumed dependencies for all loops/functions. For example:
  advisor --collect=projection --project-dir=./advi_results --no-assume-dependencies
- Use --set-parallel=[<loop-ID1>|<file-name1>:<line1>,<loop-ID2>|<file-name2>:<line2>,...] to ignore assumed dependencies for specific loops/functions only. Use this option if you know that other loops/functions do have dependencies and you do not want to model them as parallel. For example:
  advisor --collect=projection --project-dir=./advi_results --set-parallel=foo.cpp:34,bar.cpp:192
- Review the generated results to check whether potential dependencies might block offloading to the GPU. Loops that previously had the Dependency: Assumed type are now marked as Parallel: Assumed. Intel Advisor models their performance on the target GPU and checks their potential offload profitability and speedup.
- Compare the program metrics calculated with and without assumed dependencies, such as speedup, number of offloads, and estimated accelerated time.
- If the difference is small, for example, 1.5x speedup with assumed dependencies and 1.6x speedup without them, you can skip the Dependencies analysis and rely on the current estimates. In this case, most loops with potential dependencies are not profitable to offload and do not add much speedup to the application on the target GPU.
- If the difference is big, for example, 2x speedup with assumed dependencies and 40x speedup without them, you should run the Dependencies analysis. In this case, information about loop-carried dependencies is critical for a correct performance estimate.
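The CLI side of this verification workflow can be collected into a small Bash script. This is a sketch under stated assumptions, not an Intel-provided tool: the function names, the ADVISOR and PROJECT_DIR variables, and the default ratio threshold of 2.0 used to separate a "small" from a "big" difference are hypothetical choices for illustration. The advisor options are the ones listed in this topic. You still read the two speedup values from the reports yourself and pass them in.

```bash
#!/usr/bin/env bash
# Sketch of the "Verify Assumed Dependencies" workflow from the CLI.
# Hypothetical helpers: ADVISOR defaults to the advisor CLI (set ADVISOR=echo
# for a dry run that only prints the arguments); PROJECT_DIR defaults to the
# ./advi_results directory used in this topic.
PROJECT_DIR="${PROJECT_DIR:-./advi_results}"

# Step 1: Survey, Trip Counts, and Performance Modeling with the options
# shown in this topic. Pass the application and its own arguments.
run_baseline_modeling() {
  local advisor="${ADVISOR:-advisor}"
  "$advisor" --collect=survey --project-dir="$PROJECT_DIR" \
    --static-instruction-mix -- "$@" || return
  "$advisor" --collect=tripcounts --project-dir="$PROJECT_DIR" \
    --flop --stacks --enable-cache-simulation \
    --target-device=xehpg_512xve --data-transfer=light -- "$@" || return
  "$advisor" --collect=projection --project-dir="$PROJECT_DIR"
}

# Step 2: rerun only Performance Modeling with assumed dependencies ignored.
# No arguments: ignore them for all loops (--no-assume-dependencies).
# Arguments (loop IDs or file:line pairs): ignore them for those loops only
# (--set-parallel); the arguments are joined into one comma-separated value.
rerun_projection_ignoring_assumed() {
  local advisor="${ADVISOR:-advisor}"
  if [ "$#" -eq 0 ]; then
    "$advisor" --collect=projection --project-dir="$PROJECT_DIR" \
      --no-assume-dependencies
  else
    local IFS=,   # makes "$*" join the arguments with commas
    "$advisor" --collect=projection --project-dir="$PROJECT_DIR" \
      --set-parallel="$*"
  fi
}

# Step 3: compare the estimated speedups read from the two reports.
# Prints "run" if ignoring assumed dependencies raises the speedup by at
# least THRESHOLD times (hypothetical default: 2.0), otherwise "skip".
# Returns 2 without printing if the first speedup is not positive.
should_run_dependencies() {
  local with_assumed="$1" without_assumed="$2" threshold="${3:-2.0}"
  awk -v a="$with_assumed" -v b="$without_assumed" -v t="$threshold" 'BEGIN {
    a += 0; b += 0; t += 0          # force numeric interpretation
    if (a <= 0) exit 2
    if (b / a >= t) print "run"; else print "skip"
  }'
}
```

With the examples from this topic, should_run_dependencies 1.5 1.6 prints skip and should_run_dependencies 2 40 prints run. A ratio is used rather than an absolute difference so that the same cut-off behaves sensibly for both small and large speedups; pick whatever threshold matches your own tolerance.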
Run the Dependencies Analysis
To check for real dependencies in your code, run the Dependencies analysis and rerun the Performance Modeling to get more accurate estimations of your application performance on GPU:
- From GUI:
- Enable only the Dependencies and Performance Modeling analyses in the Analysis Workflow tab. By default, the generic markup strategy is applied to select only potentially profitable loops for the Dependencies analysis.
- Rerun the perspective with only these two analyses enabled.
- From CLI:
- Run the Dependencies analysis for potentially profitable loops only:
  advisor --collect=dependencies --select markup=gpu_generic --loop-call-count-limit=16 --filter-reductions --project-dir=./advi_results -- ./myApplication
- Run the Performance Modeling analysis:
  advisor --collect=projection --project-dir=./advi_results
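The two CLI steps for the Dependencies analysis can be wrapped in one Bash function so that Performance Modeling only reruns after the Dependencies analysis succeeds. This is a hedged sketch: the function name and the ADVISOR and PROJECT_DIR variables are hypothetical, and the advisor options are exactly those in the commands listed here.

```bash
#!/usr/bin/env bash
# Sketch: Dependencies analysis for potentially profitable loops, then
# Performance Modeling, using the options from this topic.
# Hypothetical: ADVISOR defaults to the advisor CLI (ADVISOR=echo prints
# the arguments instead of running them); PROJECT_DIR defaults to ./advi_results.
PROJECT_DIR="${PROJECT_DIR:-./advi_results}"

# Usage: run_dependencies_then_projection <application> [application args...]
run_dependencies_then_projection() {
  local advisor="${ADVISOR:-advisor}"
  if [ "$#" -eq 0 ]; then
    echo "usage: run_dependencies_then_projection <application> [args...]" >&2
    return 2
  fi
  # Dependencies analysis restricted to loops selected by the gpu_generic
  # markup; stop here if it fails so that modeling is not rerun needlessly.
  "$advisor" --collect=dependencies --select markup=gpu_generic \
    --loop-call-count-limit=16 --filter-reductions \
    --project-dir="$PROJECT_DIR" -- "$@" || return
  # Rerun Performance Modeling in the same project directory.
  "$advisor" --collect=projection --project-dir="$PROJECT_DIR"
}
```

For example, run_dependencies_then_projection ./myApplication runs both steps against ./advi_results, and prefixing the call with ADVISOR=echo prints the two command lines without running anything.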
Open the result in Intel Advisor, view the interactive HTML report, or print it to the command line.
Continue to investigate the results and identify code regions to offload.