Use Intel® Inspector and Intel® VTune™ Profiler
Intel® Advisor helps you:
Discover where to add parallelism to your program by identifying where your program spends its time. You propose parallel code regions when you annotate the parallel sites and tasks.
Predict the performance you might achieve with the proposed parallel code regions.
Predict the data sharing problems that could occur in the proposed parallel code regions.
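As a rough illustration of what annotating a parallel site and its tasks can look like in C++, the sketch below marks a loop as a proposed site with one task per iteration. The macro names come from the Intel Advisor annotation header `advisor-annotate.h`. The function `add_into`, the site and task names, and the no-op fallback macros (used only when the header is not on the include path) are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use the real annotation header when it is available.
#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#  endif
#endif

// Assumption for this sketch: fall back to no-op macros so the example
// still compiles as ordinary serial code without the header.
#ifndef ANNOTATE_SITE_BEGIN
#define ANNOTATE_SITE_BEGIN(site_name)
#define ANNOTATE_ITERATION_TASK(task_name)
#define ANNOTATE_SITE_END(...)
#endif

// Hypothetical example function: adds src to dst element by element.
// The loop is the proposed parallel site; each iteration is one task.
void add_into(std::vector<double>& dst, const std::vector<double>& src) {
    assert(dst.size() == src.size());
    ANNOTATE_SITE_BEGIN(add_into_site);
    for (std::size_t i = 0; i < dst.size(); ++i) {
        ANNOTATE_ITERATION_TASK(add_into_task);
        dst[i] += src[i];  // iterations touch disjoint elements
    }
    ANNOTATE_SITE_END();
}
```

The annotated program still runs serially. The annotations only describe where parallelism is proposed so that the analysis tools can model it.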
Intel Advisor does not catch all problems, and it cannot ensure that you have correctly implemented the parallelism. Before deploying your parallel program, you need to test it for correctness and verify its performance. To do this, you can use analyzer tools provided in the Intel® oneAPI Base Toolkit, Intel® oneAPI IoT Toolkit, and Intel® oneAPI HPC Toolkit.
The thread error analysis provided by Intel® Inspector and the Dependencies analysis provided by Intel Advisor use similar technology. Intel Inspector includes a data race and deadlock detection tool that works on the parallel code. It can find more errors because it operates on the parallel code rather than on the annotated serial code that the Dependencies tool analyzes. Intel Inspector can also find memory problems such as memory leaks, references to freed storage, and references to uninitialized memory. The memory-checking tool works on serial or parallel code.
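To make the idea of a data race concrete, here is a minimal C++ sketch of the kind of defect a thread error analysis is designed to report, followed by one possible fix. The function names and the use of `std::thread` are assumptions for illustration and are not taken from the Intel tools or their documentation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical racy version: several threads update a plain variable
// without synchronization. This is a data race (undefined behavior in
// C++), and the returned total may be smaller than expected.
long count_with_race(int num_threads, int increments_per_thread) {
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                ++counter;  // unsynchronized read-modify-write
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}

// One possible fix: std::atomic makes each increment indivisible, so
// the threads no longer race on the counter.
long count_with_atomic(int num_threads, int increments_per_thread) {
    assert(num_threads >= 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();  // join orders all updates before the load
    return counter.load();
}
```

The racy version often appears to work in small tests, which is why a checker that observes the actual parallel execution is useful. A mutex or a per-thread partial count combined after the join would be equally valid fixes.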
Similarly, the Intel Advisor Survey and Suitability tools provide features found in Intel® VTune™ Profiler. The Survey tool profiles your program to find hotspots, and the Suitability tool predicts approximate parallel performance, including overhead costs, based on the Intel Advisor annotations. When you have a working parallel program, use Intel VTune Profiler to measure the parallel performance gain and core utilization, and to check whether the parallel framework overhead is acceptable.
Once you have parallel code, you should:
Measure the speedup.
Make adjustments if locks are causing excessive delays, or if one task runs much longer than others.
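A profiler gives far more detail, but the basic speedup measurement can be sketched by hand: time the serial and parallel versions of the same work and divide. The following C++ sketch assumes a simple summation workload. All function names here are hypothetical and are not part of any Intel tool.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Serial baseline.
long long serial_sum(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// Parallel version: each thread sums one contiguous chunk into its own
// slot, and the slots are combined after all threads have joined.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t first = std::min(data.size(), t * chunk);
        const std::size_t last = std::min(data.size(), first + chunk);
        workers.emplace_back([&data, &partial, t, first, last] {
            const auto from = data.begin() + static_cast<std::ptrdiff_t>(first);
            const auto to = data.begin() + static_cast<std::ptrdiff_t>(last);
            partial[t] = std::accumulate(from, to, 0LL);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}

// Wall-clock time, in seconds, taken by one call to `work`.
template <typename Work>
double seconds_to_run(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup = serial time / parallel time; values above 1 mean a gain.
double speedup(double serial_seconds, double parallel_seconds) {
    assert(parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}
```

In practice you would call `seconds_to_run` several times on a realistically sized input and compare typical values, because a single short run is dominated by thread startup cost and timer noise. If one chunk takes much longer than the others, the measured speedup falls well below the thread count, which is the load imbalance described above.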
Intel VTune Profiler has many features to help you find and fix performance problems in your parallel code. It also helps you answer questions such as:
Where are the hotspots now?
Am I missing opportunities for more parallelism?
Is my program spending a lot of time waiting?
How does the performance compare to that of prior versions?
Another technique is to use a debugger to debug a serial version of your parallel program with the parallel constructs in reverse order (see Debug Parallel Programs).