OS Thread Migration
- Application: a test OpenMP* application. The application is used as a demo and is not available for download.
- Performance analysis tools: Intel® VTune™ Profiler version 2018 or newer - Hotspots analysis
- Starting with the 2020 release, Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler.
- Most recipes in the Intel® VTune™ Profiler Performance Analysis Cookbook are flexible. You can apply them to different versions of Intel® VTune™ Profiler. In some cases, minor adjustments may be required.
- Operating system: Linux*, Ubuntu* 16.04 64-bit
- CPU: Intel® Core™ i7-6700K processor
Run Advanced Hotspots Analysis
Identify Thread Migration
To identify thread migration using the GUI, select the
Expand the core nodes to see the number of software threads on each core. In general, the total number of threads should be less than or equal to the total number of hardware threads supported by the CPU, and the threads should be distributed equally across the cores. If any core in your result shows more software threads than expected, thread migration is occurring in your application. In the above example, 12 OpenMP* worker threads executed on core_8 instead of the expected 2 (this is an Intel® Xeon® processor supporting Intel® Hyper-Threading Technology, so each core provides 2 hardware threads). This indicates thread migration.
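The per-core check described above can be sketched in a few lines of Python. This is only an illustration and not part of the profiler: the `(core, thread)` pairs, the `oversubscribed_cores` helper, and the default of 2 hardware threads per core (Hyper-Threading enabled) are assumptions made for the example.

```python
from collections import defaultdict

def oversubscribed_cores(samples, hw_threads_per_core=2):
    """Return {core: sorted thread names} for every core that ran more
    distinct software threads than it has hardware threads."""
    threads_by_core = defaultdict(set)
    for core, thread in samples:
        threads_by_core[core].add(thread)
    return {
        core: sorted(threads)
        for core, threads in threads_by_core.items()
        if len(threads) > hw_threads_per_core
    }

# Hypothetical data: 12 worker threads observed on core_8, 2 on core_9.
samples = [("core_8", f"OMP Worker Thread #{i}") for i in range(1, 13)]
samples += [("core_9", "OMP Worker Thread #13"),
            ("core_9", "OMP Worker Thread #14")]

flagged = oversubscribed_cores(samples)
for core, threads in flagged.items():
    print(f"{core}: {len(threads)} threads, expected at most 2")
```

A core that appears in `flagged` is a candidate for closer inspection in the grid; a core with exactly as many threads as hardware contexts is not reported.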
Use the Thread/H/W Context grouping to analyze thread migration in the Timeline pane.
Expand the thread nodes to see the number of CPUs where this thread was executed and analyze thread execution over time. In the example above, OpenMP thread #0 was executing on cpu_23 and then migrated to cpu_47.
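What the timeline shows can be expressed as a simple rule: a migration is any point where consecutive samples of the same thread report different CPUs. The sketch below assumes hypothetical `(timestamp, thread, cpu)` samples and a hypothetical `migration_events` helper; it does not read real profiler data.

```python
def migration_events(samples):
    """Given (timestamp, thread, cpu) samples, return a list of
    (thread, from_cpu, to_cpu, timestamp), one per change of CPU."""
    last_cpu = {}
    events = []
    for ts, thread, cpu in sorted(samples):  # process in time order
        prev = last_cpu.get(thread)
        if prev is not None and prev != cpu:
            events.append((thread, prev, cpu, ts))
        last_cpu[thread] = cpu
    return events

# Hypothetical samples: thread #0 moves from cpu_23 to cpu_47,
# thread #1 stays on cpu_5.
samples = [
    (0.0, "OpenMP thread #0", "cpu_23"),
    (0.1, "OpenMP thread #0", "cpu_23"),
    (0.2, "OpenMP thread #0", "cpu_47"),
    (0.0, "OpenMP thread #1", "cpu_5"),
    (0.2, "OpenMP thread #1", "cpu_5"),
]

events = migration_events(samples)
for thread, src, dst, ts in events:
    print(f"{thread}: {src} -> {dst} at t={ts}")
```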
amplxe-cl -group-by thread,cpuid -report hotspots -r /temp/test/omp -s "H/W Context" -q | less
Thread                          H/W Context  CPU Time:Self
------------------------------  -----------  -------------
OMP Worker Thread #5 (0x3d86)   cpu_0        0.004
matmul-intel64 (0x3d52)         cpu_1        0.013
OMP Worker Thread #15 (0x3d90)  cpu_10       2.418
matmul-intel64 (0x3d52)         cpu_10       2.023
OMP Worker Thread #8 (0x3d89)   cpu_10       0.687
OMP Worker Thread #13 (0x3d8e)  cpu_10       0.097
OMP Worker Thread #6 (0x3d87)   cpu_10       0.065
OMP Worker Thread #4 (0x3d85)   cpu_10       0.059
OMP Worker Thread #1 (0x3d82)   cpu_10       0.048
OMP Worker Thread #9 (0x3d8a)   cpu_10       0.034
OMP Worker Thread #11 (0x3d8c)  cpu_10       0.009
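A report like this can also be post-processed with a short script. The following Python sketch parses a few rows copied from the output above and lists threads that ran on more than one CPU, along with CPUs that hosted more than one thread. The `ROW` pattern and the `parse_report` helper are assumptions based only on the sample shown here; the exact report layout may differ between tool versions.

```python
import re
from collections import defaultdict

# A few rows copied from the sample report (header lines omitted).
REPORT = """\
OMP Worker Thread #5 (0x3d86)   cpu_0   0.004
matmul-intel64 (0x3d52)         cpu_1   0.013
OMP Worker Thread #15 (0x3d90)  cpu_10  2.418
matmul-intel64 (0x3d52)         cpu_10  2.023
OMP Worker Thread #8 (0x3d89)   cpu_10  0.687
"""

# Assumed row layout: "<thread name> (0x<tid>)  cpu_<n>  <seconds>"
ROW = re.compile(
    r"^(?P<thread>.+?\(0x[0-9a-f]+\))\s+(?P<cpu>cpu_\d+)\s+(?P<time>\d+\.\d+)\s*$"
)

def parse_report(text):
    """Return a list of (thread, cpu, cpu_time) tuples; skip other lines."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m["thread"], m["cpu"], float(m["time"])))
    return rows

rows = parse_report(REPORT)
cpus_by_thread = defaultdict(set)
threads_by_cpu = defaultdict(set)
for thread, cpu, _ in rows:
    cpus_by_thread[thread].add(cpu)
    threads_by_cpu[cpu].add(thread)

# Threads seen on more than one CPU, and CPUs that ran more than one thread.
migrated = {t: sorted(c) for t, c in cpus_by_thread.items() if len(c) > 1}
crowded = {c: sorted(t) for c, t in threads_by_cpu.items() if len(t) > 1}

print("Threads seen on several CPUs:", migrated)
print("CPUs that ran several threads:", crowded)
```

In the sample rows, `matmul-intel64 (0x3d52)` appears on both cpu_1 and cpu_10, and cpu_10 hosts several threads, which matches the pattern the recipe describes.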