Analyze Vector Instruction Set with Intel VTune Profiler
Set Up Analysis
- Use the -t option for Application Performance Snapshot to view the MPI Time per Rank data:
  $ aps-report stat_second -t
- Find the rank with the lowest MPI Time value. In this example, it is process number 7.
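Picking the lowest value by eye is error-prone when there are many ranks. The following is a minimal sketch, assuming you have copied the MPI Time per Rank values into a plain text file with one "rank time" pair per line. That file layout and the function name lowest_mpi_rank are hypothetical; this is not the raw aps-report output format.

```shell
# Hypothetical helper: print the rank with the smallest MPI time.
# Input file (assumed, prepared by hand): one "<rank> <mpi_time>" pair per line.
lowest_mpi_rank() {
    awk 'NF >= 2 {
             t = $2 + 0                       # force numeric comparison
             if (!seen || t < min) { min = t; rank = $1; seen = 1 }
         }
         END { if (seen) print rank }' "$1"
}

# Usage (file name is an assumption):
#   lowest_mpi_rank rank_times.txt
```

The printed rank is the value to use in the collection step.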
Run the Collection
- Set up the environment for Intel VTune Profiler:
  $ source <vtune_installdir>/vtune-vars.sh
  where <vtune_installdir> is the installed location of Intel VTune Profiler (default location is /opt/intel/vtune_profiler_<version>).
- Launch the application using VTune Profiler and the appropriate rank number:
  $ export OMP_NUM_THREADS=64
  $ mpirun -n 16 -ppn 2 -f hosts.txt -gtool "vtune -collect hpc-performance -data-limit=0 -r result_init:7" ./heart_demo -m ../mesh_mid -s ../setup_mid.txt -i -t 50
  Replace the rank number in the second command with the rank identified in the previous section. In this example command, the rank value is 7. The following options are included in the command:
  - The -gtool option launches tools such as Intel VTune Profiler (vtune) on specified ranks. Additional information about the option is available from the Intel® MPI Library Developer Reference for Linux* OS at https://software.intel.com/en-us/mpi-developer-reference-linux.
  - vtune is the Intel VTune Profiler command line interface, with the following options used to run the analysis:
    - The -collect option specifies the analysis type to run on the application. Additional information about the option is available from the Intel VTune Profiler help at https://software.intel.com/en-us/amplifier_help_linux.
    - The -data-limit option disables the size limit for result files when set to 0.
    - The -r option specifies the name and location of the result file.
  The application launches and performance data collection begins. Data collection stops as soon as the application completes, and the collected data is saved in a result file.
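Because the rank number is the only part of the launch command that changes between runs, it can help to keep it in a variable. The sketch below only assembles and prints the command for review. It reuses the options from the example command unchanged; the helper name gtool_arg and the RANK variable are hypothetical.

```shell
# Hypothetical helper: build the -gtool argument for a given rank.
# The trailing ":<rank>" tells -gtool which rank the tool is attached to.
gtool_arg() {
    printf 'vtune -collect hpc-performance -data-limit=0 -r result_init:%s' "$1"
}

RANK=7   # assumption: the rank identified in the previous section

# Print the full command instead of running it, so it can be checked first.
echo "mpirun -n 16 -ppn 2 -f hosts.txt -gtool \"$(gtool_arg "$RANK")\" ./heart_demo -m ../mesh_mid -s ../setup_mid.txt -i -t 50"
```

Once the printed command looks right, run it in a shell where the VTune Profiler environment has been set up.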
View and Analyze the Results
- After running the performance analysis, launch Intel VTune Profiler and open the result file using the following command:
  $ vtune-gui result_init.<host>/result_init.<host>.vtune &
- Start analysis with the Summary window. Hover over the question mark icons to read the pop-up help and better understand what each performance metric means.
- Notice that the SIMD Instructions per Cycle section indicates that the application could have better vectorization. The Vector Instruction Set column shows that the vector instruction set values are outdated (AVX, SSE). The same information can be seen in the Bottom-up window.
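If the report shows only older instruction sets, one quick check is which vector extensions the host CPU itself reports. This is a minimal sketch, assuming a Linux system where /proc/cpuinfo lists CPU feature flags. The function name list_vector_flags is hypothetical and is not part of VTune Profiler, and the flag list is a small illustrative subset.

```shell
# Hypothetical helper: list a few x86 vector feature flags found in a cpuinfo-style file.
# Defaults to /proc/cpuinfo (Linux only); pass another path to inspect a saved copy.
list_vector_flags() {
    awk '{
             for (i = 1; i <= NF; i++)
                 # Match whole words only, so "avx" does not match "avx2" or "avx512f".
                 if ($i ~ /^(sse2|sse4_2|avx|avx2|avx512f)$/) print $i
         }' "${1:-/proc/cpuinfo}" | sort -u
}

# Usage:
#   list_vector_flags
```

Flags reported by the CPU but absent from the Vector Instruction Set column suggest the build is not targeting the newest instruction set available on the host.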
Rebuild Application with New Instruction Set
Check Application Performance