Summary
You have completed the Finding Common Bottlenecks tutorial. Here are some important things to remember when using the Intel® VTune™ Profiler to analyze your code for hotspots and hardware issues:
| Step | Tutorial Recap | Key Tutorial Takeaways |
|---|---|---|
| 1. Find the bottleneck | You started with Performance Snapshot to determine the main limiting factors and next steps for optimization: | |
| 2. Resolve issue and recompile application | You edited the code and recompiled the application to eliminate the cache-unfriendly DRAM access pattern, which significantly reduced the application's running time. You then set a different compiler optimization level to see how compiler options can influence vectorization. | |
| 3. Resolve vectorization issues | You recompiled the application with a different optimization level, and the code was vectorized. However, Performance Snapshot showed that only the 128-bit vector registers were used, while the 256-bit registers were not used at all. The HPC Performance Characterization analysis showed that the code used SSE2, an older vector instruction set extension, so a portion of hardware resources remained underutilized. You recompiled the application again with different options so that vectorization used the full capability of the platform. | |
| 4. Analyze Microarchitecture Usage | As recommended by Performance Snapshot, you used the Microarchitecture Exploration analysis to identify the next optimization steps. This analysis showed that the best way to further optimize the application was the cache blocking technique. | |
| 5. Check your work | You used the Compare Results feature to compare the performance of the application at different optimization stages. | Perform regular regression testing by comparing analysis results before and after optimization. From the GUI, click the Compare Results button on the VTune Profiler toolbar. From the command line, use the vtune command. |
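
The two code-level optimizations in the recap, removing the cache-unfriendly access pattern (step 2) and cache blocking (step 4), can be illustrated with a small matrix multiplication sketch. This is not the tutorial's sample code: the function names, the matrix dimension `N`, and the tile size `BLOCK` are hypothetical, and loop interchange is assumed here as one common way to remove a strided access pattern.

```c
#include <assert.h>
#include <stddef.h>

#define N 64      /* hypothetical matrix dimension */
#define BLOCK 16  /* hypothetical tile size; tune it to the target cache */

/* All three functions accumulate into c, so the caller must zero c first. */

/* Cache-unfriendly order (i, j, k): the inner loop reads b[k][j] down a
 * column, so consecutive iterations touch addresses N doubles apart. */
void multiply_naive(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Loop interchange (i, k, j): the inner loop now walks rows of b and c
 * with unit stride, which is also easier for a compiler to vectorize. */
void multiply_interchanged(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t k = 0; k < N; k++)
            for (size_t j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Cache blocking: work on BLOCK x BLOCK tiles so that the data touched by
 * the three inner loops is reused while it is still in cache. */
void multiply_blocked(double a[N][N], double b[N][N], double c[N][N])
{
    assert(N % BLOCK == 0); /* this sketch assumes tiles divide N evenly */
    for (size_t ii = 0; ii < N; ii += BLOCK)
        for (size_t kk = 0; kk < N; kk += BLOCK)
            for (size_t jj = 0; jj < N; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK; i++)
                    for (size_t k = kk; k < kk + BLOCK; k++)
                        for (size_t j = jj; j < jj + BLOCK; j++)
                            c[i][j] += a[i][k] * b[k][j];
}

/* Returns 1 if every pair of elements differs by at most tol, else 0. */
int matrices_match(double x[N][N], double y[N][N], double tol)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double d = x[i][j] - y[i][j];
            if (d < 0)
                d = -d;
            if (d > tol)
                return 0;
        }
    return 1;
}
```

To try the sketch, fill `a` and `b`, zero three result matrices, call each function once, and confirm with `matrices_match` that the results agree. Profiling each variant separately and comparing the results, as in step 5, shows whether a change actually helped on a given platform. The best `BLOCK` value depends on the cache sizes of the target machine.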