Tuning Recipes
These recipes explore typical application performance problems that you can detect with Intel® VTune™ Profiler or its predecessor, Intel® VTune™ Amplifier. Use the guidance in these recipes to optimize performance.
Recipe | Description |
---|---|
Use Cache Allocation Technology (CAT) to handle cache-related latency issues (cache misses) when you split a cache between cores.
| |
Profile a memory-bound linear_regression application using the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler.
| |
Profile a memory-bound matrix application using the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler. Understand the cause of frequent DRAM accesses.
| |
Profile a core-bound matrix application using the Microarchitecture Exploration analysis. Understand the cause of poor port utilization.
| |
Identify and measure the impact of page faults on target application performance. Use the Microarchitecture Exploration, System Overview, and Memory Access analyses in Intel® VTune™ Profiler.
| |
Profile a front-end-bound application using the Microarchitecture Exploration analysis in Intel® VTune™ Profiler. Use a profile-guided optimization (PGO) option to reduce instruction cache (ICache) misses.
| |
Locate inefficient synchronization in your code by running the Advanced Hotspots analysis with the stack collection enabled.
| |
Locate inefficient TCP/IP synchronization in your code by running the Locks and Waits analysis in Intel® VTune™ Profiler with the task collection enabled.
| |
Identify OS thread migration on the NUMA architecture with the Hotspots analysis in Intel® VTune™ Profiler.
| |
Detect and fix common parallel bottlenecks in OpenMP programs, such as imbalance on barriers and scheduling overhead.
| |
Identify the fraction of serial execution in an application parallelized with OpenMP. Find additional opportunities for parallelization, and improve the scalability of the application.
| |
Detect and fix scheduling overhead for an Intel® TBB application.
| |
Detect and fix memory access overhead in an application based on the Persistent Memory Development Kit (PMDK).
|
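
The recipe on serial execution in OpenMP applications ties the serial fraction of a run to scalability. One way to see why that fraction matters is Amdahl's law, which bounds the achievable speedup for a given serial share. The sketch below is illustrative only and is not part of any recipe: the function name `amdahl_speedup` and the sample fractions are assumptions chosen for the example, and a real serial fraction would come from a profiler measurement.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Return the Amdahl's-law upper bound on speedup.

    serial_fraction: share of single-threaded run time that cannot be
        parallelized (hypothetical input; measure it with a profiler).
    threads: number of worker threads assumed to share the parallel part evenly.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part runs at full cost; the parallel part is divided by the thread count.
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Sample serial fractions are made up for illustration.
    for serial in (0.05, 0.25):
        for threads in (4, 16, 64):
            bound = amdahl_speedup(serial, threads)
            print(f"serial={serial:.2f} threads={threads:3d} speedup bound={bound:.2f}")
```

Under this model, a serial fraction of 0.25 caps the speedup below 4x regardless of thread count, which is why reducing serial regions is usually worth attempting before adding more threads.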