Tuning Recipes
These recipes explore typical application performance problems that you can detect with Intel® VTune™ Profiler or its predecessor, Intel® VTune™ Amplifier. Use the guidance in these recipes to optimize performance.
Recipe | Description |
---|---|
Cache-Related Latency Issues in Segmented Cache Environment | Use Cache Allocation Technology (CAT) to handle cache-related latency issues (cache misses) when you split a cache between cores. |
False Sharing | Profile a memory-bound linear_regression application using the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler. |
Frequent DRAM Accesses | Profile a memory-bound matrix application using the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler. Understand the cause for frequent DRAM accesses. |
Poor Port Utilization | Profile a core-bound matrix application using the Microarchitecture Exploration analysis. Understand the cause for poor port utilization. |
Page Faults | Identify and measure the impact of page faults on target application performance. Use the Microarchitecture Exploration, System Overview, and Memory Access analyses in Intel® VTune™ Profiler. |
Instruction Cache Misses | Profile a front-end-bound application using the Microarchitecture Exploration analysis in Intel® VTune™ Profiler. Use a PGO option to reduce ICache misses. |
Inefficient Synchronization | Locate inefficient synchronization in your code by running the Advanced Hotspots analysis with the stack collection enabled. |
Inefficient TCP/IP Synchronization | Locate inefficient TCP/IP synchronization in your code by running the Locks and Waits analysis in Intel® VTune™ Profiler, with the task collection enabled. |
OS Thread Migration | Identify OS thread migration on the NUMA architecture with the Hotspots analysis in Intel® VTune™ Profiler. |
OpenMP* Imbalance and Scheduling Overhead | Detect and fix frequent parallel bottlenecks of OpenMP programs such as imbalance on barriers and scheduling overhead. |
Processor Cores Underutilization: OpenMP* Serial Time | Identify a fraction of serial execution in an application parallelized with OpenMP. Find additional opportunities for parallelization, and improve the scalability of the application. |
Scheduling Overhead in Intel® Threading Building Blocks (Intel® TBB) Apps | Detect and fix scheduling overhead for an Intel® TBB application. |
PMDK Application Overhead | Detect and fix an overhead on memory accesses for a PMDK-based application. |
- Cache-Related Latency Issues in Segmented Cache Environment
  This recipe demonstrates how to use Cache Allocation Technology (CAT) to handle cache-related latency issues (cache misses) when you split a cache between cores.
- False Sharing
  This recipe explores profiling a memory-bound linear_regression application using the General Exploration and Memory Access analyses of the Intel® VTune™ Amplifier.
- Frequent DRAM Accesses
  This recipe explores profiling a memory-bound matrix application using the Microarchitecture Exploration and Memory Access analyses of the Intel® VTune™ Profiler to understand the cause of the frequent DRAM accesses.
- Poor Port Utilization
  This recipe explores profiling a core-bound matrix application using the Microarchitecture Exploration analysis (formerly, General Exploration) of the Intel® VTune™ Amplifier to understand the cause of the poor port utilization and Intel® Advisor to benefit from compiler vectorization.
- Page Faults
  This recipe helps identify and measure page faults impact on target application performance by using Intel® VTune™ Profiler's Microarchitecture Exploration, System Overview, and Memory Consumption analyses.
- Instruction Cache Misses
  This recipe explores profiling a front-end-bound application using the General Exploration analysis of the Intel® VTune™ Amplifier and using a PGO option to reduce ICache misses.
- Inefficient Synchronization
  This recipe shows how to locate inefficient synchronization in your code by running the Advanced Hotspots analysis of the Intel® VTune™ Amplifier with the stack collection enabled.
- Inefficient TCP/IP Synchronization
  This recipe shows how to locate inefficient TCP/IP synchronization in your code by running the Locks and Waits analysis of the Intel® VTune™ Amplifier with the task collection enabled.
- OS Thread Migration
  This recipe provides steps to identify OS thread migration on the NUMA architecture with the Hotspots analysis in Intel® VTune™ Profiler.
- OpenMP* Imbalance and Scheduling Overhead
  This recipe shows how to detect and fix frequent parallel bottlenecks of OpenMP programs such as imbalance on barriers and scheduling overhead.
- Processor Cores Underutilization: OpenMP* Serial Time
  This recipe shows how to identify a fraction of serial execution in an application parallelized with OpenMP, discover additional opportunities for parallelization, and improve scalability of the application.
- Scheduling Overhead in Intel® Threading Building Blocks (Intel® TBB) Apps
  This recipe shows how to detect and fix scheduling overhead for an Intel TBB application.
- PMDK Application Overhead
  This recipe shows how to detect and fix an overhead on memory accesses for a PMDK-based application.
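
To illustrate why the serial fraction targeted by the OpenMP* Serial Time recipe matters for scalability, the following minimal sketch applies Amdahl's law. It is not taken from the recipe itself: the function name `amdahl_speedup` is hypothetical, and the 10% serial fraction is an assumed value for illustration, not a measurement from any profiled application.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the run time stays serial.

    Amdahl's law: speedup = 1 / (s + (1 - s) / n),
    where s is the serial fraction and n is the number of threads.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


if __name__ == "__main__":
    # Assumed example: 10% of the elapsed time is spent outside parallel regions.
    for threads in (2, 8, 32):
        print(f"{threads:>2} threads: at most {amdahl_speedup(0.10, threads):.2f}x")
```

Under this model, an application with a 10% serial fraction cannot exceed a 10x speedup regardless of the thread count, which is why reducing serial time is often a prerequisite for scaling to more cores.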