Cookbook

  • 2021
  • 11/09/2021

Tuning Recipes

These recipes explore typical application performance problems that you can detect with Intel® VTune™ Profiler or its predecessor, Intel® VTune™ Amplifier. Use the guidance in these recipes to optimize performance.
Recipe
Description
Use Cache Allocation Technology (CAT) to handle cache-related latency issues (cache misses) when you split a cache between cores.
Profile a memory-bound linear_regression application using the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler.
Profile a memory-bound matrix application using the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler. Understand the cause of frequent DRAM accesses.
Profile a core-bound matrix application using the Microarchitecture Exploration analysis. Understand the cause of poor port utilization.
Identify and measure the impact of page faults on target application performance. Use the Microarchitecture Exploration, System Overview, and Memory Access analyses in Intel® VTune™ Profiler.
Profile a front-end-bound application using the Microarchitecture Exploration analysis in Intel® VTune™ Profiler. Use a profile-guided optimization (PGO) option to reduce instruction cache (ICache) misses.
Locate inefficient synchronization in your code by running the Advanced Hotspots analysis with the stack collection enabled.
Locate inefficient TCP/IP synchronization in your code by running the Locks and Waits analysis in Intel® VTune™ Profiler with task collection enabled.
Use the Disk IO analysis for a sample I/O-bound application. Change the affinity for a PCIe device to increase read access bandwidth and optimize your application.
Identify OS thread migration on the NUMA architecture with the Hotspots analysis in Intel® VTune™ Profiler.
Detect and fix common parallel bottlenecks in OpenMP programs, such as imbalance on barriers and scheduling overhead.
Identify the fraction of serial execution in an application parallelized with OpenMP. Find additional opportunities for parallelization, and improve the scalability of the application.
Detect and fix scheduling overhead for an Intel® TBB application.
Detect and fix memory access overhead in a PMDK-based application.
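
The OpenMP recipe in the list above that targets the fraction of serial execution rests on a simple relationship: the serial part of a run limits how far the application can scale. The sketch below illustrates that relationship with Amdahl's law. It is not output from, or an API of, Intel® VTune™ Profiler; the function name `amdahl_speedup` and the sample serial fraction of 10% are assumptions chosen purely for illustration.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Return the upper bound on speedup predicted by Amdahl's law.

    serial_fraction: share of single-threaded run time that cannot be
                     parallelized (0.0 to 1.0). Hypothetical input; in
                     practice you would estimate it from profiling data.
    threads:         number of threads used for the parallel part.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0.0 and 1.0")
    if threads < 1:
        raise ValueError("threads must be at least 1")

    parallel_fraction = 1.0 - serial_fraction
    # Serial time is unchanged; parallel time shrinks with the thread count.
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Illustrative values only: 10% serial time at several thread counts.
    for threads in (2, 8, 32):
        bound = amdahl_speedup(0.10, threads)
        print(f"{threads:>2} threads: speedup bound {bound:.2f}x")
```

As the thread count grows, the bound approaches `1 / serial_fraction`, so a run that is 10% serial cannot exceed a 10x speedup under this model. This is why reducing the serial fraction often matters more for scalability than adding threads.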

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.