Tutorial: Analyze Common Performance Bottlenecks using Intel VTune Profiler in a C++ Sample Application - Windows* OS

ID 762031
Date 3/31/2023
Public


Resolve Memory Access Issue

In this step of the tutorial, you edit the source code and recompile the application to resolve the main memory access bottleneck.

NOTE:
  • This tutorial uses the Intel® oneAPI DPC++/C++ Compiler, which is installed as part of the Intel® oneAPI Base Toolkit. Your results and workflow may vary depending on the compiler that you use.

  • In this stage of the tutorial, you set the Optimization Level of the compiler to Maximum Optimization (Favor Size) (/O1) instead of Maximum Optimization (Favor Speed) (/O2).

    Performance profiling is normally done with the maximum optimizations that favor speed enabled. This tutorial deliberately uses /O1 instead, to demonstrate how Intel® VTune™ Profiler can help detect issues caused by non-obvious behavior of compiler options. In the case of the Intel® oneAPI DPC++/C++ Compiler, the /O1 option disables automatic vectorization.

    Such issues can occur in real, larger projects, for reasons that range from something as simple as a typo to something more complicated, such as a lack of awareness of how particular compiler options influence performance.

    For example, some compilers, such as GCC releases before version 12, do not attempt vectorization at the -O2 level unless you pass the -ftree-vectorize option, and otherwise perform automatic vectorization only at the -O3 level.
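To make the note above more concrete, here is a minimal sketch of the kind of loop that an auto-vectorizer typically targets. The function name scale_add and its parameters are hypothetical and are not part of the matrix sample. Whether the loop is actually vectorized depends on the compiler and on the optimization level in effect; an optimization report, such as the one enabled with /Qopt-report:2 in the steps below, is one way to check.

```cpp
#include <cstddef>

// Hypothetical helper, not taken from the matrix sample.
// Each iteration is independent and walks the arrays with unit stride,
// which is the pattern compilers usually try to turn into SIMD code
// when automatic vectorization is enabled.
void scale_add(std::size_t n, const float* x, const float* y,
               float* out, float alpha) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = alpha * x[i] + y[i];
    }
}
```

With vectorization disabled, the same source still compiles and produces the same results, but the loop is expected to process one element per iteration instead of several.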

Follow these steps to edit and recompile the code in Microsoft Visual Studio* using the Intel® oneAPI DPC++/C++ Compiler:

  1. Locate the matrix sample application folder on your machine. By default, it is placed in:

    [Documents]\VTune\samples\matrix

  2. Open the matrix.sln Visual Studio solution located in the ..\matrix\vc15 folder.

  3. Make sure you are building the application with the Release configuration and x64 platform enabled.

  4. In the Solution Explorer, right-click the matrix project and select Properties.

    The Properties window opens.

  5. In Configuration Properties -> General, change the Platform Toolset to Intel C++ Compiler <version>.

  6. In the C/C++ -> General menu, make sure that Debug Information Format is set to Program Database (/Zi).

  7. In the C/C++ -> Optimization menu, make sure the Optimization option is set to Maximum Optimization (Favor Size) (/O1).

  8. In the C/C++ -> Diagnostics [Intel C++] menu, set the Optimization Diagnostic Level to Level 2 (/Qopt-report:2).

  9. In the multiply.h header file, on line 36, change the following line:

    #define MULTIPLY multiply1

    To:

    #define MULTIPLY multiply2

    This changes the program to use the multiply2 function from the multiply.c source file, which implements the loop interchange technique that resolves the memory access problem.

  10. Build the project.
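The loop interchange technique mentioned in step 9 can be sketched as follows. This is an illustration only, not the code from multiply.c: the function names multiply_naive and multiply_interchanged, the Matrix alias, and the flat row-major layout are all assumptions made for this example. The sketch also assumes that the original kernel is a straightforward i-j-k triple loop. The MULTIPLY macro mirrors the way the sample selects a kernel in multiply.h.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types and names; the sample's own code may differ.
// A matrix is stored as a flat, row-major array of n * n doubles.
using Matrix = std::vector<double>;

// i-j-k order: the innermost loop walks down a column of b, so
// consecutive iterations touch addresses that are n elements apart.
void multiply_naive(std::size_t n, const Matrix& a, const Matrix& b,
                    Matrix& c) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// i-k-j order (the j and k loops are interchanged): the innermost loop
// now walks along a row of b and a row of c with unit stride.
void multiply_interchanged(std::size_t n, const Matrix& a, const Matrix& b,
                           Matrix& c) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// Select the kernel at compile time, in the spirit of multiply.h.
#define MULTIPLY multiply_interchanged
```

Both functions accumulate into c, so the caller is expected to zero c first. They compute the same product; only the order in which memory is visited changes, which is why the interchange can reduce cache misses without altering the result.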

Next step: Analyze Performance After Optimization.