Tutorial: Analyze Common Performance Bottlenecks using Intel VTune Profiler in a C++ Sample Application - Linux* OS

ID 762029
Date 10/15/2021


Resolve Memory Access Issue

At this point in the tutorial, you edit the source code and recompile the application to resolve the main memory access bottleneck.

  • Throughout this tutorial, the Intel® C++ Compiler Classic is used. Your results and workflow may vary depending on the compiler that you use.

  • In this stage of the tutorial, you will be instructed to set the Optimization Level of the compiler to Maximum Optimization (Favor Size) (-O1) as opposed to Maximum Optimization (Favor Speed) (-O2).

    While it makes sense to profile with maximum optimizations that favor speed enabled, we use this setting as an example to demonstrate how Intel® VTune™ Profiler can help detect issues related to non-obvious behavior of compiler options. In the case of the Intel® C++ Compiler Classic, the -O1 option disables automatic vectorization.

    Such issues can occur in real, larger projects, with reasons that range from something as simple as a typo, to something more complicated, such as the lack of awareness of how particular compiler options influence performance.

    For example, some compilers, such as GCC before version 12, do not attempt vectorization at the -O2 level unless instructed to do so with the -ftree-vectorize option, and otherwise perform automatic vectorization only at the -O3 level.
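To make the note about automatic vectorization more concrete, here is a minimal sketch of the kind of loop that a compiler can typically vectorize. The function name scale_add is hypothetical and is not part of the matrix sample. Whether the loop is actually vectorized depends on the compiler, its version, and the optimization flags in effect.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only; not part of the matrix sample.
 * Each iteration is independent of the others, and 'restrict' promises the
 * compiler that the three arrays do not overlap. Both properties make the
 * loop a good candidate for automatic vectorization. */
void scale_add(float *restrict out,
               const float *restrict x,
               const float *restrict y,
               float alpha,
               size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = alpha * x[i] + y[i];
    }
}
```

With GCC, one way to check what the compiler did is to add the -fopt-info-vec option, which reports the loops that were vectorized, and compare the report at different optimization levels.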

Follow these steps to edit and recompile the code using the Intel® oneAPI DPC++/C++ Compiler:

  1. In the /opt/intel/oneapi/compiler/latest/env folder, run this command to set compiler environment variables:

    source vars.sh
  2. Locate the matrix sample application folder on your machine. By default, it is placed in:


  3. Using a text editor of your choice, open the Makefile located in the ../matrix/linux/ folder.

  4. Change line 42 from:

    CFLAGS  = -g -O3 -fno-asm

    to:

    CFLAGS  = -g -O1
  5. Change line 43 from:

    OPTFLAGS = -xSSE3 


  6. Save and close the Makefile.

  7. Open the multiply.h header file located in the ../matrix/src folder with a text editor.

  8. Change line 36 from:

    #define MULTIPLY multiply1

    to:

    #define MULTIPLY multiply2

    This changes the program to use the multiply2 function from the multiply.c source file, which implements the loop interchange technique that resolves the memory access problem.

  9. Save and close the multiply.h file.

  10. Navigate to the ../matrix/linux folder and use this command to recompile the application:

    make icc
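The loop interchange technique mentioned in step 8 can be sketched as follows. This is an illustration under stated assumptions, not the source of multiply1 or multiply2 from the sample: the function names, the DIM constant, and the use of fixed-size double arrays are all hypothetical. Both functions compute the same product and differ only in the order of the two inner loops.

```c
#include <stddef.h>

#define DIM 4  /* hypothetical small size; the sample works on much larger matrices */

/* Naive i-j-k order. The innermost loop steps through b[k][j] with k
 * changing, so consecutive iterations read elements that are DIM doubles
 * apart in memory (a column walk through a row-major array). */
void multiply_naive(double a[DIM][DIM], double b[DIM][DIM], double c[DIM][DIM])
{
    for (size_t i = 0; i < DIM; i++)
        for (size_t j = 0; j < DIM; j++)
            for (size_t k = 0; k < DIM; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Interchanged i-k-j order. The innermost loop now steps through b[k][j]
 * and c[i][j] with j changing, so both are read along a row with unit
 * stride, which is friendlier to the cache and to vectorization. */
void multiply_interchanged(double a[DIM][DIM], double b[DIM][DIM], double c[DIM][DIM])
{
    for (size_t i = 0; i < DIM; i++)
        for (size_t k = 0; k < DIM; k++)
            for (size_t j = 0; j < DIM; j++)
                c[i][j] += a[i][k] * b[k][j];
}
```

Both functions accumulate into c, so the caller is expected to zero c before the call. For each element of c, the terms are added in the same k order in both versions, so the results match exactly; only the memory access pattern changes.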

Next step: Analyze Performance After Optimization.