Get Started Guide

  • 2022.0
  • 12/06/2021
  • Public Content

Discover Where Vectorization Pays Off The Most

Vectorization and Code Insights
perspective is a vectorization analysis toolset that enables you to identify the loops that will benefit most from vector parallelism. Profile your application with the Survey tool to locate un-vectorized and under-vectorized time-consuming functions/loops and to calculate the estimated performance gain from vectorization.
Use the
Vectorization and Code Insights
perspective of
Intel® Advisor
to analyze the
vec_samples
application and identify hotspots for vectorization to improve performance of your code.
Follow the steps:

Prerequisites

This guide assumes the following prerequisites:

Unpack and Build Your Application

To unpack the sample:
  1. Go to
    <install-dir>
    /advisor/latest/samples/<locale>/C++
    directory.
  2. Copy the
    vec_samples.zip
    (on Windows* OS) or
    vec_samples.tgz
    (on Linux* OS) file to a writable directory or share on your system.
  3. Extract the sample from the
    .zip
    or
    .tgz
    archive.
You can use your own application and apply the instructions below.
On Linux* OS
  1. Open a terminal.
  2. Change directory to the
    vec_samples/
    directory in its unzipped location.
  3. Build the sample application in release mode using the
    make baseline
    command, which builds with the following compiler options:
    -O2 -g
    .
    make baseline
  4. Run the application to verify the build:
    ./vec_samples
On Windows* OS
  1. Open a command prompt.
  2. Change directory to the
    vec_samples\
    directory in its unzipped location.
  3. Build the sample application in release mode using the
    build.bat
    script as follows.
    build.bat baseline
    The script builds the application with the following compiler options:
    /O2 /Qstd=c99 /fp:fast /Isrc /Zi /Qopenmp
    . For details about building your own applications, see Build Target Application.
  4. Run the sample application to verify the build:
    vec_samples.exe
    You should see an output similar to the following indicating that you successfully built the application:
    ROW:47 COL: 47
    Execution time is 6.020 seconds
    GigaFlops = 0.733887
    Sum of result = 254364.540283

Establish Performance Baseline

Run
Vectorization and Code Insights
Perspective Using GUI
  1. Launch the
    Intel Advisor
    GUI using the
    advisor-gui
    command.
    To display the
    Intel Advisor
    GUI in the Visual Studio IDE, click the icon on the
    Intel Advisor
    toolbar or click
    Tools >
    Intel Advisor [version]
    > Vectorization and Threading Advisor Analysis
    .
  2. Create a project for the just-built
    vec_samples
    application. For details, see Before You Begin.
    In the
    Project Properties
    dialog box, make sure the
    Inherit settings from Survey Hotspots Analysis Type
    checkbox is selected in the
    Trip Counts and FLOP Analysis
    ,
    Dependencies Analysis
    , and
    Memory Access Patterns Analysis
    types.
  3. In the
    Performance Selector
    window, choose the
    Vectorization and Code Insights
    perspective.
  4. In the
    Analysis Workflow
    pane, set data collection accuracy level to
    Low
    , and click the button to run the perspective.
    At this accuracy level,
    Intel Advisor
    runs Survey analysis and collects performance metrics of your application to locate under- and non-vectorized hotspots.
Run
Vectorization and Code Insights
Using CLI
On Linux OS:
  1. Collect Survey data using the following command:
    advisor --collect=survey --project-dir=./vec_samples -- ./vec_samples
  2. Generate a Survey report using the following command:
    advisor --report=survey --project-dir=./vec_samples
On Windows OS:
  1. Collect Survey data using the following command:
    advisor --collect=survey --project-dir=./vec_samples -- vec_samples.exe
  2. Generate a Survey report using the following command:
    advisor --report=survey --project-dir=./vec_samples
The report is printed to the terminal or command prompt. A copy of the report is saved to
./vec_samples/e000/hs000/advisor-survey.txt
.
Intel Advisor
lets you view results collected with the CLI in the GUI.
Examine Results
  1. If you collected the data using the CLI, open the results in the GUI:
    advisor-gui ./vec_samples
    If the result does not open automatically, click
    Show Result
    .
  2. Review the
    Summary
    window, which appears after the perspective executes. This window is a dashboard containing the main information about application execution, performance hints, and indication of vectorization problems in your application. In the
    Summary
    window, notice the following:
    • Assess your application performance using the
      Elapsed Time
      metric in the
      Program Metrics
      pane. Each improvement you make to under-vectorized and un-vectorized functions/loops should improve this metric. Re-check the program elapsed time after each run of the perspective.
    • In the
      Program Metrics
      pane,
      Time in scalar code
      is 100% and the
      Vectorization Gain/Efficiency
      is empty, which means there are no vectorized loops in the application.
    • In the
      Program Metrics
      pane,
      Vector Instruction Set
      is
      SSE2
      and
      SSE
      . This metric is highlighted in red. Hover over the metric value to see a warning that a higher instruction set architecture is available. This warning is also reported in the
      Per Program Recommendations
      pane. Consider generating instructions for it and recompiling your application to improve performance.
    • View the top hotspots for optimization in the
      Top Time-Consuming Loops
      pane. Click the largest hotspot to view detailed metrics for it in the
      Survey Report
      .
  3. In the
    Survey Report
    , notice the following:
    • The
      Elapsed time
      value in the top left corner. This is the baseline against which subsequent improvements will be measured.
    • In the
      Type
      column, all detected loops are
      scalar
      .
    • In the
      Why No Vectorization?
      column, the compiler detected or assumed a vector dependence in most loops.
    • For one of the loops where the compiler detected or assumed a vector dependence, click the Compiler diagnostic details control to display
      how-can-I-fix-this-issue?
      information in the
      Why No Vectorization?
      pane.
  4. Create a read-only result snapshot, which you can share or compare with other results. To do that:
  1. Click the Snapshot icon.
  2. Type
    snapshot_baseline
    in the
    Result
    name field.
  3. Select the
    Pack into archive
    checkbox to enable the
    Result path
    field.
  4. Browse to a desired location, then click the
    OK
    button to save a read-only snapshot of the current result.
  5. If the
    Survey Report
    remains grayed out after the snapshot process is complete, click anywhere on the report.
To review performance improvements, open the saved result snapshots and compare the metrics with those in the
snapshot_baseline
snapshot.

Disambiguate Pointers

Two pointers are aliased if both point to the same memory location. Storing to memory using a pointer that might be aliased may prevent some optimizations. For example, it may create a dependency between loop iterations that would make vectorization unsafe. Sometimes the compiler can generate both a vectorized and a non-vectorized version of a loop and test for aliasing at runtime to select the appropriate code path. If you know pointers do not alias, and inform the compiler, it can avoid the runtime check and generate a single vectorized code path.
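To make the dependency concrete, here is a minimal sketch in C. The function add_one is hypothetical and is not part of vec_samples; it only illustrates why a store through a possibly aliased pointer forces the compiler to be careful.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from vec_samples.
 * Writes src[i] + 1 into dst[i] for i in [0, n).
 * If dst and src overlap, an iteration can read a value that an
 * earlier iteration wrote: a dependency between loop iterations. */
void add_one(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] + 1.0f;
    }
}
```

Called as add_one(buf + 1, buf, 4) on a five-element buffer of zeros, the scalar loop leaves the buffer as 0, 1, 2, 3, 4, because each iteration reads the value stored by the previous one. A version that loaded all four source elements before storing would produce 0, 1, 1, 1, 1 instead, which is why the compiler cannot vectorize blindly unless it can rule out the overlap.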
In
Multiply.c
, the compiler generates runtime checks to determine if pointer b in function
matvec(FTYPE a[][COLWIDTH], FTYPE b[], FTYPE x[])
is aliased to either
a
or
x
. If
Multiply.c
is compiled with the
NOALIAS
macro, the restrict qualifier of argument
b
informs the compiler that the pointer does not alias any other pointer and that array
b
does not overlap with
a
or
x
.
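The following is a rough sketch of how a macro such as NOALIAS can switch on the restrict qualifier. It is not the source of Multiply.c: the name matvec_sketch, the RESTRICT helper macro, the explicit rows and cols parameters, the COLWIDTH value, and the float element type are all assumptions made for illustration.

```c
#include <stddef.h>

#define COLWIDTH 4      /* assumed value for this sketch */
typedef float FTYPE;    /* assumed element type */

/* Assumed helper macro: expands to restrict only when NOALIAS is defined. */
#ifdef NOALIAS
#define RESTRICT restrict
#else
#define RESTRICT
#endif

/* Computes b = a * x. With NOALIAS defined, b is declared restrict,
 * which promises the compiler that stores to b never touch a or x. */
void matvec_sketch(size_t rows, size_t cols,
                   FTYPE a[][COLWIDTH], FTYPE *RESTRICT b, FTYPE x[])
{
    for (size_t i = 0; i < rows; ++i) {
        b[i] = 0;
        for (size_t j = 0; j < cols; ++j) {
            b[i] += a[i][j] * x[j];
        }
    }
}
```

Compiling this sketch with -D NOALIAS (or /DNOALIAS) changes only the declaration of b, not the results, provided the caller really passes non-overlapping arrays. Passing overlapping arrays to a restrict-qualified parameter is undefined behavior, so the macro should be used only when the promise holds.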
To see if the
NOALIAS
macro improves performance, do the following:
  1. Rebuild the target application with the
    NOALIAS
    macro. To do that, run one of the following:
    • On Linux OS:
      make noalias
      The command builds the application with the following compiler options:
      -O2 -g -D NOALIAS
      .
    • On Windows OS:
      build.bat noalias
      The script builds the application with the following compiler options:
      /O2 /Qstd=c99 /fp:fast /Isrc /Zi /Qopenmp /DNOALIAS
      .
  2. Rerun the
    Vectorization and Code Insights
    perspective at
    Low
    data collection accuracy level.
  3. Check the changes in the
    Summary
    :
    • In the
      Program Metrics
      pane, a new metric
      Time in 2 Vectorized Loops
      appears, meaning that the compiler vectorized two loops. The time in the vectorized loops is 36.6% of the application execution time.
    • Examine the
      Vectorization Gain/Efficiency
      section of the pane. The loops are vectorized with 60% efficiency and have a 2.39x speedup compared to their scalar version, but there is still room for improvement. The whole application has a 1.51x speedup compared to the fully scalar version.
    • The
      Elapsed time
      improves substantially.
  4. Open the
    Survey & Roofline
    tab to assess the changes in application performance. In the report, notice the following:
    • The compiler successfully vectorizes two loops: in
      matvec
       at
      Multiply.c:69
       and in
      matvec
       at
      Multiply.c:60
      .
      The loop in
      matvec
       at
      Multiply.c:60
      has a high efficiency (99%) and 3.96x estimated gain. The loop in
      matvec
       at
      Multiply.c:69
      has a lower efficiency (25%), and its bar is gray, which means that the achieved vectorization efficiency is lower than the original scalar loop efficiency. Hover over a bar in the Efficiency column to see the explanation for the estimated efficiency.
    • Click the Expand data row icon next to the two vectorized loops. Notice that both loops have a remainder loop. Click the Expand column set icon in the
      Trip Counts
      column set to expand it. The remainder loops are present because the trip counts of the original loops are not a multiple of the
      VL (Vector Length)
      value.
  5. Click the Snapshot icon and save a
    snapshot_noalias
    result.
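The numbers reported in the steps above can be cross-checked with some simple arithmetic. The sketch below is not an Intel Advisor API; the function names are hypothetical, and the vector length of 4 and the trip count of 47 (taken from the ROW:47 COL: 47 line of the sample output) are assumptions. It treats efficiency as gain divided by vector length and reconstructs the whole-program speedup from the share of time spent in vectorized loops.

```c
#include <stddef.h>

/* Iterations left for the remainder loop when trip_count
 * is not a multiple of the vector length vl. */
size_t remainder_iterations(size_t trip_count, size_t vl)
{
    return trip_count % vl;
}

/* Efficiency as a percentage, assuming efficiency = gain / vector length. */
double efficiency_percent(double gain, double vl)
{
    return 100.0 * gain / vl;
}

/* Speedup of the whole program relative to a fully scalar run.
 * vec_time_fraction is the share of the current (vectorized) run spent
 * in vectorized loops; loop_gain is their speedup over scalar code. */
double program_speedup(double vec_time_fraction, double loop_gain)
{
    return (1.0 - vec_time_fraction) + vec_time_fraction * loop_gain;
}
```

Under these assumptions, remainder_iterations(47, 4) is 3, so a remainder loop is needed. efficiency_percent(3.96, 4) is 99, matching the 99% reported for the loop at Multiply.c:60, and efficiency_percent(2.39, 4) is 59.75, close to the reported 60%. program_speedup(0.366, 2.39) is about 1.51, which is consistent with the 1.51x figure in the Summary.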

Generate Instructions for the Highest Instruction Set Architecture

Generating code for different instruction sets available on your compilation host processor may improve performance.
The
QxHost
(Windows OS) and
xHost
(Linux OS) options tell the compiler to generate instructions for the highest instruction set available on the compilation host processor.
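One informal way to see which instruction set a build targets is to inspect the compiler's predefined macros. The sketch below assumes a GCC, Clang, or Intel compiler on Linux, where macros such as __SSE2__, __AVX__, __AVX2__, and __AVX512F__ are defined according to the target options; the function name compiled_vector_isa is hypothetical, and this is not how Intel Advisor determines the Vector Instruction Set metric.

```c
/* Reports the highest vector instruction set the compiler was allowed
 * to target for this translation unit, based on predefined macros.
 * Assumes GCC-, Clang-, or Intel-style macro names. */
const char *compiled_vector_isa(void)
{
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "unknown or none";
#endif
}
```

Printing the returned string from builds made with and without the option shows whether the compiler's target changed. The result depends on the compilation host, so no particular value is expected here.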
To see if the
QxHost
and
xHost
options improve performance, do the following:
  1. Rebuild the application with the option as follows:
    • On Linux OS:
      make xhost
      The command builds the application with the following compiler options:
      -g -D NOALIAS -xHost
      .
    • On Windows OS:
      build.bat xhost
      The script builds the application with the following compiler options:
      /O2 /Qstd=c99 /fp:fast /Isrc /Zi /Qopenmp /DNOALIAS /QxHost
      .
  2. Re-run the
    Vectorization and Code Insights
    perspective at
    Medium
    accuracy level, which collects Survey and Characterization (Trip Counts) data.
    To run the perspective from command line, execute the following commands:
    • On Linux OS:
      advisor --collect=survey --project-dir=./vec_samples -- ./vec_samples
      advisor --collect=tripcounts --project-dir=./vec_samples -- ./vec_samples
    • On Windows OS:
      advisor --collect=survey --project-dir=./vec_samples -- vec_samples.exe
      advisor --collect=tripcounts --project-dir=./vec_samples -- vec_samples.exe
  3. Check the changes in the
    Summary
    and open the
    Survey Report
    to assess the changes in application performance. In the report, notice the following:
    • The
      Elapsed time
      is likely to improve.
    • The values in the
      Vector ISA
      and
      VL
      columns in the top pane are likely to change.
  4. Click the Snapshot icon and save a
    snapshot_xhost
    result.

Next Steps

  1. Pay attention to data dependencies assumed by the compiler and check whether these dependencies are real and actually prevent your functions/loops from vectorizing. To do that, run the
    Dependencies
    analysis, mark the loops containing proven dependencies, and rebuild the application adding
    /DREDUCTION
    (Windows OS) or
    -D REDUCTION
    (Linux OS) compiler option.
  2. Eliminate issues that significantly slow down vector code execution or block automatic vectorization by the compiler. To do that, run the
    Memory Access Patterns
    analysis and modify memory access patterns in the problematic functions/loops.
  3. Align data to assist automatic vectorization. For details, see Data Alignment to Assist Vectorization.
  4. Reorganize code so that loops can be inlined and the compiler can tell which variables you want to process and determine that vectorization is safe.
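As an illustration of the first step, a proven dependence that is really a reduction can often be vectorized safely once the compiler is told so. The guide does not show how the sample uses the REDUCTION macro, so the sketch below is an assumption: the function sum_of_products is hypothetical, and it guards an OpenMP simd reduction clause with the macro.

```c
#include <stddef.h>

/* Hypothetical example, not taken from vec_samples.
 * Every iteration updates sum, which is a real loop-carried dependence.
 * Declaring it as a reduction lets the compiler keep partial sums in
 * vector lanes and combine them after the loop. */
double sum_of_products(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
#ifdef REDUCTION
#pragma omp simd reduction(+:sum)
#endif
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The pragma takes effect only when the macro is defined and the compiler has OpenMP SIMD support enabled. Because a vectorized reduction reorders floating-point additions, the result can differ slightly from the scalar loop for some inputs.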

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.