## Get Started Guide

• 2022.1
• 04/11/2022

# Discover Where Vectorization Pays Off The Most

With the Vectorization and Code Insights perspective, you can identify the loops and functions in your application that can benefit most from vector parallelism, locate unvectorized and under-vectorized time-consuming functions and loops, and calculate the estimated performance gain achieved by vectorization.

This guide uses the vec_samples sample application to identify vectorization hotspots and improve the performance of your code. You can also follow the instructions below with your own application.

## Prerequisites

1. Install Intel® Advisor as a standalone product or as part of the Intel® oneAPI Base Toolkit. For installation instructions, see Install Intel® Advisor in the user guide.
2. Install the Intel® C++ Compiler Classic as a standalone product or as part of the Intel® oneAPI HPC Toolkit. For installation instructions, see Intel® oneAPI Toolkits Installation Guide.
3. Set up environment variables for Intel® Advisor and the Intel® C++ Compiler Classic. For example, run the setvars script in the installation directory.

This document assumes you installed the tools to the default location. If you installed them to a different location, replace the default path in the commands below with your installation path.

Do not close the terminal or command prompt after setting the environment variables. Otherwise, the environment resets.

## Unpack and Build Your Application

On Linux* OS
From the terminal where you set the environment variables:
1. Go to
directory.
2. Copy the vec_samples.tgz file to a writable directory or share on your system.
3. Extract the sample from the .tgz archive.
4. Change to the vec_samples/ directory in the location where you extracted the archive.
5. Build the sample application in release mode:
make baseline
The command builds the application with the -O2 -g compiler options. For details about building your own applications, see Build Target Application.
6. Run the application to verify the build:
./vec_samples
You should see output similar to the following, which indicates that the build succeeded:
ROW:47 COL: 47
Execution time is 6.020 seconds
GigaFlops = 0.733887
Sum of result = 254364.540283
On Windows* OS
From the command prompt where you set the environment variables:
1. Go to
directory.
2. Copy the vec_samples.zip file to a writable directory or share on your system.
3. Extract the sample from the .zip archive.
4. Change to the vec_samples\ directory in the location where you extracted the archive.
5. Build the sample application in release mode:
build.bat baseline
The script builds the application with the /O2 /Qstd=c99 /fp:fast /Isrc /Zi /Qopenmp compiler options. For details about building your own applications, see Build Target Application.
6. Run the sample application to verify the build:
vec_samples.exe
You should see output similar to the following, which indicates that the build succeeded:
ROW:47 COL: 47
Execution time is 6.020 seconds
GigaFlops = 0.733887
Sum of result = 254364.540283

## Establish Performance Baseline

### Run Vectorization and Code Insights from Graphical User Interface (GUI)
1. From the terminal or command prompt where you set the environment variables, launch the Intel® Advisor GUI:
advisor-gui
2. Create a project for the vec_samples application you just built. For details, see Before You Begin.
In the Project Properties dialog box, make sure the Inherit settings from Survey Hotspots Analysis Type checkbox is selected for the Trip Counts and FLOP Analysis, Dependencies Analysis, and Memory Access Patterns Analysis types.
3. In the Perspective Selector window, choose the Vectorization and Code Insights perspective.
4. In the Analysis Workflow pane, set the data collection accuracy level to Low, and click the button to run the perspective.
At this accuracy level, Intel® Advisor runs the Survey analysis and collects performance metrics of your application to locate under-vectorized and unvectorized hotspots.
### Run Vectorization and Code Insights from Command Line Interface (CLI)
On Linux OS
From the terminal where you set the environment variables:
1. Collect Survey data using the following command:
advisor --collect=survey --project-dir=./vec_samples -- ./vec_samples
2. Generate a Survey report using the following command:
advisor --report=survey --project-dir=./vec_samples
The report summary will be printed to the terminal or command prompt. A copy of this report is saved into
.
When the analysis completes, the vec_samples project is created automatically and includes the Vectorization and Code Insights results. You can view them in the Intel® Advisor GUI.
On Windows OS
From the command prompt where you set the environment variables:
1. Collect Survey data using the following command:
advisor --collect=survey --project-dir=./vec_samples -- vec_samples.exe
2. Generate a Survey report using the following command:
advisor --report=survey --project-dir=./vec_samples
The report will be printed to the terminal or command prompt. A copy of this report is saved into
.
When the analysis completes, the vec_samples project is created automatically and includes the Vectorization and Code Insights results. You can view them in the Intel® Advisor GUI.
### Examine Results
If you collect data using the GUI, Intel® Advisor automatically opens the results when the collection completes.
If you collect data using the CLI, open the results in the GUI using the following command:
advisor-gui ./vec_samples
If the result does not open automatically, click Show Result.
When you open the Vectorization and Code Insights result in the GUI, Intel® Advisor shows the Summary tab first. This window is a dashboard containing the main information about application execution, performance hints, and indications of vectorization problems in your application.
In the Summary window, notice the following:
1. Assess your application performance using the Elapsed Time metric in the Program Metrics pane. Each improvement you make to under-vectorized and unvectorized functions and loops should reduce this metric. Recheck the program elapsed time after every iteration of running the perspective.
2. In the Program Metrics pane, Time in scalar code is 100% and Vectorization Gain/Efficiency is empty. This means there are no vectorized loops in the application.
3. In the Program Metrics pane, Vector Instruction Set is SSE2 and SSE. This metric is highlighted in red. Hover over the metric value to see a warning that a higher instruction set architecture is available. This warning is also reported in the Per Program Recommendations pane. Consider generating instructions for the higher instruction set and recompiling your application to improve performance.
4. View the top hotspots for optimization in the Top Time-Consuming Loops pane. Click the largest hotspot to view detailed metrics for it in the Survey Report.
Switch to the Survey & Roofline tab to analyze performance for each loop and function in the application. Notice the following:
1. The Elapsed time value appears in the top left corner. This is the baseline against which subsequent improvements will be measured.
2. In the Type column, all detected loops are scalar.
3. In the Why No Vectorization? column, the compiler detected or assumed a vector dependence in most loops.
4. For one of the loops where the compiler detected or assumed a vector dependence, click the control to display how-can-I-fix-this-issue? information in the Why No Vectorization? pane.
5. Review the Summary window, which appears after the perspective executes.
### Create a Read-only Snapshot for the Baseline Result
Create a read-only result snapshot, which you can share or compare with other results:
1. Click the icon.
2. Type snapshot_baseline in the Result name field.
3. Select the Pack into archive checkbox to enable the Result path field.
4. Browse to the desired location, then click OK to save a read-only snapshot of the current result.
5. If the Survey Report remains grayed out after the snapshot process completes, click anywhere on the report.
To review performance improvements, open the saved result snapshots and compare the metrics with those in the snapshot_baseline snapshot.

## Disambiguate Pointers

Two pointers are aliased if both point to the same memory location. Storing to memory using a pointer that might be aliased may prevent some optimizations. For example, it may create a dependency between loop iterations that would make vectorization unsafe. Sometimes the compiler can generate both a vectorized and a non-vectorized version of a loop and test for aliasing at runtime to select the appropriate code path. If you know pointers do not alias, and inform the compiler, it can avoid the runtime check and generate a single vectorized code path.
In Multiply.c, the compiler generates runtime checks to determine whether pointer b in function matvec(FTYPE a[][COLWIDTH], FTYPE b[], FTYPE x[]) is aliased to either a or x. If Multiply.c is compiled with the NOALIAS macro, the restrict qualifier on argument b informs the compiler that the pointer does not alias any other pointer, so array b does not overlap with a or x.
To see if the NOALIAS macro improves performance, do the following:
On Linux OS
From the same terminal window:
1. Navigate to the vec_samples/ directory.
2. Rebuild the target application with the NOALIAS macro:
make noalias
The command builds the application with the following compiler options: -O2 -g -D NOALIAS.
3. Rerun the Vectorization and Code Insights perspective from the GUI or CLI with the same configuration as for the baseline result. See the sections above for instructions.
On Windows OS
From the same command prompt window:
1. Navigate to the vec_samples directory.
2. Rebuild the target application with the NOALIAS macro:
build.bat noalias
The script builds the application with the following compiler options: /O2 /Qstd=c99 /fp:fast /Isrc /Zi /Qopenmp /DNOALIAS.
3. Rerun the Vectorization and Code Insights perspective from the GUI or CLI with the same configuration as for the baseline result. See the sections above for instructions.
### View the Results
If you collect data using the GUI, Intel® Advisor automatically opens the results when the collection completes.
If you collect data using the CLI, open the results in the GUI using the following command:
advisor-gui ./vec_samples
If the result does not open automatically, click Show Result.
Check the changes in the Summary window:
1. In the Program Metrics pane, a new metric, Time in 2 Vectorized Loops, appears, which means that the compiler vectorized two loops. The time in the vectorized loops is 36.6% of the application execution time.
2. Examine the Vectorization Gain/Efficiency section of the pane. The loops are vectorized with 60% efficiency and run 2.39x faster than their scalar versions, so there is still room for improvement. The whole application runs 1.51x faster than the fully scalar version.
3. The Elapsed time improves substantially.
Open the Survey & Roofline tab to assess the changes in application performance. In the report, notice the following:
1. The compiler successfully vectorizes two loops: in matvec at Multiply.c:69 and in matvec at Multiply.c:60.
The loop in matvec at Multiply.c:60 has a high efficiency (99%) and a 3.96x estimated gain. The efficiency of the loop in matvec at Multiply.c:69 is lower (25%) and the bar is gray, which means that the achieved vectorization efficiency is lower than the original scalar loop efficiency. Hover over a bar in the Efficiency column to see the explanation for the estimated efficiency.
2. Click the icon next to the two vectorized loops. Notice that both loops have a remainder loop. Click the icon in the Trip Counts column set to expand it. The remainder loops are present because the trip counts of the vectorized loops are not a multiple of the VL (Vector Length) value.
Click the icon and save a snapshot_noalias result.

## Generate Instructions for the Highest Instruction Set Architecture

Generating code for different instruction sets available on your compilation host processor may improve performance.
The QxHost (Windows OS) and xHost (Linux OS) options tell the compiler to generate instructions for the highest instruction set available on the compilation host processor.
To see if the QxHost and xHost options improve performance, do the following:
On Linux OS
From the same terminal window, build the application:
1. Navigate to the vec_samples/ directory.
2. Rebuild the target application as follows:
make xhost
The command builds the application with the following compiler options: -g -D NOALIAS -xHost.
On Windows OS
From the same command prompt window:
1. Navigate to the vec_samples/ directory.
2. Rebuild the target application as follows:
build.bat xhost
The script builds the application with the following compiler options: /O2 /Qstd=c99 /fp:fast /Isrc /Zi /Qopenmp /DNOALIAS /QxHost.
Rerun the Vectorization and Code Insights perspective from the GUI or CLI.
### Run Vectorization and Code Insights from GUI
1. Open the project in the GUI:
advisor-gui .\vec_samples
2. In the Analysis Workflow pane for the Vectorization and Code Insights perspective, set the data collection accuracy level to Medium.
At this accuracy level, Intel® Advisor collects Survey and Characterization (Trip Counts) data.
3. Run the perspective.
### Run Vectorization and Code Insights from CLI
On Linux OS
From the same terminal window:
1. Collect Survey data using the following command:
advisor --collect=survey --project-dir=./vec_samples -- ./vec_samples
2. Collect Trip Counts data using the following command:
advisor --collect=tripcounts --project-dir=./vec_samples -- ./vec_samples
When the analysis completes, the vec_samples project is created automatically and includes the Vectorization and Code Insights results. You can view them in the Intel® Advisor GUI.
On Windows OS
From the same command prompt window:
1. Collect Survey data using the following command:
advisor --collect=survey --project-dir=./vec_samples -- vec_samples.exe
2. Collect Trip Counts data using the following command:
advisor --collect=tripcounts --project-dir=./vec_samples -- vec_samples.exe
When the analysis completes, the vec_samples project is created automatically and includes the Vectorization and Code Insights results. You can view them in the Intel® Advisor GUI.
### View the Results
If you collect data using the GUI, Intel® Advisor automatically opens the results when the collection completes.
If you collect data using the CLI, open the results in the GUI using the following command:
advisor-gui ./vec_samples
If the result does not open automatically, click Show Result.
Check the changes in the Summary window and open the Survey Report to assess the changes in application performance. In the report, notice the following:
• The Elapsed time is likely to improve.
• The values in the Vector ISA and VL columns in the top pane are likely to change.
Click the icon and save a snapshot_xhost result.

## Next Steps

1. Pay attention to data dependencies assumed by the compiler and check whether these dependencies are real and prevent your functions and loops from vectorizing. To do that, run the Dependencies analysis, mark the loops containing proven dependencies, and rebuild the application with the /DREDUCTION (Windows OS) or -D REDUCTION (Linux OS) compiler option.
2. Eliminate issues that significantly slow down vector code execution or block automatic vectorization by the compiler. To do that, run the Memory Access Patterns analysis and modify memory access patterns in the problematic functions and loops.
3. Align data to assist automatic vectorization. For details, see Data Alignment to Assist Vectorization.
4. Reorganize code to inline loops and enable the compiler to tell which variables you want to process and determine that vectorization is safe.

#### Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.