CPU Metrics
This reference section describes the contents of data columns in Survey and Refinement Reports of the Vectorization and Code Insights, CPU / Memory Roofline Insights, and Threading perspectives.
Access Pattern
Description
: Summary of access types.
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports).
Access Type
Description
: Memory access type: Read, Write, Read/Write
Address Range
Description
: Instruction address range in memory.
Interpretation
: A wide range indicates one or more of the following:
- The application uses too much memory.
- Memory usage is not optimal.
Average
Description
: Loop trip count average.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
: Enable the Collect Trip Counts option of the Characterization step on the Analysis Workflow tab, or enable Collect information about Loop Trip Counts on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
Cache Line Utilization
Description
: Simulated cache line utilization for data transfer operations.
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports).
Cache Misses
Description
: Number of memory load operations served by a memory subsystem higher than cache. Calculated for the first instance of the loop (assuming a cold CPU cache). The value is a result of virtual cache modeling and might not match the exact counter reported by hardware for this analysis run.
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports).
Call Count
Description
: Number of times loop/function was invoked.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Prerequisites for collection/display
: Enable the Collect Trip Counts option of the Characterization step on the Analysis Workflow tab, or enable Collect information about Loop Trip Counts on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
Interpretation
: A high number means there is an outer loop in the selected loop call chain with high trip count values. If the loop has a low trip count value, the outer loop could be a better candidate for parallelization (threading/vectorization).
Compiler Estimated Gain
Description
: Theoretical compiler estimate of relative loop performance speedup achieved or achievable due to vectorization.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Comparison with similar metrics
: Gain Estimate is the Intel Advisor-calculated estimate of relative loop performance speedup achieved due to vectorization.
Data Types
Description
: Data types provided by binary static analysis.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Interpretation
: Bold indicates primary data type used for vectorization.
Description
Description
: Code location classification.
Dirty Evictions
Description
: Number of evicted cache lines with a modified state introducing upstream memory traffic to a higher memory subsystem.
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports).
Efficiency
Description
: Intel Advisor-calculated estimated performance gain compared to the maximum achievable gain from vectorization.
Interpretation
: Normally means how effectively vectorization was applied, compared to maximum possible gain (higher is better).
Calculation/Aggregation
: (Estimated gain / Vector length) * 100%
Interpretation
: Hover mouse over data cell for more information.
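As a rough illustration of the Efficiency formula, the following Python sketch divides an estimated gain by a vector length and expresses the result as a percentage. The function name and the sample values are hypothetical and are not taken from an Intel Advisor report.

```python
def vectorization_efficiency(estimated_gain: float, vector_length: int) -> float:
    """Return efficiency in percent: (estimated gain / vector length) * 100."""
    if vector_length <= 0:
        raise ValueError("vector_length must be positive")
    return estimated_gain / vector_length * 100.0


# Hypothetical loop: estimated gain of 3.2x with a vector length of 8.
print(f"{vectorization_efficiency(3.2, 8):.1f}%")
```

Under this formula, a loop whose estimated gain equals its vector length would reach 100% efficiency.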
Elapsed Time
Description
: Elapsed (wall-clock) application time.
First Instance Site Footprint
Description
: For each memory access instruction for the first instance of a loop, Intel Advisor:
- Tracks the minimum and maximum access addresses.
- Displays the maximum range in this metric.
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports).
Comparison with similar metrics
: This metric is more reliable than the Maximum Per-Instruction Address Range metric.
| | Max. Per-Instruction Addr. Range | First Instance Site Footprint | Simulated Memory Footprint |
|---|---|---|---|
| Number of threads analyzed for loop/site | 1 | 1 | 1 |
| Number of loop instances analyzed | All instances, but with some memory access instruction filtering | 1 | Depends on loop call count limit |
| Awareness of overlap between address ranges accessed in loop | No | Yes | Yes |
| Suitability for code with random memory access | No | No | Yes |
Function
Description
: Function name.
Function Call Sites and Loops
Description
: Information about parent function, source file, and line where site/loop begins in Loop Information Pane (Survey Report), and top-down call tree of target functions and loops in Loop Information Pane (Survey Report).
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Interpretation
:
- Scalar function.
- Vectorized function.
- Scalar loop. Vectorization might be possible.
- Vectorized loop. Optimization might be possible.
- Scalar inner loop within vectorized outer loop. Optimization might be possible.
Gain Estimate
Description
: Intel Advisor-calculated estimate of relative loop performance speedup achieved due to vectorization.
Comparison with similar metrics
: Compiler Estimated Gain is the theoretical compiler estimate of relative loop performance speedup achieved or achievable due to vectorization.
Instruction Address
Description
: Instruction address in memory.
Instruction Sets
Description
: Instruction Set Architecture (ISA) usage for individual instructions.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Iteration Duration
Description
: Average loop iteration time.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Prerequisites for collection/display
: Enable the Collect Trip Counts option of the Characterization step on the Analysis Workflow tab, or enable Collect information about Loop Trip Counts on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
Loop Instance Total Time
Description
: Average loop instance total time.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
: Enable the Collect Trip Counts option of the Characterization step on the Analysis Workflow tab, or enable Collect information about Loop Trip Counts on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
Loop-Carried Dependencies
Description
: Dependencies summary across iterations
Possible values
:
- RAW (Read after Write) - Flow dependency
- WAR (Write after Read) - Anti dependency
- WAW (Write after Write) - Output dependency
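The three dependency types can be pictured with minimal loops. The Python sketch below is illustrative only: the function names are hypothetical, and real reports describe dependencies found in compiled code rather than in Python lists.

```python
def raw_example(a):
    # RAW (flow): iteration i reads a[i - 1], which iteration i - 1 wrote.
    for i in range(1, len(a)):
        a[i] = a[i - 1] + 1
    return a


def war_example(a):
    # WAR (anti): iteration i reads a[i + 1], which iteration i + 1 later overwrites.
    for i in range(len(a) - 1):
        a[i] = a[i + 1] * 2
    return a


def waw_example(a, out):
    # WAW (output): every iteration writes the same location out[0],
    # so the final value depends on the order of iterations.
    for i in range(len(a)):
        out[0] = a[i]
    return out


print(raw_example([0, 0, 0, 0]))
print(war_example([1, 2, 3, 4]))
print(waw_example([5, 6, 7], [0]))
```

In each case, reordering or overlapping iterations could change the result, which is why such dependencies can block vectorization or threading.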
Max
Description
: Loop trip count maximum.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
: Enable the Collect Trip Counts option of the Characterization step on the Analysis Workflow tab, or enable Collect information about Loop Trip Counts on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
Max Site Footprint
Description
: Maximum distance (among all instances of the loop) between the minimum and maximum memory address values.
Maximum Per-Instruction Address Range
Description
: For most memory access instructions for all instances of a loop, Intel Advisor:
- Tracks the minimum and maximum access addresses.
- Displays the maximum range in this metric.
The value may be imprecise because Intel Advisor filters some memory access instructions while analyzing all instances of a loop. Unreliable values are displayed in gray.
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports) and Memory Access Patterns Report.
Comparison with similar metrics
: This metric is less reliable than the First Instance Site Footprint metric.
| | Max. Per-Instruction Addr. Range | First Instance Site Footprint | Simulated Memory Footprint |
|---|---|---|---|
| Number of threads analyzed for loop/site | 1 | 1 | 1 |
| Number of loop instances analyzed | All instances, but with some memory access instruction filtering | 1 | Depends on loop call count limit |
| Awareness of overlap between address ranges accessed in loop | No | Yes | Yes |
| Suitability for code with random memory access | No | No | Yes |
Memory Access Footprint
Description
: Maximum distance (among all instances of the loop) between minimum and maximum memory address values, accessed by the instructions, generated from the current source line.
Memory Loads
Description
: Number of memory load operations in first instance of the loop.
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports).
Memory Stores
Description
: Number of memory store operations in first instance of the loop.
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports).
Memory, GB
Description
: Amount of data, in GB, transferred between the CPU and memory subsystem.
This is a core metric that is the basis of the arithmetic intensity (AI) calculation.
Min
Description
: Loop trip count minimum.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
: Enable the Collect Trip Counts option of the Characterization step on the Analysis Workflow tab, or enable Collect information about Loop Trip Counts on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
Module/Modules
Description
: Executable or library name.
Collected during Survey Analysis, Dependencies Analysis, and Memory Access Patterns Analysis; and found in Loop Information Pane (Survey Report), Advanced View Pane (Survey Report), Dependencies Report, and Memory Access Patterns Report.
Multi-Pumping Factor
Description
: The number of times the compiler applied a pumping optimization to extend vector length.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Nested Function
Description
: Name of the function (invoked from the site) where the stride diagnostic was detected.
Optimization Details
Description
: Compiler optimization details.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Performance Issues
Description
: Performance issues found.
Collected during Survey Analysis and Memory Access Patterns Analysis, and found in Loop Information Pane (Survey Report) and Memory Access Patterns Analysis.
Interpretation
: Click to display confidence level about issue root cause and recommended fixes.
Problem Severity
Description
: Seriousness of a detected problem.
Possible values
:
- Error.
- Warning.
- Informational.
RFO Cache Misses
Description
: Number of cache lines loaded to cache due to a modification request (Request for Ownership).
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports).
Self AI
Description
: Ratio of Self GFLOPS to self L1 transferred bytes.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
:
- Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
- Select Show Floating-Point Operation Columns for the column setting.
Instruction types counted for FLOP calculation
:
- FMA, ADD, SUB, DIV, DP, MUL, ATAN, FPREM, TAN, SIN, COS, SQRT, SUB, RCP, RSQRT, EXP, VSCALE, MAX, MIN, ABS, IMUL, IDIV, FIDIVR, CMP, VREDUCE, VRND
Self Elapsed Time
Description
: Self Time-based wall time from beginning to end of loop/function execution, excluding time for callees.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Comparison with similar metrics
: Total Elapsed Time is Total Time-based wall time from beginning to end of loop/function execution, including time for callees.
Interpretation
: Same as Self Time for single-threaded applications.
Self GFLOP
Description
: Giga floating-point operations, excluding GFLOP for callees.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
:
- Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
- Select Show Floating-Point Operation Columns for the column setting.
Instruction types counted for FLOP calculation
:
- FMA, ADD, SUB, DIV, DP, MUL, ATAN, FPREM, TAN, SIN, COS, SQRT, SUB, RCP, RSQRT, EXP, VSCALE, MAX, MIN, ABS, IMUL, IDIV, FIDIVR, CMP, VREDUCE, VRND
Self GFLOPS
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
:
- Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
- Select Show Floating-Point Operation Columns for the column setting.
Instruction types counted for FLOP calculation
:
- FMA, ADD, SUB, DIV, DP, MUL, ATAN, FPREM, TAN, SIN, COS, SQRT, SUB, RCP, RSQRT, EXP, VSCALE, MAX, MIN, ABS, IMUL, IDIV, FIDIVR, CMP, VREDUCE, VRND
Self Giga OP
Description
: Giga floating-point operations plus giga integer operations, excluding giga floating-point and integer operations for callees.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
:
- Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
- Select Show Sum of Integer and Floating-Point Operation Columns for the column setting.
Instruction types counted for FLOP calculation
:
- FMA, ADD, SUB, DIV, DP, MUL, ATAN, FPREM, TAN, SIN, COS, SQRT, SUB, RCP, RSQRT, EXP, VSCALE, MAX, MIN, ABS, IMUL, IDIV, FIDIVR, CMP, VREDUCE, VRND
Instruction types counted for INTOP calculation (default)
:
- ADD, ADC, SUB, MUL, IMUL, DIV, IDIV, INC/DEC, shifts, rotates
Self Giga OPS
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
:
- Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
- Select Show Sum of Integer and Floating-Point Operation Columns for the column setting.
Instruction types counted for FLOP calculation
:
- FMA, ADD, SUB, DIV, DP, MUL, ATAN, FPREM, TAN, SIN, COS, SQRT, SUB, RCP, RSQRT, EXP, VSCALE, MAX, MIN, ABS, IMUL, IDIV, FIDIVR, CMP, VREDUCE, VRND
Instruction types counted for INTOP calculation (default)
:
- ADD, ADC, SUB, MUL, IMUL, DIV, IDIV, INC/DEC, shifts, rotates
Self GINTOP
Description
: Giga integer operations, excluding giga integer operations for callees.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
:
- Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
- Select Show Integer Operation Columns for the column setting.
Instruction types counted for INTOP calculation (default)
:
- ADD, ADC, SUB, MUL, IMUL, DIV, IDIV, INC/DEC, shifts, rotates
Self GINTOPS
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
:
- Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
- Select Show Integer Operation Columns for the column setting.
Instruction types counted for INTOP calculation (default)
:
- ADD, ADC, SUB, MUL, IMUL, DIV, IDIV, INC/DEC, shifts, rotates
Self INT AI
Description
: Ratio of Self GINTOPS to self L1 transferred bytes.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
:
- Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
- Select Show Integer Operation Columns for the column setting.
Instruction types counted for INTOP calculation (default)
:
- ADD, ADC, SUB, MUL, IMUL, DIV, IDIV, INC/DEC, shifts, rotates
Self Memory (GB)
Description
: Data transfers between CPU and memory subsystem (total traffic, including caches and DRAM) in gigabytes, excluding transfers for callees.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
: Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
Self Memory (GB/s)
Description
: Data transfers between CPU and memory subsystem (total traffic, including caches and DRAM) in gigabytes per second, excluding transfers for callees.
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
: Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
Calculation/Aggregation
: Self Memory (GB) / Self Elapsed Time
Self Overall AI
Collected during Trip Counts Analysis (Characterization) and found in Loop Information Pane (Survey Report).
Prerequisites for collection/display
:
- Enable the Collect FLOP option of the Characterization step on the Analysis Workflow tab, or enable Collect information about FLOP, L1 memory traffic, and AVX-512 mask usage on the Trip Counts and FLOP Analysis tab of the Project Properties Dialog Box.
- Select Show Sum of Integer and Floating-Point Operation Columns for the column setting.
Instruction types counted for FLOP calculation
:
- FMA, ADD, SUB, DIV, DP, MUL, ATAN, FPREM, TAN, SIN, COS, SQRT, SUB, RCP, RSQRT, EXP, VSCALE, MAX, MIN, ABS, IMUL, IDIV, FIDIVR, CMP, VREDUCE, VRND
Instruction types counted for INTOP calculation (default)
:
- ADD, ADC, SUB, MUL, IMUL, DIV, IDIV, INC/DEC, shifts, rotates
Self Time
Description
: Time actively executing a function/loop, excluding time for callees.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Comparison with similar metrics
: Total Time is time actively executing a function/loop, including time for callees.
Simulated Memory Footprint
Description
: The summarized and overlap-aware memory footprint across all instances of a loop.
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports).
Prerequisites for collection/display
:
In the GUI Project Properties Dialog Box:
- Enable the Enable CPU cache simulation option.
- In the Cache simulation mode drop-down list, choose Model cache misses and loop footprint.
- Tweak other Enable CPU cache simulation parameters as necessary.
CLI example:
advisor -collect map -mark-up-list=1,2,7,17,26 -enable-cache-simulation -cachesim-mode=footprint -project-dir C:\my_advisor_project -- my_application.exe
Comparison with similar metrics
:
| | Max. Per-Instruction Addr. Range | First Instance Site Footprint | Simulated Memory Footprint |
|---|---|---|---|
| Number of threads analyzed for loop/site | 1 | 1 | 1 |
| Number of loop instances analyzed | All instances, but with some memory access instruction filtering | 1 | Depends on loop call count limit |
| Awareness of overlap between address ranges accessed in loop | No | Yes | Yes |
| Suitability for code with random memory access | No | No | Yes |
Calculation/Aggregation
: Number of unique cache lines accessed during cache simulation * Cache line size.
For performance reasons, not all accesses and cache lines are simulated. Instead, Intel Advisor tracks a subset and then scales up to the whole cache size to determine the final footprint value.
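The footprint formula can be sketched in a few lines of Python. This is a simplified illustration that counts every access; it does not model the subset tracking and scaling that Intel Advisor performs. The function name, the 64-byte default line size, and the sample addresses are assumptions.

```python
def simulated_footprint(addresses, cache_line_size=64):
    """Footprint in bytes: unique cache lines touched * cache line size."""
    unique_lines = {addr // cache_line_size for addr in addresses}
    return len(unique_lines) * cache_line_size


# Hypothetical byte addresses: the first three fall in one 64-byte line.
accesses = [0x1000, 0x1008, 0x1010, 0x1040, 0x2000]
print(simulated_footprint(accesses))  # 3 unique lines * 64 bytes
```

For the same sample, the distance between the minimum and maximum address is 4096 bytes, which suggests why a range-based metric can overstate memory use for scattered accesses.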
Site Location
Description
: Information about parent function, source file, and line where site/loop begins.
Collected during Dependencies Analysis and Memory Access Patterns Analysis, and found in Loop Information Pane (Refinement Reports).
Site Name
Description
: Site name if using source annotations; sequence ID if marking loops for deeper analysis in Survey Report.
Collected during Dependencies Analysis and Memory Access Patterns Analysis, and found in Loop Information Pane (Refinement Reports), Dependencies Report, and Memory Access Patterns Report.
Source/Source Location/Sources
Description
: Source file name(s) and line number(s).
Collected during Survey Analysis, Dependencies Analysis, and Memory Access Patterns Analysis; and found in Loop Information Pane (Survey Report), Advanced View Pane (Survey Report), Dependencies Report, and Memory Access Patterns Report.
State
Description
: State of most severe problem in problem set.
Possible values
:
- Regression - Not investigated. Set by the Intel Advisor. Issue requires more investigation because it was marked as Fixed in baseline result but still appears.
- New - Not investigated. Set by the Intel Advisor or user. Issue did not appear in the baseline result, or there is no older result from which the Intel Advisor can propagate state information.
- Not Fixed - Not investigated. Set by user. Issue appeared in the baseline result and still requires investigation.
- Confirmed - Investigated. Set by user. Issue requires fixing but has not yet been fixed.
- Fixed - Investigated. Set by user. Issue requires fixing and has been fixed.
- Not a problem - Investigated. Set by user. Issue does not require fixing.
- Deferred - Investigated. Set by user. You are postponing further investigation on an issue that may or may not require fixing.
Stride
Description
: Distance, in elements, between memory accesses in two consecutive iterations.
Collected during Memory Access Patterns Analysis and found in Memory Access Patterns Report.
Strides Distribution
Description
: Stride ratio in the following format: Unit%/Constant%/Variable%
Collected during Memory Access Patterns Analysis and found in Loop Information Pane (Refinement Reports).
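To make the Unit%/Constant%/Variable% format concrete, the Python sketch below formats the share of each stride category in a list of observations. How Intel Advisor counts and weights accesses is not described here, so the equal weighting, the category labels, and the function name are assumptions.

```python
from collections import Counter


def strides_distribution(categories):
    """Format category shares as 'Unit%/Constant%/Variable%'."""
    counts = Counter(categories)
    total = len(categories)
    if total == 0:
        return "0%/0%/0%"
    shares = [
        round(100 * counts[name] / total)
        for name in ("unit", "constant", "variable")
    ]
    return "/".join(f"{share}%" for share in shares)


# Hypothetical sample: 6 unit, 3 constant, and 1 variable stride observation.
sample = ["unit"] * 6 + ["constant"] * 3 + ["variable"] * 1
print(strides_distribution(sample))
```

Because each share is rounded independently, the three values in this sketch may not always sum to exactly 100.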
Total Elapsed Time
Description
: Total Time-based wall time from beginning to end of loop/function execution, including time for callees.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Comparison with similar metrics
: Self Elapsed Time is Self Time-based wall time from beginning to end of loop/function execution, excluding time for callees.
Interpretation
: Same as Total Time for single-threaded applications.
Total Time
Description
: Time actively executing a function/loop, including time for callees.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Comparison with similar metrics
: Self Time is time actively executing a function/loop, not including time for callees.
Traits
Description
: Scalar and vectorization characteristics that may impact performance.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Possible values
:
| Trait | Detected ASM Instructions |
|---|---|
| Divisions | *DIV* |
| Square Roots | *SQRT* |
| Type Conversions | *CVT* |
| NT-stores | *MOVNT* |
| Gathers | *GATHER* |
| Scatters | *SCATTER* |
| Shuffles | *SHUF* |
| Permutes | *PERM* |
| Blends | *BLEND* |
| Packs | *PACK* |
| Unpacks | *UNPCK* |
| Inserts | *INSERT* |
| Extracts | *EXTRACT* |
| Masked Stores | *MASKMOV* |
| Shifts | *PROR*, *PROL*, *PSLL*, *PSRA*, *PSRL* |
| FMA | *FMADD*, *FMSUB*, *FNMADD*, *FNMSUB* |
| Mask Manipulations | *KADD*, *KTEST*, *KAND*, *KOR*, *KXOR*, *KXNOR*, *KNOT*, *KUNPCK*, *KMOV*, *KSHIFT* |
| Conflict Detections | *VPCONFLICT* |
| Exponent extractions | *VGETEXP* |
| Mantissa extractions | *VGETMANT* |
| Expands | *EXPAND* |
| Compresses | *COMPRESS* |
| VNNI | *VNNI* |
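The wildcard patterns in the table above can be read as substring matches on instruction mnemonics. The Python sketch below applies a few of them with `fnmatch`; it is only an illustration of the matching idea, not Intel Advisor's detection logic, and the function name and sample mnemonics are hypothetical.

```python
from fnmatch import fnmatchcase

# A few trait patterns copied from the table above.
TRAIT_PATTERNS = {
    "Divisions": ["*DIV*"],
    "Square Roots": ["*SQRT*"],
    "Gathers": ["*GATHER*"],
    "FMA": ["*FMADD*", "*FMSUB*", "*FNMADD*", "*FNMSUB*"],
}


def detect_traits(mnemonics):
    """Return the sorted trait names whose patterns match any mnemonic."""
    found = set()
    for mnemonic in mnemonics:
        upper = mnemonic.upper()  # patterns in the table are uppercase
        for trait, patterns in TRAIT_PATTERNS.items():
            if any(fnmatchcase(upper, pattern) for pattern in patterns):
                found.add(trait)
    return sorted(found)


print(detect_traits(["vfmadd231pd", "vdivpd", "vmovupd"]))
```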
Transformations
Description
: Loop transformations applied by compiler.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Type
Collected during Survey Analysis, Dependencies Analysis, and Memory Access Patterns Analysis; and found in Loop Information Pane (Survey Report), Advanced View Pane (Survey Report), Dependencies Report, and Memory Access Patterns Report.
Possible Survey Report values
:
- Peeled/Remainder - For loops that have child loops. Appears only when scalar peeled loop and/or remainder loop executed.
- Threaded - For loops that have child loops. Appears when some parallel framework (OpenMP* or automatically by Intel compiler) is used in the loop.
- Vectorized (<loop part(s)>) - For vectorized parent and child loops. Appears when a parent loop has any of the following parts executed: peeled, body, remainder. Also appears for child loops that have one of the following parts executed: peeled, body, remainder.
- Peeled - For small, (usually) compiler-generated loops created to align the memory accesses inside the loop body and maximize its efficiency.
- Body - For vectorized loops (compiler-generated from a source loop). Most loop iterations should execute in body, as body normally processes more data than peeled or remainder loops. Vector length in the body is usually larger than in peeled and/or remainder loops, which means body is the most efficient place for performance.
- Remainder - For (usually) compiler-generated loops created to clean up any remaining iterations that do not fit within the scope of the loop body.
- [Not Executed] - Mark that appears next to any other loop metric when a loop was not executed.
- Scalar - Appears when non-vectorized loops executed.
- Completely Unrolled - Appears when the loop body was copied several times (equal to trip counts value) by the compiler.
- Inside vectorized - Appears when the inner loop was vectorized in addition to the outer loop.
- Inlined Function - Appears when the function body was inlined into the loop/function body.
- Vector Function - Appears when a SIMD-enabled version of the function executed. (See Intel compiler documentation for details.)
- Function - Appears when a scalar version of the function executed.
Possible Memory Access Patterns Report values
:
- Uniform stride 0 - Instruction accesses the same memory from iteration to iteration. Represents the ideal situation and does not require any improvements.
- Unit stride (stride 1) - Instruction accesses memory that consistently changes by one element from iteration to iteration. Represents the ideal situation and does not require any improvements.
- Constant stride (stride N) - Instruction accesses memory that consistently changes by N elements (N>1) from iteration to iteration. Code uses more memory than is ideal and requires more cache lines. Consider studying recommendations on AOS/SOA optimization.
- Irregular stride - Instruction accesses memory addresses that change by an unpredictable number of elements from iteration to iteration. Might limit vectorization or even make vectorization impossible.
- Gather (irregular) stride - Detected for v(p)gather* instructions on AVX2 Instruction Set Architecture (ISA). The compiler vectorized code with an irregular memory access pattern. Consider improving the code to use a more constant memory access pattern.
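The first four stride categories can be illustrated with a small classifier over the element indices that one instruction touches in successive iterations. This Python sketch is a simplification under stated assumptions: the function name is hypothetical, negative strides are classified by magnitude, and the gather category (which depends on the instruction used) is not modeled.

```python
def classify_stride(element_indices):
    """Classify the access pattern of one instruction across iterations."""
    if len(element_indices) < 2:
        raise ValueError("need at least two iterations to measure a stride")
    # Distance, in elements, between accesses in consecutive iterations.
    strides = {b - a for a, b in zip(element_indices, element_indices[1:])}
    if len(strides) != 1:
        return "Irregular stride"
    stride = abs(strides.pop())  # assumption: direction is ignored
    if stride == 0:
        return "Uniform stride 0"
    if stride == 1:
        return "Unit stride (stride 1)"
    return f"Constant stride (stride {stride})"


print(classify_stride([5, 5, 5]))
print(classify_stride([0, 1, 2, 3]))
print(classify_stride([0, 4, 8]))
print(classify_stride([0, 3, 4, 9]))
```

A constant stride of 4, for example, is what accessing one field of an array of four-field structures would look like, which is the situation the AOS/SOA recommendation addresses.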
Possible Dependencies Report values
:
- See Problem and Message Types.
Unroll Factor
Description
: Loop unroll factor applied by the compiler.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Variable References
Description
: Name of the variable for which the dependency or memory access stride is detected.
Collected during Dependencies Analysis and Memory Access Patterns Analysis, and found in Dependencies Report and Memory Access Patterns Report.
Vector ISA
Description
: The highest vector Instruction Set Architecture used for individual instructions.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Comparison with similar metrics
: An ISA higher than the ISA of your current hardware appears when you add corresponding codepaths with the x, Qx / ax, Qax compiler options. To see the ISA of non-executed codepaths, enable the Analyze non-executed codepaths option in Project Properties.
Vector Widths
Description
: Vector register width in bits.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Possible values
: Combination of values, including 32, 64, 128, 256, 512, delimited by a slash or semicolon (/ or ;).
Vectorization Details
Description
: Compiler notes on vectorization.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
VL (Vector Length)
Description
: The number of elements processed in a single iteration of vector loops, or the number of elements processed in individual vector instructions.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Calculation/Aggregation
: Estimated by binary static analysis or the Intel compiler.
Why No Vectorization?
Description
: The reason the compiler did not vectorize the loop.
Collected during Survey Analysis and found in Loop Information Pane (Survey Report) and Advanced View Pane (Survey Report).
Interpretation
: Click to display the issue root cause and recommended fixes.