Examine Bottlenecks on CPU Roofline Chart
- By dot size and color, identify loops that take most of the total program time and/or are located very low in the chart. For example:
- Small, green dots take up relatively little time, so they are likely not worth optimizing.
- Large, red dots take up the most time. The best candidates for optimization are large, red dots with a large amount of space between them and the topmost roofs.
You can switch between coloring the dots by execution time and coloring the dots by type (scalar or vectorized) in the roof view menu on the right.
- Depending on the dot position, identify what the loops are bounded by. Intel® Advisor marks the roofline zones on the chart to help you identify what roofs bound the loop:
- Loop is bounded by memory roofs.
- Loop is bounded by compute roofs.
- Loop is bounded by both memory and compute roofs.
- In the Recommendations tab, scroll down to the Roofline Guidance section, which provides hints on the next optimization steps for a selected loop/function.
Below a memory roof (DRAM Bandwidth, L1 Bandwidth, and so on): The loop/function uses memory inefficiently. Run a Memory Access Patterns analysis for this loop.
Below Vector Add Peak: The loop/function under-utilizes available instruction sets. Check the Traits column in the Survey report to see if FMAs are used.
Just above Scalar Add Peak: The loop/function is undervectorized. Check vectorization efficiency and performance issues in the Survey report. Follow the recommendations to improve the efficiency if it is low.
Below Scalar Add Peak: The loop/function is scalar.
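The reading of the chart described above can be sketched numerically. The following is a minimal illustration in plain Python, not the Intel Advisor API: the roof values, the `Loop` record, and the ranking score are all hypothetical and chosen only to show the arithmetic. A memory roof limits performance to bandwidth times arithmetic intensity, a compute roof is a flat peak, and the "space" between a dot and the topmost roofs is the ratio of the highest attainable performance to the measured performance.

```python
from dataclasses import dataclass

# Hypothetical roof values for illustration only; real roofs come from the
# benchmarks that the tool runs on the target machine.
MEMORY_ROOFS_GBS = {"L1 Bandwidth": 300.0, "DRAM Bandwidth": 20.0}
COMPUTE_ROOFS_GFLOPS = {"Scalar Add Peak": 10.0, "Vector Add Peak": 40.0}


@dataclass
class Loop:
    """Hypothetical record for one dot on the chart."""
    name: str
    self_time_s: float  # self time in seconds (dot size/color)
    gflops: float       # measured performance (vertical position)
    ai: float           # arithmetic intensity in FLOP/byte (horizontal position)


def roofs_above(loop):
    """Return (name, kind, gflops_at_ai) for roofs above the dot, nearest first."""
    roofs = [(n, "memory", bw * loop.ai) for n, bw in MEMORY_ROOFS_GBS.items()]
    roofs += [(n, "compute", peak) for n, peak in COMPUTE_ROOFS_GFLOPS.items()]
    return sorted((r for r in roofs if r[2] > loop.gflops), key=lambda r: r[2])


def headroom(loop):
    """Ratio of the topmost attainable performance to the measured performance."""
    ceiling = min(max(MEMORY_ROOFS_GBS.values()) * loop.ai,
                  max(COMPUTE_ROOFS_GFLOPS.values()))
    return ceiling / loop.gflops


def rank_candidates(loops):
    """Order loops by a hypothetical score: time that closing the gap could save."""
    return sorted(loops,
                  key=lambda l: l.self_time_s * (1 - 1 / headroom(l)),
                  reverse=True)


loops = [
    Loop("stream_like", self_time_s=6.0, gflops=2.0, ai=0.25),
    Loop("compute_heavy", self_time_s=1.0, gflops=8.0, ai=4.0),
]

for loop in rank_candidates(loops):
    above = roofs_above(loop)
    nearest = f"{above[0][0]} ({above[0][1]})" if above else "none"
    print(f"{loop.name}: nearest roof above = {nearest}, "
          f"headroom = {headroom(loop):.1f}x")
```

In this sketch, a loop whose nearest roof above is a memory roof corresponds to the "below a memory roof" case, and a loop sitting under Scalar Add Peak corresponds to the scalar case. The zones that Intel Advisor draws on the chart may be computed differently.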
Analyze Specific Loops
- Refer to the Loop Information pane to examine total time, self time, instruction sets used, and instruction mix for the selected loop. Intel Advisor provides:
- Static instruction mix data that is based on static assembly code analysis within a call stack. Use static instruction mix to examine instruction sets in the innermost functions/loops.
- Dynamic instruction mix data that is based on dynamic assembly code analysis. This metric represents the total count of instructions executed by your function/loop. Use dynamic instruction mix to examine instruction sets in the outer loops and in complex loop nests.
Intel Advisor automatically determines the data type used in operations. View the classes of instructions grouped by categories in instruction mix:
- Compute (FLOP and INTOP): ADD, MUL, SUB, DIV, SAD, MIN, AVG, MAX, ABS, SIN, SQRT, FMA, RCCP, SCALE, FCOM, V4FMA, V4VNNI
- Memory: scalar and vector MOV instructions, GATHER/SCATTER instructions, VBMI2 compress/expand instructions
- Mixed: compute instructions with memory operands
- Other: MOVE, CONTROL FLOW, SYNC, OTHER
Intel Advisor counts FMA and VNNI instructions as more than 1 operation depending on the size of the data type and/or the type of vector registers used.
- Refer to the Roofline pane for more details about a specific roof that bounds the loop:
- View roofs with the number of threads, data types, and instruction mix used in the loop
- Identify what exactly bounds the selected loop: memory, compute, or both memory and compute
- Determine the exact roof that bounds the loop and see the estimated potential speedup in the callout if you optimize the loop for this roof
- Refer to the Statistics for operations pane to view the count of operations collected during the Characterization analysis. Depending on the operations you need, use the drop-down list to choose FLOP, INTOP, FLOP+INTOP, or All Operations. Switch between Self and Total data using the toggle in the top right-hand corner of the pane.
Intel Advisor calculates floating-point operations (FLOP) as a sum of the following classes of instructions multiplied by their iteration count: FMA, ADD, SUB, DIV, DP, MUL, ATAN, FPREM, TAN, SIN, COS, SQRT, RCP, RSQRT, EXP, VSCALE, MAX, MIN, ABS, IMUL, IDIV, FIDIVR, CMP, VREDUCE, VRND
Integer operations (INTOP) are calculated in two modes:
- Potential INT operations (default) that include loop counter operations that are not strictly calculations (for example, INC/DEC, shift, rotate operations). In this case, INTOP is a sum of the following instructions multiplied by their iteration count: ADD, ADC, SUB, MUL, IMUL, DIV, IDIV, INC/DEC, shifts, rotates
- Strict INT operations (available in Python* API only) that include only calculation operations. In this case, INTOP is a sum of the following instructions multiplied by their iteration count: ADD, MUL, IDIV, SUB
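The counting rules above (a sum of selected instruction classes multiplied by their iteration count) can be illustrated with a small sketch. This is plain Python over a hand-written instruction mix, not the Intel Advisor Python* API. The `Instr` record, the explicit `data_type` field, and the `SHIFT`/`ROTATE` placeholder mnemonics are assumptions made for the example, and the sketch counts every instruction as one operation, so it ignores the extra weight given to FMA and VNNI instructions.

```python
from typing import NamedTuple

# Instruction classes copied from the lists above.
FLOP_CLASSES = {
    "FMA", "ADD", "SUB", "DIV", "DP", "MUL", "ATAN", "FPREM", "TAN", "SIN",
    "COS", "SQRT", "RCP", "RSQRT", "EXP", "VSCALE", "MAX", "MIN", "ABS",
    "IMUL", "IDIV", "FIDIVR", "CMP", "VREDUCE", "VRND",
}
# "SHIFT" and "ROTATE" are placeholder names standing in for the shift and
# rotate instruction families (assumption for this sketch).
POTENTIAL_INT_CLASSES = {
    "ADD", "ADC", "SUB", "MUL", "IMUL", "DIV", "IDIV",
    "INC", "DEC", "SHIFT", "ROTATE",
}
STRICT_INT_CLASSES = {"ADD", "MUL", "IDIV", "SUB"}


class Instr(NamedTuple):
    """Hypothetical record for one instruction class in a loop body."""
    mnemonic: str
    data_type: str      # "float" or "int"; given explicitly in this sketch
    per_iteration: int  # occurrences in one loop iteration
    iterations: int     # how many times the loop body ran


def count_flop(mix):
    """Sum of floating-point instruction counts times iteration count."""
    return sum(i.per_iteration * i.iterations for i in mix
               if i.data_type == "float" and i.mnemonic in FLOP_CLASSES)


def count_intop(mix, strict=False):
    """Sum of integer instruction counts in potential (default) or strict mode."""
    classes = STRICT_INT_CLASSES if strict else POTENTIAL_INT_CLASSES
    return sum(i.per_iteration * i.iterations for i in mix
               if i.data_type == "int" and i.mnemonic in classes)


mix = [
    Instr("FMA", "float", per_iteration=2, iterations=1000),
    Instr("ADD", "float", per_iteration=1, iterations=1000),
    Instr("ADD", "int", per_iteration=1, iterations=1000),
    Instr("INC", "int", per_iteration=1, iterations=1000),  # loop counter
    Instr("MOV", "int", per_iteration=3, iterations=1000),  # memory, not counted
]

print("FLOP:", count_flop(mix))
print("INTOP (potential):", count_intop(mix))
print("INTOP (strict):", count_intop(mix, strict=True))
```

The difference between the two INTOP results comes from the loop counter increment, which the potential mode counts and the strict mode excludes.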