Analyze CPU Roofline
- What is the maximum achievable performance with your current hardware resources?
- Does your application work optimally on current hardware resources?
- If not, what are the best candidates for optimization?
- Is memory bandwidth or compute capacity limiting performance for each optimization candidate?
How It Works
- Collect loop/function timings using the Survey analysis.
- Collect floating-point and/or integer operations data and memory traffic data, and measure the limitations of your hardware, using the FLOP analysis in the Characterization step. This collection can take three to four times longer than the Survey analysis. At this step, Intel® Advisor collects:
- Compute operations (floating-point operations (FLOP) and integer operations (INTOP)):
- FLOP is calculated as a sum of the following classes of instructions multiplied by their iteration count: FMA, ADD, SUB, DIV, DP, MUL, ATAN, FPREM, TAN, SIN, COS, SQRT, RCP, RSQRT, EXP, VSCALE, MAX, MIN, ABS, IMUL, IDIV, FIDIVR, CMP, VREDUCE, VRND.
- INTOP is calculated by default as a sum of the following classes of instructions multiplied by their iteration count: ADD, ADC, SUB, MUL, IMUL, DIV, IDIV, INC/DEC, shifts, rotates.
- Memory traffic data, calculated as the product of the number of memory operations and the number of bytes in the register accessed by the function/loop. For the memory traffic calculation, Intel Advisor counts the following classes of memory instructions:
- scalar and vector MOV instructions
- GATHER/SCATTER instructions
- VBMI2 compress/expand instructions
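The counting rules above can be illustrated with a small sketch. It is not how Intel Advisor is implemented: the function names, the instruction mix, and the loop parameters are all hypothetical, and each instruction is treated as a single operation, which is a simplification.

```python
# Illustrative sketch only: the names and numbers below are hypothetical,
# not Intel Advisor output or API.

def count_ops(instr_per_iteration, iterations):
    """Sum the counted instruction classes, multiplied by the iteration count.

    Assumption: every instruction counts as one operation.
    """
    return sum(instr_per_iteration.values()) * iterations


def memory_traffic_bytes(mem_ops_per_iteration, iterations):
    """Product of memory operations and bytes accessed per operation.

    mem_ops_per_iteration is a list of (operation_count, bytes_per_access).
    """
    per_iteration = sum(count * nbytes for count, nbytes in mem_ops_per_iteration)
    return per_iteration * iterations


# Hypothetical loop: one FMA and one ADD per iteration, 1000 iterations.
flop = count_ops({"FMA": 1, "ADD": 1}, iterations=1000)

# Hypothetical memory accesses: two 8-byte loads and one 8-byte store per iteration.
traffic = memory_traffic_bytes([(2, 8), (1, 8)], iterations=1000)

print(flop, traffic)
```

Dividing the first result by the second gives the FLOP-per-byte ratio used as arithmetic intensity in the report.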
CPU Roofline Report
- Arithmetic intensity (x axis) - measured in floating-point operations (FLOPs) and/or integer operations (INTOPs) per byte transferred between the CPU/VPU and memory, based on the loop/function algorithm
- Performance (y axis) - measured in billions of floating-point operations per second (GFLOPS) and/or billions of integer operations per second (GINTOPS)
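As a rough illustration of how the two axes relate, the sketch below computes both values for one loop and compares them with the standard roofline bound, min(peak compute, arithmetic intensity × peak memory bandwidth). The loop measurements and the hardware peaks are hypothetical values chosen for the example, not measured data, and the function names are assumptions.

```python
# Illustrative sketch only: measurements and hardware peaks are hypothetical.

def arithmetic_intensity(ops, bytes_moved):
    """Operations per byte transferred (x axis)."""
    return ops / bytes_moved


def gflops(ops, seconds):
    """Billions of operations per second (y axis)."""
    return ops / seconds / 1e9


def roofline_bound(ai, peak_gflops, peak_bw_gbs):
    """Standard roofline model: the lower of the compute and memory roofs."""
    return min(peak_gflops, ai * peak_bw_gbs)


def limiting_factor(ai, peak_gflops, peak_bw_gbs):
    """Report which roof the loop reaches first at this arithmetic intensity."""
    if ai * peak_bw_gbs < peak_gflops:
        return "memory bandwidth"
    return "compute capacity"


# Hypothetical loop: 2e9 FLOP, 16e9 bytes moved, 0.5 s elapsed.
ai = arithmetic_intensity(2e9, 16e9)
perf = gflops(2e9, 0.5)

# Hypothetical hardware: 100 GFLOPS peak compute, 50 GB/s peak bandwidth.
bound = roofline_bound(ai, peak_gflops=100.0, peak_bw_gbs=50.0)
limit = limiting_factor(ai, peak_gflops=100.0, peak_bw_gbs=50.0)

print(ai, perf, bound, limit)
```

In this example the loop reaches 4 GFLOPS against a bound of 6.25 GFLOPS set by the memory roof, so under these assumed peaks it is limited by memory bandwidth rather than compute capacity.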