Address Compute Capacity Bottlenecks
- Arithmetic intensity (the x-axis of the Roofline chart) = floating-point operations per byte accessed. Every algorithm has an arithmetic intensity. In theory, optimization should not change this metric because it is a trait of the algorithm itself. So dots on a Roofline chart move up and down as performance changes, but rarely side to side.
- Optimizing a loop is not enough to make the corresponding dot rise to the next roofline; the loop must make good use of the optimization. Inefficient vectorization is not good enough; an isolated fused multiply-add (FMA) instruction is not good enough.
- In the right circumstances, you can use data layout and memory access optimizations to overcome both compute capacity and memory bandwidth limitations.
- Take advantage of code-specific how-can-I-fix-this-issue? advice in the Recommendations tab.
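The relationship between arithmetic intensity and the rooflines can be sketched in a few lines of Python. This is a minimal illustration of the general roofline model, not Intel Advisor's implementation: the function names, the example loop, and the peak and bandwidth figures are all hypothetical values chosen for the arithmetic.

```python
def arithmetic_intensity(flops: float, bytes_accessed: float) -> float:
    """Floating-point operations per byte accessed (the chart's x-axis)."""
    if bytes_accessed <= 0:
        raise ValueError("bytes_accessed must be positive")
    return flops / bytes_accessed


def roofline_ceiling(ai: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Highest attainable GFLOPS at a given arithmetic intensity.

    The bound is the lower of the compute roof (peak_gflops) and the
    memory roof (ai * bandwidth_gbs).
    """
    return min(peak_gflops, ai * bandwidth_gbs)


# Hypothetical double-precision loop: y[i] = a * x[i] + y[i]
# 2 FLOPs per iteration (one multiply, one add);
# 24 bytes per iteration (load x, load y, store y, 8 bytes each).
AI = arithmetic_intensity(flops=2, bytes_accessed=24)

# Hypothetical machine limits, for illustration only.
PEAK_GFLOPS = 40.0
BANDWIDTH_GBS = 20.0

ceiling = roofline_ceiling(AI, PEAK_GFLOPS, BANDWIDTH_GBS)

# Optimization changes the measured performance (y), not the intensity (x):
# the dot moves up toward the ceiling but stays in the same column.
before = (AI, 0.4)  # (x, y) before optimization, hypothetical GFLOPS
after = (AI, 1.2)   # (x, y) after optimization, hypothetical GFLOPS

print(f"arithmetic intensity: {AI:.4f} FLOP/byte")
print(f"ceiling at that intensity: {ceiling:.4f} GFLOPS")
print(f"dot moved from {before} to {after}")
```

With these assumed numbers the loop sits under the memory roof, so the ceiling is set by bandwidth rather than by peak compute. A loop with a much higher intensity would instead be capped at the compute roof.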
Open a Result Snapshot
- If you prefer to work in the standalone GUI, use the File menu to open the Result2.advixeexpz result.
- If you prefer to work in the Visual Studio* IDE, use the File menu to open the Result2.advixeexpz result.
Focus the Roofline Chart on the Data of Most Interest
- On the Intel Advisor toolbar, click the Loops And Functions filter drop-down and choose Loops.
- In the Roofline chart:
- Select the Use Single-Threaded Loops checkbox.
- Click the control, then deselect the Visibility checkbox for all SP... roofs. (All variables in this sample code are double-precision, so there is no need to clutter the chart with single-precision rooflines.) In the Point Colorization section, choose Vectorized/Scalar to differentiate dot colors by scalar (blue) vs. vectorized (orange) instead of runtime (red, yellow, and green). Then save your changes.
- Click the control. In the x-axis fields, backspace over the existing values and enter 0.1 and 0.8. In the y-axis fields, backspace over the existing values and enter 3.1 and 45.5. Click the button to save your changes.
Interpret Roofline Chart Data
Scalar Loop (Blue Dot)
Vectorized Loop (Bottom Orange Dot)
Self FLOP Per Iteration
Data Transfers: Total Gigabytes
Data Transfers: Bytes Per Loop Iteration
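The per-iteration metrics named above can be tied back to the Roofline x-axis with simple arithmetic. The sketch below is an assumption about how these quantities relate, not a description of how Intel Advisor computes them: the `LoopMetrics` class and the input values are hypothetical, and a gigabyte is taken to be 10^9 bytes.

```python
from dataclasses import dataclass


@dataclass
class LoopMetrics:
    """Hypothetical container for per-loop figures read from a report."""
    flop_per_iteration: float
    bytes_per_iteration: float
    iterations: int

    @property
    def arithmetic_intensity(self) -> float:
        # FLOP per byte; the iteration count cancels out of the ratio.
        return self.flop_per_iteration / self.bytes_per_iteration

    @property
    def total_gigabytes(self) -> float:
        # Assumes decimal gigabytes (1 GB = 1e9 bytes).
        return self.bytes_per_iteration * self.iterations / 1e9

    @property
    def total_gflop(self) -> float:
        return self.flop_per_iteration * self.iterations / 1e9


# Illustrative values only; they are not taken from the sample result.
loop = LoopMetrics(flop_per_iteration=4, bytes_per_iteration=32,
                   iterations=1_000_000_000)

print(f"arithmetic intensity: {loop.arithmetic_intensity} FLOP/byte")
print(f"total data transferred: {loop.total_gigabytes} GB")
print(f"total work: {loop.total_gflop} GFLOP")
```

Because arithmetic intensity is a ratio of two per-iteration values, running the loop for more iterations raises the totals but leaves the dot's horizontal position unchanged.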