Address Memory Bandwidth Bottlenecks
- Memory bandwidth bottlenecks are generally overcome with cache optimizations.
- Check data in other Intel Advisor views to support your Roofline chart interpretation.
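The basic roofline formula shows why cache optimizations help. A loop's attainable performance is the lower of two values: the compute peak, and its arithmetic intensity multiplied by the bandwidth of the memory level feeding it. The minimal C++ sketch below assumes the caller supplies the peak and bandwidth figures. The function names are hypothetical and are not part of Intel Advisor. Serving data from a faster cache level corresponds to passing a larger bandwidth value, which raises the memory roof at the same arithmetic intensity.

```cpp
#include <algorithm>

// Roofline model sketch (hypothetical helpers, not an Intel Advisor API).
// Attainable performance is capped by either the compute peak or the memory roof.
double attainable_gflops(double arithmetic_intensity_flop_per_byte,
                         double peak_gflops,
                         double bandwidth_gb_per_s) {
    // Memory roof: (bytes per second) * (FLOP per byte) = FLOP per second.
    double memory_roof = arithmetic_intensity_flop_per_byte * bandwidth_gb_per_s;
    return std::min(peak_gflops, memory_roof);
}

// A loop is memory-bandwidth bound when the memory roof lies below the compute peak
// at its arithmetic intensity.
bool is_memory_bound(double arithmetic_intensity_flop_per_byte,
                     double peak_gflops,
                     double bandwidth_gb_per_s) {
    return arithmetic_intensity_flop_per_byte * bandwidth_gb_per_s < peak_gflops;
}

// Ridge point: the arithmetic intensity at which the memory roof meets the compute peak.
double ridge_point(double peak_gflops, double bandwidth_gb_per_s) {
    return peak_gflops / bandwidth_gb_per_s;
}
```

For example, with an assumed 100 GFLOPS peak and 50 GB/s bandwidth, a loop at 0.25 FLOP/byte is memory bound and capped at 12.5 GFLOPS, and the ridge point is 2 FLOP/byte.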
Open a Result Snapshot
- If you prefer to work in the standalone GUI, use the File menu to open the Result1.advixeexpz result.
- If you prefer to work in the Visual Studio* IDE, use the File menu to open the Result1.advixeexpz result.
Focus the Roofline Chart on the Data of Most Interest
- On the Intel Advisor toolbar, click the Loops And Functions filter drop-down and choose Loops.
- In the Roofline chart:
- Select the Use Single-Threaded Loops checkbox.
- Click the control, then deselect the Visibility checkbox for all SP... roofs. (All variables in this sample code are double-precision, so there is no need to clutter the chart with single-precision rooflines.) In the Point Colorization section, choose Colors of Point Weight Ranges to differentiate dot colors by runtime (red, yellow, and green). Then save your changes.
- Click the control. In the x-axis fields, replace the existing values with 0.1 and 0.4. In the y-axis fields, replace the existing values with 7.4 and 45.5. Click the button to save your changes.
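The Roofline chart plots arithmetic intensity (FLOP per byte) on the x-axis and performance (GFLOPS) on the y-axis. The zoom values above therefore restrict the view to 0.1-0.4 FLOP/byte and 7.4-45.5 GFLOPS. The sketch below gives a rough hand estimate of where a double-precision loop lands. It counts only explicit loads and stores of 8-byte doubles and ignores caching, so it merely approximates what Advisor measures. The function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical back-of-the-envelope arithmetic intensity for one loop iteration:
// FLOPs divided by bytes moved by the explicit loads and stores.
double arithmetic_intensity(std::size_t flops_per_iter,
                            std::size_t loads_per_iter,
                            std::size_t stores_per_iter,
                            std::size_t bytes_per_element) {
    std::size_t bytes = (loads_per_iter + stores_per_iter) * bytes_per_element;
    return static_cast<double>(flops_per_iter) / static_cast<double>(bytes);
}

// True if an (arithmetic intensity, GFLOPS) point falls inside the zoomed window
// set in the chart settings (x: 0.1-0.4 FLOP/byte, y: 7.4-45.5 GFLOPS).
bool in_zoomed_window(double ai_flop_per_byte, double gflops) {
    return ai_flop_per_byte >= 0.1 && ai_flop_per_byte <= 0.4 &&
           gflops >= 7.4 && gflops <= 45.5;
}
```

Consider a hypothetical loop `z[i] = x[i]*x[i] + y[i]*y[i] + x[i]*y[i]`. Each iteration performs 5 FLOPs (three multiplies, two adds), two loads, and one store, so `arithmetic_intensity(5, 2, 1, 8)` gives 5/24, about 0.21 FLOP/byte, which is inside the zoomed x-range.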
Interpret Roofline Chart Data
- Check the Survey Report:
- Notice that the Vectorized Loops/Efficiency value for the loop in main at roofline.cpp:295 is 100%. This full vectorization efficiency is why its dot sits above the Scalar Add Peak roofline, which is off-screen in the zoomed chart.
- Click the data row for the loop in main at roofline.cpp:295 to view the associated source code in the Source tab.
- In the Source tab, scroll to source code lines 89-96 to view the associated data structure definition: Structure of Arrays (SOA).
- In the Survey Report, click the data row for the loop in main at roofline.cpp:310.
- In the Source tab, scroll to code lines 97-101 to view the data structure definition for this loop: Array of Structure of Arrays (AOSOA). With the AOSOA data layout, the loop in main at roofline.cpp:310 splits the tutorial workload into two steps, and each step has a dataset that fits into L1 cache. This detail comes from familiarity with the sample code, not from the chart itself.
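A minimal C++ sketch can contrast the SOA and AOSOA layouts. The struct names, the three fields, and the block length of 8 are hypothetical and are not the definitions at roofline.cpp lines 89-101. The sketch only shows the memory arrangement. SOA stores one contiguous array per field across the whole dataset. AOSOA groups elements into small fixed-size blocks that each hold an SOA, so all fields for a block sit close together in memory while the inner loop stays unit-stride. The last helper estimates how many three-field double elements fit in a cache of a given size.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical layouts for illustration only; not the sample's actual definitions.
constexpr std::size_t kBlock = 8;  // assumed AOSOA block length

// Structure of Arrays (SOA): one contiguous array per field.
struct PointsSOA {
    std::vector<double> x, y, z;
    explicit PointsSOA(std::size_t n) : x(n), y(n), z(n) {}
};

// One AOSOA block: a small SOA holding kBlock elements (zero-initialized).
struct Block {
    std::array<double, kBlock> x{}, y{}, z{};
};

// Array of Structure of Arrays (AOSOA): a vector of fixed-size SOA blocks.
struct PointsAOSOA {
    std::size_t n;
    std::vector<Block> blocks;
    explicit PointsAOSOA(std::size_t count)
        : n(count), blocks((count + kBlock - 1) / kBlock) {}
};

// Same computation on each layout: sum of x*y + z over all elements.
double sum_soa(const PointsSOA& p) {
    double s = 0.0;
    for (std::size_t i = 0; i < p.x.size(); ++i) s += p.x[i] * p.y[i] + p.z[i];
    return s;
}

double sum_aosoa(const PointsAOSOA& p) {
    double s = 0.0;
    for (std::size_t b = 0; b < p.blocks.size(); ++b) {
        const Block& blk = p.blocks[b];
        // The last block may be only partially filled.
        std::size_t len = std::min(kBlock, p.n - b * kBlock);
        for (std::size_t i = 0; i < len; ++i) s += blk.x[i] * blk.y[i] + blk.z[i];
    }
    return s;
}

// Rough number of elements whose double fields fit in a cache of cache_bytes.
std::size_t elements_fitting_in_cache(std::size_t cache_bytes,
                                      std::size_t fields_per_element) {
    return cache_bytes / (fields_per_element * sizeof(double));
}
```

Assume, purely for illustration, a 32 KiB L1 data cache. With three double fields per element, `elements_fitting_in_cache(32 * 1024, 3)` returns 1365. That figure indicates how large each step's dataset can grow before it spills out of L1.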