Tutorial

  • 2021.1
  • 12/04/2020
  • Public Content

Address Memory Bandwidth Bottlenecks

This topic is part of a tutorial that shows how to use the automated Roofline chart to make prioritized optimization decisions.
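Perform the following steps:
  • Open a Result Snapshot
  • Focus the Roofline Chart on the Data of Most Interest
  • Interpret Roofline Chart Data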
Key take-aways from these steps:
  • Memory bandwidth bottlenecks are generally overcome with cache optimizations.
  • Check data in other Intel Advisor views to support your Roofline chart interpretation.
These steps use a prepackaged analysis result because of tutorial duration and hardware dependency considerations.

Open a Result Snapshot

Do one of the following:
  • If you prefer to work in the standalone GUI, from the File menu, choose Open > Result and choose the Result1.advixeexpz result.
  • If you prefer to work in the Visual Studio* IDE, from the File menu, choose Open > File and choose the Result1.advixeexpz result.

Focus the Roofline Chart on the Data of Most Interest

  1. Use the display toggles to show the Roofline chart and Survey Report side by side.
  2. On the Intel Advisor toolbar, click the Loops And Functions filter drop-down and choose Loops.
    [Figure: Intel Advisor: Filters]
  3. In the Roofline chart:
    • Select the Use Single-Threaded Loops checkbox.
    • Click the Roofline menu control, then deselect the Visibility checkbox for all SP... roofs. (All variables in this sample code are double-precision, so there is no need to clutter the chart with single-precision rooflines.)
      [Figure: Intel Advisor: Roofline Menu]
      In the Point Colorization section, choose Colors of Point Weight Ranges to differentiate dot colors by runtime (red, yellow, and green).
      Click the save control to save your changes.
    • Click the Roofline numerical zoom control. In the x-axis fields, backspace over the existing values and enter 0.1 and 0.4. In the y-axis fields, backspace over the existing values and enter 7.4 and 45.5. Click the save button to save your changes.

Interpret Roofline Chart Data

[Figure: Intel Advisor: Roofline chart and Survey Report]
In the Roofline chart, notice the dot representing the loop in main at roofline.cpp:295 (the lower dot): It is positioned above the (offscreen) Scalar Add Peak roofline, and on the L2 Bandwidth roofline.
Why is the dot positioned there?
The probable answer: Loop performance is limited by a memory bandwidth bottleneck involving L2 cache.
How can we verify this?
  1. Check the Survey Report:
    • Notice the Vectorized Loops/Efficiency value for the loop in main at roofline.cpp:295: 100%.
      This 100% vectorization efficiency is why the dot is above the offscreen Scalar Add Peak roofline.
    • Click the data row for the loop in main at roofline.cpp:295 to view the associated source code in the Source tab.
  2. In the Source tab, scroll to source code lines 89-96 to view the associated data structure definition: Structure of Arrays (SOA).
    [Figure: Intel Advisor: Source Tab]
SOA is a good data layout for vectorization efficiency; however, our familiarity with the sample code tells us this data layout is preventing the tutorial dataset from fitting into L1 cache and causing many loads from L2 cache. (For details on why this is happening, check out this video: Roofline Analysis in Intel® Advisor 2017.)
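To make the layout concrete, here is a minimal C++ sketch of a Structure of Arrays and a working-set check. It is not the tutorial's roofline.cpp source: the type `ParticlesSoA`, its field names, the kernel `soa_scale_add`, and the helper functions are hypothetical names introduced for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical SoA layout: one contiguous array per field.
struct ParticlesSoA {
    std::vector<double> x, y, z;
    explicit ParticlesSoA(std::size_t n) : x(n), y(n), z(n) {}
};

// Unit-stride access to every array, which compilers can typically vectorize.
void soa_scale_add(ParticlesSoA& p, double a) {
    const std::size_t n = p.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        p.z[i] = a * p.x[i] + p.y[i];
    }
}

// Bytes touched by one pass over all three arrays.
std::size_t working_set_bytes(const ParticlesSoA& p) {
    return 3 * p.x.size() * sizeof(double);
}

// True when the whole working set fits in a cache of the given size.
bool fits_in_cache(const ParticlesSoA& p, std::size_t cache_bytes) {
    return working_set_bytes(p) <= cache_bytes;
}
```

For example, assuming a 32 KiB L1 data cache (an assumption, not a value taken from the tutorial result), three arrays of 4096 doubles occupy 96 KiB, so `fits_in_cache` returns false and repeated passes over the arrays would have to be served from a larger, slower cache level.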
So the loop in main at roofline.cpp:295 is positioned on the L2 Bandwidth roofline because loop performance is indeed limited by a memory bandwidth bottleneck involving L2 cache.
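The reasoning behind this reading is the Roofline model itself: attainable performance is bounded by the lower of the compute peak and the product of arithmetic intensity and memory bandwidth. The following C++ sketch expresses that bound. The function names and any numbers passed to them are hypothetical and are not measurements from the tutorial result.

```cpp
#include <algorithm>

// Roofline bound: attainable GFLOPS = min(compute peak, AI * bandwidth).
//   ai_flop_per_byte   - arithmetic intensity of the loop (FLOP/byte)
//   bandwidth_gb_per_s - bandwidth of one memory level, e.g. L2 (GB/s)
//   peak_gflops        - compute roof, e.g. a vector peak (GFLOPS)
double roofline_bound_gflops(double ai_flop_per_byte,
                             double bandwidth_gb_per_s,
                             double peak_gflops) {
    return std::min(peak_gflops, ai_flop_per_byte * bandwidth_gb_per_s);
}

// A loop is bandwidth-bound at a memory level when the bandwidth term
// is smaller than the compute peak.
bool is_bandwidth_bound(double ai_flop_per_byte,
                        double bandwidth_gb_per_s,
                        double peak_gflops) {
    return ai_flop_per_byte * bandwidth_gb_per_s < peak_gflops;
}
```

With made-up inputs of 0.25 FLOP/byte, 100 GB/s, and a 50 GFLOPS compute peak, the bound is 25 GFLOPS and the loop is bandwidth-bound. A dot that sits on a bandwidth roofline is at this limit, so it can only move up if the data is served from a faster memory level or the arithmetic intensity increases.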
How can we eliminate this memory bandwidth bottleneck?
Reorganizing code to optimize cache usage is a possible optimization technique.
The loop in main at roofline.cpp:310 does this very thing, which is why the corresponding dot (upper dot in the Roofline chart) is positioned above the L2 Bandwidth roofline:
  1. In the Survey Report, click the data row for the loop in main at roofline.cpp:310.
  2. In the Source tab, scroll to code lines 97-101 to view the data structure definition for this loop: Array of Structure of Arrays (AOSOA). When the loop in main at roofline.cpp:310 is in the AOSOA data layout, our familiarity with the sample code tells us the tutorial workload is split into two steps, and each step has a dataset that fits into L1 cache.
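As an illustration of the idea, the following C++ sketch shows one way an AOSOA layout can keep each step's data small enough to stay cache-resident. It is not the tutorial's roofline.cpp source: the names `Tile`, `ParticlesAoSoA`, `kTile`, and `aosoa_repeated_scale_add`, the tile width of 1024 elements, and the assumption that the kernel is repeated several times over the same data are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tile width: 3 arrays * 1024 doubles * 8 bytes = 24 KiB per tile.
constexpr std::size_t kTile = 1024;

// One tile is a small SoA, so the inner loop keeps unit-stride access.
struct Tile {
    double x[kTile];
    double y[kTile];
    double z[kTile];
};

// AoSoA: an array of tiles.
using ParticlesAoSoA = std::vector<Tile>;

// Finish all repetitions on one tile before moving to the next, so the
// tile can be reused from cache instead of being reloaded on every pass.
void aosoa_repeated_scale_add(ParticlesAoSoA& tiles, double a, int reps) {
    for (Tile& t : tiles) {                 // one step per tile
        for (int r = 0; r < reps; ++r) {    // reuse the tile while it is resident
            for (std::size_t i = 0; i < kTile; ++i) {
                t.z[i] += a * t.x[i] + t.y[i];
            }
        }
    }
}
```

Under these assumptions each tile occupies 24 KiB, which would fit in a 32 KiB L1 data cache, and a container of two tiles mirrors the two-step split described above. The inner loop is still a unit-stride loop over plain arrays, so the vectorization benefit of SOA is kept within each tile.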

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.