The CPU / Memory Roofline Insights perspective enables you to visualize actual performance against hardware-imposed performance ceilings and to determine the main limiting factor (memory bandwidth or compute capacity).
There are two ways to run the CPU / Memory Roofline Insights perspective: from the Intel® Advisor GUI and from the command line interface (CLI). You can open results collected with either method in the Intel Advisor GUI.
Run CPU / Memory Roofline Insights Perspective from Intel® Advisor GUI
In the Analysis Workflow pane, use the drop-down menu to select the CPU / Memory Roofline Insights perspective, set the data collection accuracy level to Low, and click the button to run it. At this accuracy level, Intel Advisor:
- Measures the hardware limitations of your machine and collects loop/function timings using the Survey analysis.
- Collects floating-point and integer operations data, and memory data using the Characterization analysis.
For details about data collection accuracy presets, see Intel Advisor User Guide: CPU Roofline Accuracy Presets. Upon completion, Intel Advisor displays a Roofline chart.
The Roofline chart plots an application's achieved performance and arithmetic intensity against the machine's maximum achievable performance:
- Arithmetic intensity (x axis) - measured in floating-point operations (FLOPs) and/or integer operations (INTOPs) per byte transferred between the CPU/VPU and memory, based on the loop/function algorithm.
- Performance (y axis) - measured in billions of floating-point operations per second (GFLOPS) and/or billions of integer operations per second (GINTOPS).
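The relationship between the two axes can be illustrated with a minimal sketch of the basic roofline model, in which the attainable performance is the smaller of the compute roof and the arithmetic intensity multiplied by the memory bandwidth. This is an illustration only, not how Intel Advisor computes its chart: the Machine class, the function names, and all numeric values are hypothetical, and a single memory roof and a single compute roof are assumed.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    peak_gflops: float    # compute roof in GFLOPS (hypothetical value)
    bandwidth_gbs: float  # memory roof in GB/s (hypothetical value)


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Operations per byte transferred between the CPU/VPU and memory."""
    return flops / bytes_moved


def attainable_gflops(machine: Machine, ai: float) -> float:
    """Basic roofline ceiling: min(compute roof, AI * memory bandwidth)."""
    return min(machine.peak_gflops, ai * machine.bandwidth_gbs)


def limiting_factor(machine: Machine, ai: float) -> str:
    """Loops left of the ridge point are memory bound, the rest compute bound."""
    ridge = machine.peak_gflops / machine.bandwidth_gbs
    return "memory" if ai < ridge else "compute"


# Hypothetical machine and loop measurements, chosen for easy arithmetic.
machine = Machine(peak_gflops=100.0, bandwidth_gbs=50.0)
flops, bytes_moved, seconds = 2e9, 8e9, 0.5

ai = arithmetic_intensity(flops, bytes_moved)   # 0.25 FLOP/byte
achieved = flops / seconds / 1e9                # 4.0 GFLOPS
ceiling = attainable_gflops(machine, ai)        # 12.5 GFLOPS

print(f"AI={ai} FLOP/byte, achieved={achieved} GFLOPS, "
      f"ceiling={ceiling} GFLOPS, bound by {limiting_factor(machine, ai)}")
```

In this made-up case the loop sits below a 12.5 GFLOPS memory-imposed ceiling, so the gap between 4.0 and 12.5 GFLOPS is the headroom available before memory bandwidth becomes the hard limit.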
Run CPU / Memory Roofline Insights Perspective from Command Line Interface
To run the CPU / Memory Roofline Insights perspective from the advisor command line interface, use the following command:
advisor --collect=roofline --project-dir=./advi --search-dir src:p=./advi -- myApplication
This command runs two analyses in batch mode, one after the other:
- Survey analysis, which collects loop/function execution time data.
- Characterization analysis, which collects floating-point and integer operations, memory traffic, and (on AVX-512 platforms) mask utilization metrics to measure the arithmetic intensity and performance of your application and the compute capacity of your hardware.
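The two steps listed above can also be launched as separate commands. The sketch below only builds and prints the command lines. It assumes that the Survey step corresponds to --collect=survey and the Characterization step to --collect=tripcounts with the --flop option. Neither flag appears in this section, so check both against advisor --help collect for your version. The roofline_steps helper is hypothetical.

```python
import shlex


def roofline_steps(project_dir: str, app: list[str]) -> list[list[str]]:
    """Build the two commands assumed to be equivalent to --collect=roofline."""
    # Step 1 (assumed flag): Survey collects loop/function timings.
    survey = ["advisor", "--collect=survey",
              f"--project-dir={project_dir}", "--", *app]
    # Step 2 (assumed flags): Characterization collects FLOP/INTOP
    # and memory traffic data.
    characterization = ["advisor", "--collect=tripcounts", "--flop",
                        f"--project-dir={project_dir}", "--", *app]
    return [survey, characterization]


steps = roofline_steps("./advi", ["myApplication"])
for cmd in steps:
    # To execute instead of printing: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Both steps must use the same project directory so that the Characterization data is attached to the Survey result.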
To view the achieved performance of your application against hardware-imposed performance ceilings on an interactive Roofline chart, open the collected results in the Intel Advisor GUI or use the following command to generate an interactive HTML Roofline report:
advisor --report=roofline --report-output=./advi/advisor-roofline.html --project-dir=./advi
Here, the --report-output option specifies the directory and name of the HTML file in which Intel Advisor saves the generated report.
For details about generating CLI reports, see the respective section in the Intel Advisor User Guide or use the following command in your terminal:
advisor --help report
Intel Advisor enables you to create a read-only result snapshot using the following command:
advisor --snapshot --project-dir=./advi --pack --cache-sources --cache-binaries -- /tmp/my_proj_snapshot
If one or more loops are not vectorizing properly and performance is unsatisfactory:
- Consider working with the most time-consuming function/loop indicated on a Roofline chart.
- Use the Code Analytics tab to examine the main information for the selected function/loop. Refer to the Roofline pane to identify whether the function/loop is compute or memory bound.
- Use the Recommendations tab to view hints on possible optimization steps for the selected function/loop in the Roofline Guidance section.
- If your loop is compute bound:
  - Check the Vectorized Loops/Efficiency values in the Survey Report.
  - Consider running Dependencies analysis to discover why the compiler assumed a dependency and did not vectorize the selected function/loop.
  - Consider running Memory Access Patterns (MAP) analysis to identify expensive memory instructions.
- If your loop is memory bound: