This document answers our customers' most frequently asked questions about the Roofline feature of Intel® Advisor.
- What is Intel® Advisor Roofline analysis?
Intel® Advisor is a vectorization and threading tool, designed to help you locate and overcome performance bottlenecks in your programs. Roofline is one of the many tools provided by Advisor to assist you in this.
Advisor’s Roofline analysis automatically generates a cache-aware roofline chart, which visually represents a program’s current and potential performance relative to hardware-imposed limits such as memory bandwidth and compute capacity. The roofline model was proposed in a paper from the University of California at Berkeley and expanded upon in a paper from the Technical University of Lisbon.
More detailed information on the nature and purpose of Roofline charts is available here.
- How do I use the Intel® Advisor Roofline analysis?
We have a getting started guide available here, and a seventeen-minute tutorial video here.
- How does Intel® Advisor Roofline analysis count FLOPS?
Advisor counts the floating-point instructions actually executed during the Trip Counts phase of analysis; the "per second" part of the metric comes from timing collected during the lightweight Survey analysis phase. Single- and double-precision floating-point operations are not counted separately. Floating-point addition (+), subtraction (-), multiplication (*), and division (/) are counted as one operation each, and a fused multiply-add is counted as two. For a SIMD instruction, Advisor multiplies the instruction's operation count by the number of vector elements used, and for AVX-512 it also takes the mask register state into account.
If a math function is a single instruction, such as exp or sqrt, Intel® Advisor counts it as one operation. If it is a math library call, such as sin or cos, Intel® Advisor counts the compute instructions actually executed by the routine's implementation (for example, its Chebyshev or Taylor decomposition).
It should be noted that FLOPS data is assigned to the function call itself and not to the loop calling the function. This is called selftime-based FLOPS counting. More information is available here.
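The counting rules above can be illustrated with a minimal Python sketch. This is not Advisor's implementation; the `Instr` record, the `OPS_PER_ELEMENT` table, and the integer `mask` field are hypothetical names introduced only to show how the per-instruction weights, vector width, and mask state combine.

```python
from dataclasses import dataclass
from typing import Optional

# Operations counted per active vector element, following the rules in the text:
# add/sub/mul/div count as one, a fused multiply-add counts as two.
OPS_PER_ELEMENT = {"add": 1, "sub": 1, "mul": 1, "div": 1, "fma": 2, "sqrt": 1}


@dataclass
class Instr:
    op: str                     # key into OPS_PER_ELEMENT
    lanes: int = 1              # vector elements in the instruction (1 = scalar)
    mask: Optional[int] = None  # hypothetical write-mask bits; None = all lanes active


def count_flops(instr: Instr) -> int:
    """Return the operation count for one executed instruction."""
    if instr.mask is None:
        active = instr.lanes
    else:
        # Only lanes whose mask bit is set contribute to the count.
        active = bin(instr.mask & ((1 << instr.lanes) - 1)).count("1")
    return OPS_PER_ELEMENT[instr.op] * active


trace = [
    Instr("add"),                          # scalar add: 1 operation
    Instr("mul", lanes=16),                # 16-element multiply: 16 operations
    Instr("fma", lanes=8, mask=0b1111),    # 4 of 8 lanes active: 2 * 4 = 8
]
total = sum(count_flops(i) for i in trace)  # 1 + 16 + 8 = 25
```

Dividing such a total by the time measured for the same loop or function gives the FLOPS value.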
- How does Intel® Advisor Roofline analysis calculate arithmetic intensity?
Intel® Advisor implements a cache-aware roofline model that is different from the classic roofline visual performance model. The primary difference involves computing operational intensity: all memory and cache transactions are counted, not just DDR transactions.
To expose cache usage effectiveness, the cache-aware roofline model also introduces separate rooflines for each cache level. A loop positioned above a particular roofline on the plot indicates that most of the loop's data fits into the cache of the corresponding level. Intel® Advisor counts memory operations by analyzing the instructions actually executed, not by measuring DRAM traffic.
This means that in cache-aware rooflines, arithmetic intensity is a property of a particular algorithm, and will not be affected by problem size changes or cache usage optimizations, unless the algorithm changes in the process.
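As a rough illustration of why this arithmetic intensity depends on the algorithm rather than the problem size, here is a small Python sketch. The function names, the example loop, and the peak and bandwidth figures are assumptions chosen for illustration, not values reported by Intel® Advisor.

```python
def arithmetic_intensity(flops_per_iter, bytes_per_iter, iterations):
    """FLOPs divided by bytes moved by load/store instructions (all memory levels)."""
    total_flops = flops_per_iter * iterations
    total_bytes = bytes_per_iter * iterations
    return total_flops / total_bytes


def attainable_gflops(ai, peak_gflops, bandwidth_gb_per_s):
    """Standard roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, ai * bandwidth_gb_per_s)


# Hypothetical loop: a[i] = b[i] + s * c[i] on doubles.
# Per iteration: 2 FLOPs (one multiply, one add), 2 loads + 1 store = 24 bytes.
ai_small = arithmetic_intensity(2, 24, 1_000)
ai_large = arithmetic_intensity(2, 24, 1_000_000)

# Both equal 1/12: the iteration count cancels out of the ratio.
# With assumed roofs of 100 GFLOPS peak and 60 GB/s, the bound is about 5 GFLOPS.
bound = attainable_gflops(ai_small, peak_gflops=100.0, bandwidth_gb_per_s=60.0)
```

Because bytes are counted from the instructions themselves, a cache-blocking optimization that leaves the loads and stores unchanged would move the loop vertically on the chart but not horizontally.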
Intel® Advisor may implement a classic roofline in future releases. If you need the classic roofline chart, contact vector_advisor@intel.com.
- Can I use Intel® Advisor Roofline on multithreaded programs?
Intel® Advisor Roofline analysis can calculate roof position, arithmetic intensity, and FLOPS for both multithreaded and single-threaded applications. Note that for multithreaded applications, FLOPS per loop/function is computed using the Elapsed Time metric, which can be viewed in the Survey report.
By default, the Roofline displays the roofs calculated for multithreaded applications. To view the roofs for single-threaded applications, simply check the Use Single-Threaded Roofs checkbox in the Roofline toolbar.
It is important to note that using MPI to run multiple ranks of a program is not the same thing as multithreading. Intel® Advisor, including the Roofline feature, collects results per-rank when used with MPI programs. To use Intel® Advisor with MPI, we recommend using the -gtool option for mpirun, as documented here.
To collect a Roofline analysis through the command line, as is necessary for MPI programs, simply run a Survey analysis, followed by a Trip Counts analysis with the -flops-and-masks option.
When viewing the results of collection on an MPI program, use the single-threaded roofs unless each rank is itself multithreaded.
- How do I run Roofline analysis on the command line?
The Roofline analysis is simply a survey collection followed by a trip counts collection with flops enabled. Remember to source the advixe-vars.sh or run the advixe-vars.bat file (depending on your operating system) that comes with Intel® Advisor before attempting to use it on the command line.
If you are using Intel® Advisor 2017 Update 1, you will also need to make sure that the ADVIXE_EXPERIMENTAL=roofline environment variable is set before collecting or viewing data. Setting the environment variable is no longer necessary in later updates, beginning with Intel® Advisor 2017 Update 2.
To run the Roofline analysis on the command line, use the following commands:
advixe-cl -collect survey -project-dir yourProjectDirectory -- yourProgramAndParameters
advixe-cl -collect tripcounts -flops-and-masks -project-dir yourProjectDirectory -- yourProgramAndParameters
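If you want to script the two collections, a minimal Python sketch could look like the following. It uses only the advixe-cl options shown above; the helper names `roofline_commands` and `run_roofline` are hypothetical, and the sketch assumes advixe-cl is already on your PATH (that is, the advixe-vars script has been sourced).

```python
import subprocess


def roofline_commands(project_dir, program_argv):
    """Build the Survey and Trip Counts command lines, in the order they must run."""
    target = ["-project-dir", project_dir, "--", *program_argv]
    survey = ["advixe-cl", "-collect", "survey", *target]
    tripcounts = ["advixe-cl", "-collect", "tripcounts", "-flops-and-masks", *target]
    return [survey, tripcounts]


def run_roofline(project_dir, program_argv):
    """Run both collections, stopping if either one fails."""
    for cmd in roofline_commands(project_dir, program_argv):
        subprocess.run(cmd, check=True)
```

For example, `run_roofline("yourProjectDirectory", ["./yourProgram", "arg1"])` would run the Survey collection and then the Trip Counts collection against the same project directory.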
You can view the results in the GUI, or report the data in a file using the following command line:
advixe-cl --report survey --project-dir yourProjectDirectory --show-all-columns --format=csv --report-output file.csv
You will not see Roofline data if you omit the --show-all-columns option. The report columns of interest are GFLOPS and AI. Note that the report only gives you the numbers; the chart itself can only be viewed in the GUI.
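Since the report only contains numbers, you may want to post-process it yourself. The Python sketch below pulls the GFLOPS and AI columns out of a CSV report. The column names GFLOPS and AI come from the text above; the name column "Function Call Sites and Loops", the helper name `read_roofline_points`, and the sample data are assumptions, and a real report may contain extra preamble lines or different headers that you would need to account for.

```python
import csv
import io


def read_roofline_points(csv_file, name_column="Function Call Sites and Loops"):
    """Return (name, gflops, ai) tuples for rows that carry both numeric values."""
    points = []
    for row in csv.DictReader(csv_file):
        try:
            gflops = float(row["GFLOPS"])
            ai = float(row["AI"])
        except (KeyError, TypeError, ValueError):
            # Skip rows where either column is missing, empty, or non-numeric.
            continue
        points.append((row.get(name_column, ""), gflops, ai))
    return points


# Hypothetical sample standing in for file.csv.
sample = io.StringIO(
    "Function Call Sites and Loops,GFLOPS,AI\n"
    "loop_a,1.5,0.25\n"
    "main,,\n"
)
points = read_roofline_points(sample)
```

With a real report you would pass an open file instead, for example `open("file.csv", newline="")`.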
- Can I use the Intel® Advisor command line interface to specify particular loops for Roofline analysis?
Not at this time. Contact vector_advisor@intel.com if you feel this is a critical feature. However, there are plenty of powerful filtering tools in Intel® Advisor, which will allow you to show and hide loops according to a variety of criteria.