Intel® Fortran Compiler Classic and Intel® Fortran Compiler Developer Guide and Reference

ID 767251
Date 3/22/2024
Public

Interprocedural Optimization

Interprocedural Optimization (IPO) is an automatic, multi-step process that allows the compiler to analyze your code to determine where you can benefit from specific optimizations.

The compiler may apply the following optimizations:

  • Address-taken analysis

  • Array dimension padding

  • Alias analysis

  • Automatic array transposition

  • Automatic memory pool formation

  • Common block variable coalescing

  • Common block splitting

  • Constant propagation

  • Dead call deletion

  • Dead formal argument elimination

  • Dead function elimination

  • Formal parameter alignment analysis

  • Forward substitution

  • Indirect call conversion

  • Inlining

  • Mod/ref analysis

  • Partial dead call elimination

  • Passing arguments in registers to optimize calls and register usage

  • Points-to analysis

  • Routine key-attribute propagation

  • Specialization

  • Stack frame alignment

  • Structure splitting and field reordering

  • Symbol table data promotion

  • Un-referenced variable removal

  • Whole program analysis

IPO Compilation Models

IPO supports two compilation models: single-file compilation and multi-file compilation.

Single-file compilation uses the [Q]ip compiler option and produces one real object file for each source file compiled. During single-file compilation, the compiler performs inline function expansion for calls to procedures defined within the current source file.

The compiler performs some single-file interprocedural optimization at the default O2 optimization level. At the O1 optimization level, the compiler may also perform some inlining, such as inlining functions marked with inlining directives.

Multi-file compilation uses the [Q]ipo option and produces one or more mock object files rather than normal object files. (See the Compile with IPO section below for information about mock object files.) In addition, the compiler collects information from the individual source files that make up the program. Using this information, the compiler performs optimizations across functions and procedures in different source files.
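The difference in scope between the two models can be illustrated with a small Python sketch. It is a toy model only, not the compiler's actual algorithm: the file names, the procedure names, and the inline_candidates helper are all hypothetical. It shows which call sites have a callee body visible to the compiler under each model.

```python
# Toy model of inlining scope; all names below are hypothetical.
# Maps each procedure to the source file that defines it.
DEFINED_IN = {
    "main_prog": "main.f90",
    "helper": "main.f90",
    "solve": "solver.f90",
}

# Call sites as (caller, callee) pairs.
CALLS = [
    ("main_prog", "helper"),
    ("main_prog", "solve"),
    ("solve", "helper"),
]


def inline_candidates(calls, defined_in, multi_file):
    """Return the calls whose callee definition is visible to the compiler."""
    visible = []
    for caller, callee in calls:
        same_file = defined_in[caller] == defined_in[callee]
        # Single-file model: only same-file callees are visible.
        # Multi-file model: callees from every source file are visible.
        if same_file or multi_file:
            visible.append((caller, callee))
    return visible


single = inline_candidates(CALLS, DEFINED_IN, multi_file=False)
multi = inline_candidates(CALLS, DEFINED_IN, multi_file=True)
print("single-file model:", single)
print("multi-file model:", multi)
```

In this model, only the call from main_prog to helper is a candidate under single-file compilation, while all three calls are candidates under multi-file compilation. Whether the compiler actually inlines a visible call depends on its own heuristics, which this sketch does not attempt to model.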

NOTE:

Inlining and other optimizations are improved by profile information. For a description of how to use IPO with profile information for further optimization, see Profile an Application.

Compile with IPO

As each source file is compiled with IPO, the compiler stores an intermediate representation (IR) of the source code in a mock object file. The mock object files contain the IR instead of the normal object code. Mock object files can be ten or more times larger than normal object files.

During the IPO compilation phase only the mock object files are visible.

Link with IPO

When you link with the [Q]ipo compiler option, the compiler is invoked a final time and performs IPO across all mock object files. The mock objects must be linked with the compiler or with Intel® linking tools (or LLVM linking tools with ifx). While linking with IPO, the compiler and the other linking tools compile the mock object files and then invoke the linker for real object files provided on the user's platform.

Link-time optimization using the -ffat-lto-objects compiler option is provided for GCC compatibility. During IPO compilation, you can specify the -ffat-lto-objects option to have the compiler generate a fat link-time optimization (LTO) object that contains both a real object and a discardable intermediate language section. This enables both link-time optimization linking and normal linking.

You can specify the -fno-fat-lto-objects option for the compiler to generate a link-time optimization object that only has a discardable intermediate language section; no real/true object is generated. These files are inserted into archives in the form in which they were created. Using this option may improve compilation time and save space for objects.

If you use ld rather than xild to link objects, or ar rather than xiar to create an archive, the real object generated in a fat LTO object guarantees that linking or archiving still succeeds. However, cross-file optimizations are lost in this case. The extra real object also takes additional space and additional compile time to generate, so the -fno-fat-lto-objects compiler option is an advantage provided that you link the IPO mock object files with xild and archive them with xiar.
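The trade-off between the two object flavors can be summarized as a small decision table. The Python sketch below is an illustrative model only: the dictionaries and the link_outcome helper are hypothetical. The thin-object-with-ld case is an assumption inferred from the proviso above and is not an exact description of tool behavior.

```python
# Hypothetical model of the two object flavors described above.
OBJECT_KINDS = {
    "fat": {"has_ir": True, "has_real_code": True},    # -ffat-lto-objects
    "thin": {"has_ir": True, "has_real_code": False},  # -fno-fat-lto-objects
}

# Whether each link tool understands the intermediate language section.
IPO_AWARE = {"xild": True, "ld": False}


def link_outcome(object_kind, linker):
    """Describe what this model expects when linking one kind of object."""
    obj = OBJECT_KINDS[object_kind]
    if IPO_AWARE[linker] and obj["has_ir"]:
        return "linked with cross-file optimization"
    if obj["has_real_code"]:
        # The real object lets a non-IPO linker proceed; the IR is ignored.
        return "linked without cross-file optimization"
    # Assumption: a thin object offers nothing a non-IPO linker can use.
    return "no real object code for this linker"


outcomes = {
    (kind, linker): link_outcome(kind, linker)
    for kind in OBJECT_KINDS
    for linker in IPO_AWARE
}
for key, outcome in outcomes.items():
    print(key, "->", outcome)
```

The same reasoning applies to archiving with xiar versus ar. The model ignores object size and compile time, which are the costs that make the thin form attractive when the whole build uses xild and xiar.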

Whole Program Analysis

The compiler supports a large number of IPO optimizations that can be applied, or whose effectiveness is greatly increased, when the whole program condition is satisfied.

During the analysis process, the compiler reads all Intermediate Representation (IR) in the mock object files, object files, and library files to determine whether all references are resolved and whether a given symbol is defined in a mock object file. Symbols for both data and functions that are included in the IR in a mock object file are candidates for manipulation based on the results of whole program analysis.

There are two types of whole program analysis: the object reader method and the table method. Most optimizations can be applied if either type of whole program analysis determines that the whole program condition exists; however, some optimizations require the results of the object reader method, and some optimizations require the results of the table method.

Object reader method

In the object reader method, the object reader emulates the behavior of the native linker and attempts to resolve the symbols in the application. If all symbols are resolved, the whole program condition is satisfied. This type of whole program analysis is more likely to detect the whole program condition.
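The idea behind the object reader method can be sketched in Python as a symbol-resolution check. This is a simplified illustration, not the compiler's implementation: the module layout, the symbol names (including rt_write and dot), and the whole_program_by_symbol_resolution helper are all hypothetical.

```python
def whole_program_by_symbol_resolution(modules):
    """Emulate a linker pass: every referenced symbol must be defined somewhere.

    modules: list of dicts, each with 'defines' and 'references' sets.
    Returns (condition_satisfied, unresolved_symbols).
    """
    defined = set()
    referenced = set()
    for module in modules:
        defined |= module["defines"]
        referenced |= module["references"]
    unresolved = referenced - defined
    return len(unresolved) == 0, unresolved


# Hypothetical inputs: two mock object files and one runtime library.
main_obj = {"defines": {"main_prog"}, "references": {"solve", "rt_write"}}
solver_obj = {"defines": {"solve"}, "references": {"dot"}}
runtime_lib = {"defines": {"rt_write"}, "references": set()}

# 'dot' is not defined anywhere, so the condition is not satisfied.
partial_ok, partial_missing = whole_program_by_symbol_resolution(
    [main_obj, solver_obj, runtime_lib]
)
print(partial_ok, sorted(partial_missing))

# Adding a library that defines 'dot' resolves every reference.
math_lib = {"defines": {"dot"}, "references": set()}
full_ok, full_missing = whole_program_by_symbol_resolution(
    [main_obj, solver_obj, runtime_lib, math_lib]
)
print(full_ok, sorted(full_missing))
```

A real linker also handles archive member selection, weak symbols, and shared libraries, none of which this sketch models.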

Table method

In the table method, the compiler analyzes the mock object files and generates a call-graph for the application.

The compiler contains detailed tables about all of the functions in all important language-specific libraries, such as the Fortran runtime libraries. The compiler compares these function tables with the application call-graph. For each unresolved function in the call-graph, the compiler attempts to resolve the call by finding an entry for that function in its tables. If the compiler can resolve all of the function calls, the whole program condition exists.
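The table lookup can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the table contents, the function names, and the whole_program_by_table helper are hypothetical, and the compiler's real tables and call-graph are far more detailed.

```python
# Hypothetical stand-in for the compiler's built-in library function tables.
KNOWN_LIBRARY_FUNCTIONS = {"rt_write", "rt_open"}


def whole_program_by_table(call_graph, table):
    """Check calls that leave the IR against a table of known functions.

    call_graph: dict mapping each function defined in the IR to its callees.
    Returns (condition_exists, unresolved_functions).
    """
    defined = set(call_graph)
    unresolved = set()
    for callees in call_graph.values():
        for callee in callees:
            # A callee is resolved if the IR defines it or the table knows it.
            if callee not in defined and callee not in table:
                unresolved.add(callee)
    return not unresolved, unresolved


# Every external call has a table entry, so the condition exists.
resolved_graph = {
    "main_prog": ["solve", "rt_write"],
    "solve": ["rt_open"],
}
table_ok, table_missing = whole_program_by_table(
    resolved_graph, KNOWN_LIBRARY_FUNCTIONS
)
print(table_ok, sorted(table_missing))

# 'user_plugin' is neither in the IR nor in the table.
open_graph = {
    "main_prog": ["solve", "user_plugin"],
    "solve": [],
}
open_ok, open_missing = whole_program_by_table(
    open_graph, KNOWN_LIBRARY_FUNCTIONS
)
print(open_ok, sorted(open_missing))
```

Unlike the symbol-resolution sketch for the object reader method, this check never reads the libraries themselves; it relies only on what the table says about them.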