An Overview of the Intel IA-64 Compiler (continued)


Previous Next     Page 10 of 15

SCALAR OPTIMIZATIONS

A primary objective of scalar optimizations is to minimize the number of computations and the number of references to memory. Partial redundancy elimination (PRE) [1, 2, 11] is a well known scalar optimization technique that subsumes global common subexpression elimination (CSE) and loop invariant code motion. CSE removes expressions that are always redundant (redundant on all control flow paths). PRE goes beyond CSE by attempting to remove redundancies that occur only on some control flow paths. In this paper, we highlight the use of scalar optimizations to eliminate loads and stores.

Traditional PRE
An expression at program point p in the program control flow graph (CFG) is fully redundant if the same expression is already available. An expression e is said to be available at a point p if along every control flow path from the program entry to p there is an instance of e that is not subsequently killed by a redefinition of its operands. Figure 15 shows an example of a fully redundant expression and its elimination by CSE. The redundancy is removed by saving the value of the redundant expression in a temporary variable and then later reusing that value instead of reevaluating the expression.

Figure 15

Figure 15: (a) expression a + b is fully available, (b) elimination of common subexpression

An expression e is partially available at a point p if there is an instance of e along only some of the control flow paths from the program entry to p. Figure 16 shows an example of a partially redundant expression and PRE. The partial redundancy is removed by inserting a copy of the redundant expression on the control flow paths where it is not available, making it fully redundant.

Figure 16

Figure 16: (a) expression a + b is partially available
(b) elimination of partial redundancy

PRE can move the loop invariant to outside the loop as shown in Figure 17. The expression *q is available on the loop back-edge, but not on entry to the loop. After inserting t2 = *q in the loop preheader, *q is fully available and can be removed from the loop.

Figure 17

Figure 17: Example of loop-invariant code motion

Note, however, that the optimizer must be careful not to insert a copy of an expression at a point that would cause the expression to be evaluated when it was not evaluated in the original source code. Figure 18 shows such an example. The insertion of an expression e at a program point p is said to be down-safe if along every control flow path from p to the program exit there is an instance of e such that the inserted expression is available at each later instance. In Figure 18, the insertion of t1 = *q is not down-safe. There are two aspects to down-safety. The first is that an unsafe insertion may create an incorrect program. For example, in Figure 18, the expression *q is executed before checking if q is a null pointer. Second, an unsafe insertion reduces the number of instructions along one path at the expense of another path. In Figure 18, the redundancy is eliminated for the left-most path, but an extra instruction, t1 = *q, is executed on the right-most path.

Figure 18

Figure 18: (a) expression *q is partially available, (b) violation of down-safety

Extended PRE for IA-64
The standard PRE algorithm removes all the redundancies possible with safe insertions. We have extended PRE to use control speculation to remove redundancies on one control flow path, perhaps at the expense of another, less important control flow path. In the example in Figure 18, assume that the left-most control flow path is executed much more frequently than the right-most path. If the redundancy on the left-most path could be removed without producing an incorrect program, overall performance would be improved even though an extra instruction is executed on the right-most path.

Figure 19

Figure 19: Redundancy elimination using control speculation

Figure 19 shows how the redundancy in Figure 18 could be removed using the IA-64 support for control speculation. The insertion of *q is done using a speculative load, and a check instruction is added in place of the redundant load. Executing the check is preferable to executing the redundant load because the check does not use memory system resources and because the latency of the load is hidden by executing it earlier. Also, elimination of the redundant load may expose further opportunities for redundancy elimination in the instructions that depend on the load.

Removal of redundant loads can sometimes be inhibited by intervening stores. In Figure 20 (a), the loop-invariant load *p cannot be removed unless the compiler can prove that the store *q does not access the same memory location. The process of determining whether or not two memory references access the same location is called memory disambiguation and was described earlier in this paper.

If the compiler can determine that there is an unknown, but small probability that *p and *q access the same memory location, the loop invariant load and the add that depends on it can be removed using the IA-64 support for data speculation as shown in Figure 20 (b). The insertion of *p in the preheader is done using an advanced load, and a check instruction is added in place of the original redundant load. If the store *q accesses the same memory location as the load *p, a branch to a recovery code block will be taken at the check instruction. The recovery block contains code to reload *p and re-execute t4=t2 + t3. If the store *q and load *p access different memory locations, then only the check is executed instead of the redundant load and add.

Figure 20

Figure 20: Removal of loop-invariant load using data speculation

Partial Dead Store Elimination
In contrast to PRE which removes redundant loads, Partial Dead Store Elimination (PDSE) removes redundant stores in the program. Figure 21 shows an example of PDSE. The partial redundancy is removed by inserting a copy of the partially dead store into the control flow paths where it is not dead, making it fully redundant.

Figure 21

Figure 21: (a) store *p is partially dead, (b) elimination of partially dead store

As with PRE, the compiler must be careful when inserting stores to avoid executing a store when it should not be executed. Figure 22 shows an example of an incorrect insertion of a store. In Figure 22b, the store *p = t1 on the right is executed even if the path containing d=a+b is executed. In the original program in Figure 22a, no store to *p is executed when the path containing d=a+b is executed.

Figure 22

Figure 22: Example of incorrect insertion of a store

Figure 23 shows how the redundancy in Figure 22 could be removed using a predicated store. In Figure 23b, the redundancy on the left-most path is removed by inserting a predicated store. Instructions are required to set the predicate p2 to 1 when the store should be executed, and to 0 when it should not be executed. In Figure 23b, suppose that the left-most path is executed much more frequently than the right-most path. On the left-most path, executing p2=1 is preferable to executing the store, because the p2=1 does not use memory system resources. In some cases, an appropriate instruction to set p2 may already exist as a result of the if-conversion or another optimization, thereby reducing the cost of predicating the store.

Figure 23

Figure 23: Elimination of partially dead store using predication




Previous Next     Page 10 of 15