Published: 01/27/2020

Last Updated: 01/27/2020

This technical documentation expands on the information in the Load Value Injection (LVI) disclosure overview for software developers.

Note that this documentation will use more precise (but different) terminology for transient execution side channel methods than we have used in past documents.

Be sure to review the updated terminology guide and the list of affected processors.

## Overview of Load Value Injection

On some processors, faulting or assisting [1] load operations may transiently receive data from a microarchitectural buffer [2].

If an adversary can cause a specified victim load to fault, assist, or abort, the adversary may be able to select the data that the faulting/assisting/aborting load forwards to dependent operations. For certain code sequences, those dependent operations may create a covert channel with data of interest to the adversary. The adversary may then be able to infer the data's value by analyzing the covert channel. This transient execution attack [3] is called load value injection (LVI) and is an example of a cross-domain transient execution attack.

Because LVI methods require several complex steps [4] to be chained together while the victim is executing, LVI is primarily applicable to synthetic victim code developed by researchers or to attacks against Intel® Software Guard Extensions (Intel® SGX) enclaves by malicious operating systems (OSes) or virtual machine managers (VMMs). LVI has been assigned CVE-2020-0551 with a base score of 5.6 Medium, CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:C/C:H/I:N/A:N.
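The base score quoted above follows from the CVSS v3.1 base-score equations. The sketch below recomputes it from the vector; the metric weights, the Scope Changed formulas, and the Roundup helper are taken from the CVSS v3.1 specification, while the function names are illustrative only.

```c
#include <assert.h>

/* CVSS v3.1 Roundup (specification Appendix A): smallest one-decimal value
   >= x, computed on integers to avoid floating-point artifacts. Assumes x >= 0. */
static double cvss_roundup(double x) {
    long scaled = (long)(x * 100000.0 + 0.5);    /* round to nearest integer */
    if (scaled % 10000 == 0)
        return scaled / 100000.0;
    return (double)(scaled / 10000 + 1) / 10.0;  /* integer division acts as floor */
}

/* Repeated multiplication instead of pow() so the sketch needs no libm. */
static double power(double base, int exp) {
    double r = 1.0;
    for (int k = 0; k < exp; k++)
        r *= base;
    return r;
}

/* Base score for CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:C/C:H/I:N/A:N. */
static double lvi_base_score(void) {
    const double av = 0.55;                  /* AV:L */
    const double ac = 0.44;                  /* AC:H */
    const double pr = 0.68;                  /* PR:L, Scope Changed weight */
    const double ui = 0.85;                  /* UI:N */
    const double c = 0.56, i = 0.0, a = 0.0; /* C:H, I:N, A:N */

    double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
    /* Scope Changed impact sub-score */
    double impact = 7.52 * (iss - 0.029) - 3.25 * power(iss - 0.02, 15);
    double exploitability = 8.22 * av * ac * pr * ui;
    if (impact <= 0.0)
        return 0.0;
    double raw = 1.08 * (impact + exploitability);  /* Scope Changed combination */
    return cvss_roundup(raw < 10.0 ? raw : 10.0);
}
```

Calling `printf("%.1f\n", lvi_base_score());` from a `main` prints `5.6`. Note that only the Confidentiality impact is non-zero here, which matches the vector's C:H/I:N/A:N.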

There are four types of hardware behavior for which we will discuss LVI applicability: LVI stale data, LVI zero data, Zero-at-ret, and No forwarding.

The types of hardware behavior that might allow LVI methods are:

• LVI stale data: A faulting load forwards stale data to dependent instructions. This behavior is discussed further in subsequent sections. In certain situations, processors affected by LVI stale data may also forward zero values from faulting loads.
• LVI zero data: A faulting load forwards only zero values to dependent instructions. This behavior is primarily applicable to attacks on Intel® Software Guard Extensions (Intel® SGX) enclaves, although contrived victim code in other environments may also be vulnerable.

The types of hardware behavior on processors not affected by LVI are:

• Zero-at-ret: A faulting load forwards zero values to dependent instructions only when the load is ready to retire. According to our assessment, this behavior is not exploitable in practice.
• No forwarding: A faulting load forwards no values to dependent instructions. This behavior prevents LVI.
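To make the four behaviors concrete, here is a deliberately simplified C model of what a faulting load hands to its dependent instructions under each one. The enum and function names are hypothetical, real processors have far more conditions than a single "ready to retire" flag, and for LVI stale data the model shows only the stale-value case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four hardware behaviors described above (hypothetical names). */
typedef enum {
    LVI_STALE_DATA,  /* forwards stale buffer data (zero in some cases, not modeled) */
    LVI_ZERO_DATA,   /* forwards only zero */
    ZERO_AT_RET,     /* forwards zero, but only when the load is ready to retire */
    NO_FORWARDING    /* forwards nothing to dependents */
} fwd_behavior;

/* Returns true and sets *out if the faulting load forwards a value to its
   dependents; returns false if nothing is forwarded at this point. */
static bool transient_forward(fwd_behavior b, uint64_t stale,
                              bool ready_to_retire, uint64_t *out) {
    switch (b) {
    case LVI_STALE_DATA:
        *out = stale;
        return true;
    case LVI_ZERO_DATA:
        *out = 0;
        return true;
    case ZERO_AT_RET:
        if (!ready_to_retire)
            return false;  /* nothing forwarded while older work is unresolved */
        *out = 0;
        return true;
    case NO_FORWARDING:
    default:
        return false;
    }
}

/* Per the text, only the first two behaviors are affected by LVI. */
static bool affected_by_lvi(fwd_behavior b) {
    return b == LVI_STALE_DATA || b == LVI_ZERO_DATA;
}
```

The key distinction the model captures is timing: Zero-at-ret forwards the same value as LVI zero data, but only once the load is next to retire, which leaves almost no transient window for dependent operations.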

Due to the numerous, complex requirements that must be satisfied to implement the LVI method successfully, LVI is not a practical exploit in real-world environments where the OS and VMM are trusted. Because of Intel SGX's strong adversary model, attacks on Intel SGX enclaves loosen some of these requirements. Notably, the strong adversary model of Intel SGX assumes that the OS or VMM may be malicious, and therefore the adversary may manipulate the victim enclave's page tables to cause arbitrary enclave loads to fault or assist. Where the OS and VMM are not malicious, LVI attacks are significantly more difficult to perform, even against Intel SGX enclaves. Accordingly, system administrators and application developers should carefully consider the particular threat model applicable to their systems when deciding whether and where to mitigate LVI.

## Steps and Elements to Cause LVI

There are three multi-step LVI methods that malicious actors could potentially use to infer secret data or linear memory addresses from a victim application's load operations.

Note that the first two LVI methods require the attacker to be able to provide input to the victim application that will be read from memory and written to memory.

All three methods require the adversary to perform surveillance and find a suitable code sequence in the victim's program that satisfies all of the requirements, enumerated in each section below.

The first method is to use LVI in conjunction with pre-existing patterns in the victim code as a universal read gadget that allows the attacker to select which values in the victim's memory it wishes to infer.

1. Prime data: The attacker-provided input (an address or index consumed by the victim's pre-existing code, which points to a secret in the victim's address space) is written to a speculative microarchitectural data source [5]. For example, during a store to memory, the input can be written into a store buffer [6].
2. Trigger fault/assist/abort: Cause a victim load to fault, or to trigger an assist or Intel® Transactional Synchronization Extensions (Intel® TSX) abort. This opens a small transient execution window within which a malicious actor could potentially mount an LVI method. This fault/assist/abort must be triggered in the victim code, which makes this step significantly more difficult for an adversary than alternative methods which trigger the fault/assist/abort in the malicious code. The methods that an adversary can use to induce faults/assists/aborts in the victim, and the difficulty of such methods, are described in the Triggering a Fault, Assist or Abort section.
3. Inject data, read secret: Locate another subsequent load within the transient execution window that depends on the result of the faulting/assisting/aborting load in step 2. Stale or incorrect data that is derived from the adversary-provided input from step 1 may be forwarded [7] to the address base or index operand of this load.

Note that, depending on the microarchitectural data source, there may be additional constraints that must be satisfied for the attacker's data to be forwarded. For example, if the data source is a line fill buffer, then another prior load (still within the transient execution window) whose physical address matches that of the injection-target load, and that missed in the data cache, must have allocated the same fill buffer that was used to prime the attacker's data. If all constraints are satisfied, then this load may read a secret from the injected address.

4. Transmit secret: Locate another subsequent instruction that can transmit data through a covert channel (for example, a load, store, or indirect/conditional branch) within the transient execution window, and which depends on the load in step 3. An adversary may need to scan the victim's code to look for an appropriate disclosure gadget that performs steps 3 and 4 before mis-speculation is resolved. This attack requires steps 3 and 4 to execute in a single transient window using the forwarded data from step 1. Not all victim code will contain realistic disclosure gadgets for these steps.

```c
*b = a;                  // Prime attacker data 'a' in a store buffer
d = *c;                  // Load faults, attacker data 'a' forwarded to 'd'
e = *d;                  // Load secret from attacker-chosen address 'a'
leak = oracle[e * 4096]; // Transmit secret over covert channel
```


Note that steps 2, 3 and 4 are referred to later as the "Load+Load+Transmit" pattern.
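The data flow of the universal read gadget can be traced in a purely software toy model. Everything below is hypothetical: memory is a 256-byte array, an "address" is a byte index, the single `store_buffer` field stands in for the microarchitectural store buffer, and a boolean array stands in for the cache state of `oracle`. It mirrors the steps above; it does not reproduce any hardware behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model: 256 bytes of memory, addresses are byte indices. */
typedef struct {
    uint8_t mem[256];
    uint8_t store_buffer;   /* data of the most recent store (step 1 leaves 'a' here) */
    bool oracle_line[256];  /* oracle "cache lines" touched by transient code */
} toy_state;

static void toy_init(toy_state *s) { memset(s, 0, sizeof *s); }

static void toy_store(toy_state *s, uint8_t addr, uint8_t val) {
    s->mem[addr] = val;
    s->store_buffer = val;
}

/* A faulting load transiently returns stale store-buffer data instead of memory. */
static uint8_t toy_load(const toy_state *s, uint8_t addr, bool faults) {
    return faults ? s->store_buffer : s->mem[addr];
}

/* *b = a; d = *c (faults); e = *d; leak = oracle[e * 4096]; */
static void victim_gadget(toy_state *s, uint8_t b, uint8_t a, uint8_t c) {
    toy_store(s, b, a);                 /* step 1: prime */
    uint8_t d = toy_load(s, c, true);   /* step 2: load faults, injected a lands in d */
    uint8_t e = toy_load(s, d, false);  /* step 3: read the byte at address a */
    s->oracle_line[e] = true;           /* step 4: footprint survives the squash */
}

/* Attacker side: the touched oracle line reveals e. */
static int recover_byte(const toy_state *s) {
    for (int i = 0; i < 256; i++)
        if (s->oracle_line[i])
            return i;
    return -1;
}
```

For example, with a secret byte 0x2A at address 0x40, calling `victim_gadget(&s, 0x10, 0x40, 0x80)` marks only oracle line 0x2A, so `recover_byte(&s)` returns 0x2A even though `e` itself is never returned architecturally.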

The second method is to use LVI to redirect transient control flow to jump to other code inside the victim that reads and transmits a secret:

1. Prime data: The attacker-provided input (an address consumed by the victim's pre-existing code, which points to a target code gadget in the victim's address space) is written to a speculative microarchitectural data source. A list of microarchitectural structures that could potentially be primed with malicious data is given in the Speculative Microarchitectural Data Sources section. Many pieces of software do not handle data from malicious applications, which makes it more difficult for the attacker to prime data in them.
2. Trigger fault/assist/abort: Cause a victim load to fault, or to trigger an assist or Intel® Transactional Synchronization Extensions (Intel® TSX) abort. This opens a small transient execution window within which a malicious actor could potentially mount an LVI method. This fault/assist/abort must be triggered in the victim code, which makes this step significantly more difficult for an adversary than alternative methods which trigger the fault/assist/abort in the malicious code. The methods that an adversary can use to induce faults/assists/aborts in the victim, and the difficulty of such methods, are described in the Triggering a Fault, Assist or Abort section.
3. Inject data, hijack control flow: There is a subsequent branching instruction (for example, an indirect call/jump) within the transient execution window that depends on the result of the faulting/assisting/aborting load in step 2. The adversary-provided input from step 1 may be forwarded to the address operand of this branching instruction. Hence, the branching instruction may redirect the instruction pointer to the adversary-provided address.

Note that the same constraints on data forwarding that apply to the universal read gadget above also apply here.

Another technique is to overwrite the victim code's stack pointer, typically through LVI zero data, and to cause the RET instruction to retrieve the RIP value from the stack memory under the attacker's control. This stack pointer hijacking method is primarily relevant to Intel SGX enclaves.

4. Read and transmit secret: Locate a target code sequence that can read and transmit a secret over a covert channel within the transient execution window. An adversary may need to scan the victim code to look for an appropriate disclosure gadget that performs steps 3 and 4 before transient execution is squashed. This attack requires steps 3 and 4 to execute in a single transient window. Not all victim code will contain realistic disclosure gadgets for these steps.

Example victim code (LVI MSBDS control-flow hijacking gadget):

```c
*b = a;  // Prime attacker data 'a' in a store buffer
d = *c;  // Load faults, attacker data 'a' forwarded to 'd'
d();     // Branch to attacker-controlled address d
```


Note that steps 2 and 3 in the example above are also referred to later as the "Load+Branch" pattern.
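The Load+Branch pattern can be sketched in the same toy style. In this hypothetical model a code "address" is an index into a small function table, `store_buffer` stands in for the store buffer, and the index mask exists only to keep the toy in bounds; a real attack transiently redirects the actual instruction pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int (*handler_fn)(void);

static int legit_handler(void)  { return 1; }  /* intended target */
static int gadget_handler(void) { return 2; }  /* attacker-chosen code inside the victim */

static handler_fn code_table[2] = { legit_handler, gadget_handler };

static uint8_t store_buffer;  /* stand-in for the store buffer (hypothetical) */
static uint8_t fn_slot;       /* memory cell that architecturally holds index 0 */

/* *b = a; d = *c; d();  If the load faults, the stale store-buffer value is
   used as the branch target instead of the value in memory. */
static int victim_dispatch(uint8_t attacker_value, bool load_faults) {
    store_buffer = attacker_value;                     /* step 1: prime */
    uint8_t d = load_faults ? store_buffer : fn_slot;  /* step 2: faulting load */
    return code_table[d & 1]();                        /* step 3: indirect call; mask keeps toy in bounds */
}
```

`victim_dispatch(1, false)` calls the intended handler and returns 1, while `victim_dispatch(1, true)` follows the injected index and returns 2.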

There is also a third related method known as Load+Transmit that can allow an attacker to read non-arbitrary secrets from a victim application. This method is not exactly LVI, since it does not involve an injection of attacker data. Instead, it can be characterized as triggering MDS in the victim application so that the victim leaks a specific secret that a pre-existing portion of the victim code already loads or stores (a non-universal read gadget). The steps are as follows:

1. Identify store/load of victim secret: The attacker must find an instruction in the victim program that loads or stores a specific secret, thus making the secret available in a microarchitectural data source such as a fill buffer or store buffer [8]. The victim may have relatively few portions of its code that execute while the attacker-desired secret is in affected buffers.
2. Trigger fault/assist/abort: Cause a victim load to fault, or to trigger an assist or Intel® Transactional Synchronization Extensions (Intel® TSX) abort. This opens a small transient execution window within which a malicious actor could potentially mount an LVI method. This fault/assist/abort must be triggered in the victim code, which makes this step significantly more difficult for an adversary than alternative methods which trigger the fault/assist/abort in the malicious code. The methods that an adversary can use to induce faults/assists/aborts in the victim, and the difficulty of such methods, are described in the Triggering a Fault, Assist or Abort section.
3. Transmit secret: Locate a subsequent instruction that can transmit data through a covert channel (for example, a load, store, or indirect/conditional branch) within the transient execution window, and which depends on the load in step 2.

Note that the same constraints on data forwarding that apply to the universal read gadget described above also apply here. If all of the forwarding constraints are satisfied and the transient execution window is still open, then this instruction can be used to transmit the secret over a covert channel.

```c
*b = s;                  // Victim secret 's' allocated to a store buffer
d = *c;                  // Load faults, victim secret 's' forwarded to 'd'
leak = oracle[d * 4096]; // Transmit secret over covert channel
```


Note that steps 2 and 3 in the above are also referred to later as the Load+Transmit pattern.
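In the Load+Transmit model below, which is again hypothetical and purely software, the attacker injects nothing: the victim's own store of `secret` is what the faulting load picks up, and the oracle array stands in for the covert channel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint8_t store_buffer;     /* stand-in for the store buffer (hypothetical) */
static uint8_t memory[256];      /* toy memory; addresses are byte indices */
static bool oracle_line[256];    /* oracle "cache lines" touched transiently */

/* *b = s; d = *c; leak = oracle[d * 4096]; */
static void victim(uint8_t secret, uint8_t b, uint8_t c, bool load_faults) {
    memory[b] = secret;                                  /* victim's own store */
    store_buffer = secret;                               /* secret now in the buffer */
    uint8_t d = load_faults ? store_buffer : memory[c];  /* faulting load picks up s */
    oracle_line[d] = true;                               /* transmit over covert channel */
}
```

After `victim(0x5A, 1, 2, true)`, `oracle_line[0x5A]` is set even though the load's architectural value would have been `memory[2]`. Because the secret must already be in a buffer, this gadget leaks only the specific value the victim just handled, not an arbitrary one.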

## Triggering a Fault, Assist or Abort

The previous section outlined the steps that the attacker must perform in attempting to implement an LVI method. All of these methods share the high complexity of the MDS-style methods they build on, and they additionally require an extremely complicated step: the attacker must trigger a fault, assist, or Intel® Transactional Synchronization Extensions (Intel® TSX) abort in the victim's context, while the victim program is executing, on a specific load instruction in a section of victim code that meets all of the previously described requirements.

This section lists several methods that an attacker could use to attempt to trigger the fault/assist/abort.

### OS Access Using Application Pointer

A malicious application makes a system call to the OS and passes a parameter that requires the OS to access application memory. Then, the malicious application chooses a memory location for the parameter that would cause either a page fault or assist.

Note that hardware features such as Supervisor Mode Access Prevention (SMAP), as well as existing software mitigations for bounds check bypass (Spectre variant 1), may prevent malicious actors from triggering a fault/assist in this manner. Refer to the LVI Impact on OS/VMM section for more details.

### Induced Victim Memory Access in Application

A malicious application or guest can attempt to manipulate a victim's pointer such that the victim's usage of that pointer transiently signals a fault or assist.

There are two classes of induced access:

1. Access to data that is programmatically valid. The access would have been allowed in both transient and retired execution.
2. Access to data that is programmatically invalid. The data may be accessed transiently, but the program would have prevented the access in the retired instruction stream.

Type confusion is an example of a programmatically invalid access. In typed languages, code may transiently load from a number that is not a pointer. Such a load may use a non-canonical address and could thus receive incorrect data. These programmatically invalid accesses occur only transiently and might target any location in the address space. They can be mitigated with speculation control before the access, as described in the Typecasting and indirect calls section in Analyzing Potential Bounds Check Bypass Vulnerabilities.
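The tagged-value sketch below, with hypothetical type and function names, shows where a programmatically invalid access comes from. Architecturally the function dereferences only genuine pointers; the comment marks the transient path the text describes. The speculation-control mitigation mentioned above is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged value, as a dynamically typed runtime might represent it. */
typedef enum { TAG_INT, TAG_PTR } value_tag;

typedef struct {
    value_tag tag;
    union {
        int64_t i;
        const int64_t *p;
    } u;
} value;

/* Architecturally safe: u.p is dereferenced only when the tag says PTR.
   If the tag check is mispredicted, the load below may execute transiently
   with the integer bits of u.i as its address. That address may be
   non-canonical and fault, which is the programmatically invalid access
   described in the text. */
static int64_t load_if_ptr(const value *v) {
    if (v->tag != TAG_PTR)
        return -1;
    return *v->u.p;
}
```

Retired execution never dereferences an integer here; the risk exists only in the transient window before the tag comparison resolves.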

A page fault that occurs when accessing a memory-mapped file is an example of a programmatically valid access. The access passes all validity checks and can occur in both the speculative and retired instruction streams.

Certain vector load instructions may also generate a fault when they are not aligned and may receive incorrect data. Because modern OSes do not use segment limits and applications rarely enable alignment checks, these faults are not useful to attackers.

### Clearing of Accessed or Present Bits Causing Memory Pressure

A malicious application or guest may reference enough memory to cause the OS or VMM to take actions to reclaim memory pages.

Note that causing the victim application or guest's memory to be paged out may be an undesirable outcome for the attacker since, on Intel® Core™ i processor family models, forwarding from store buffers does not occur when the page is marked not-present, and forwarding from fill buffers or load ports does not occur on such faults until the load is ready to retire.

The attacker would therefore want to cause just enough memory pressure that the OS/VMM clears the accessed bit in the page table or extended page table (EPT), so that an assist occurs when the victim accesses memory.
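The difference between an assist and a page fault comes down to two page-table bits. The sketch below uses the x86 page-table entry layout (Present is bit 0, Accessed is bit 5); the enum, the function names, and the `reclaim_scan` helper are hypothetical, and the model ignores TLB caching, which in practice also determines whether a page walk, and thus an assist, happens at all.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)  /* x86 PTE Present flag */
#define PTE_ACCESSED (1ULL << 5)  /* x86 PTE Accessed flag */

typedef enum {
    ACCESS_NORMAL,     /* no fault or assist */
    ACCESS_ASSIST,     /* A bit clear: an assist sets it (attacker's goal) */
    ACCESS_PAGE_FAULT  /* not present: less useful, per the note above */
} access_outcome;

static access_outcome classify_access(uint64_t pte) {
    if (!(pte & PTE_PRESENT))
        return ACCESS_PAGE_FAULT;
    if (!(pte & PTE_ACCESSED))
        return ACCESS_ASSIST;
    return ACCESS_NORMAL;
}

/* Hypothetical reclaim pass under memory pressure: it ages the page by
   clearing A instead of evicting it, leaving the page present. */
static uint64_t reclaim_scan(uint64_t pte) {
    return pte & ~PTE_ACCESSED;
}
```

A present, recently used page classifies as `ACCESS_NORMAL`; after `reclaim_scan` it becomes `ACCESS_ASSIST`, while a page that was evicted outright (`pte == 0`) yields `ACCESS_PAGE_FAULT`.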

Causing memory pressure on a victim requires sharing the OS's Least-Recently-Used (LRU) list with the victim, which is typically not the case when the victim runs in a different container, or in a different VM. It is also not possible when the targeted victim memory is pinned, as can be the case for many VMs and some applications.

It is theoretically possible for an application to cause memory pressure that results in the OS paging out or clearing A/D bits for OS data in a way that allows an LVI method to be possible. It is also theoretically possible for a guest application to cause memory pressure in a way that results in the VMM paging out or clearing EPT A/D bits for data belonging to the guest OS.

### Attacker Manipulation of Page Tables

This type of attack is of concern only to Intel SGX, as an Intel SGX enclave is the only environment where the adversary has the potential to directly control Intel® architecture (IA) or EPT page tables of the victim. The malicious OS/VMM can potentially arbitrarily manipulate the victim enclave's page tables to induce faults or assists on attacker-chosen enclave pages.

### Intel® TSX Abort

The vast majority of code executes outside of Intel TSX transactions. However, if the victim program uses Intel TSX, then Intel TSX aborts are a possible avenue for an LVI method. In addition to the fault cases discussed above, Intel TSX aborts can potentially be caused by L1 cache evictions. Although such conflicts typically cannot be caused by different applications, one exception is when a different application executes on a sibling thread on the same physical core. However, this case is mitigated by the MDS mitigations for simultaneous multithreading (SMT). Refer to Microarchitectural Data Sampling for more details.

## Speculative Microarchitectural Data Sources

During out-of-order execution of a load operation, the processor may speculatively select a value from a microarchitectural data source as the result of the load. If this speculative value matches the correct value, then subsequent speculative instructions that depend on this value may eventually retire. Otherwise, these (transient) speculative instructions will eventually be squashed. However, transient instructions that depend on a mis-speculated value may have microarchitectural side effects that can be observed via a covert channel.

### Injection of Non-zero Data (LVI Stale Data)

The following transient execution attacks may be used to enable the LVI method:

• Microarchitectural Data Sampling
• TSX Asynchronous Abort

Microarchitectural Data Sampling or TAA methods may cause faulting, assisting or aborting loads to receive incorrect data from the fill buffers (MFBDS), store buffers (MSBDS), or load ports (MLPDS). Refer to Microarchitectural Data Sampling and Intel® Transactional Synchronization Extensions (Intel® TSX) Asynchronous Abort for background information.

#### MSBDS

An LVI method using MSBDS works through the attacker causing a victim load to fault, assist, or Intel TSX abort so that the load operation is transiently forwarded data that the attacker desires from a store buffer entry. This forwarded data might be a secret or the memory address of a secret that the attacker wants to infer. On the Intel Core i processor family, loads that fault due to not-present pages or not-present EPT pages (RWX are all 0) do not transiently forward store buffer data to dependent operations and thus cannot cause MSBDS-based LVI on such faulting loads.

Because the vast majority of code execution is outside of Intel TSX transactions, and because the number of loads done within Intel TSX is relatively low compared to software executing outside of an Intel TSX transaction, Intel TSX aborts that may cause MSBDS are expected to be less useful to an attacker attempting to use the LVI method. The most likely LVI vector using MSBDS from a non-system software attacker would be to cause the victim process to take a microarchitectural assist to update a paging accessed bit or cause a non-canonical address violation in a memory location that the attacker desires. Refer to the LVI Impact on OS/VMM section for further discussions of the applicability of using MSBDS to mount LVI methods outside of Intel SGX.

#### MFBDS

MFBDS may still occur without such direct control. This occurs when a fill buffer entry is allocated but its data portion has not yet been updated and is thus stale. If a later assisting/faulting/aborting load matches the physical address of this newly allocated fill buffer entry, it may be forwarded the stale data, which may be of use to the adversary. Unlike the MFBDS attack, an LVI method using MFBDS needs to induce this in victim code. Specifically, it needs a faulting/assisting/aborting load to hit a fill buffer entry that is currently in use by a non-faulting/assisting/aborting operation with the same physical address, where that entry's data has not yet been updated and happens to be useful to the attacker. Inducing this in victim code is more complex than the MFBDS attack, which occurs within the attacker's code.
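The MFBDS forwarding condition in the paragraph above can be written as a simple predicate over a hypothetical fill-buffer model: an entry must be allocated, its data must not yet be updated, and its physical line must equal that of the faulting load. The struct fields and function name are illustrative, not a description of real fill-buffer internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a line fill buffer entry. */
typedef struct {
    bool     allocated;   /* in use by a non-faulting operation */
    bool     data_valid;  /* false until the fill returns: contents are stale */
    uint64_t phys_line;   /* physical cache-line address */
    uint64_t stale_data;  /* leftover contents from a previous use */
} fill_buffer_entry;

/* Returns true (and the stale data) if a faulting/assisting/aborting load to
   phys_line could be handed stale data under the conditions in the text. */
static bool mfbds_forward(const fill_buffer_entry *fb, size_t n,
                          uint64_t phys_line, uint64_t *out) {
    for (size_t i = 0; i < n; i++) {
        if (fb[i].allocated && !fb[i].data_valid && fb[i].phys_line == phys_line) {
            *out = fb[i].stale_data;
            return true;
        }
    }
    return false;
}
```

Each of the three conditions must line up inside the victim's own execution, which is why the text calls this harder to arrange than an MFBDS attack run from the attacker's code.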

#### MLPDS

There are two causes for MLPDS:

• A faulting/assisting/aborting vector (SSE/Intel® Advanced Vector Extensions (Intel® AVX)/Intel® Advanced Vector Extensions 512 (Intel® AVX-512)) load that is more than 64 bits in size.
• A faulting/assisting/aborting load which spans a 64-byte boundary.

There are a number of limitations for vector MLPDS-based LVI attacks. A faulting/assisting/aborting vector load will only forward non-zero data in the upper bits of the vector register; the lower 64 bits will be zeroed. Fewer victim code sequences use vector registers in a way that creates a covert channel based on their contents, because pointers are generally handled in general purpose registers rather than vector registers. The data forwarded by MLPDS, the data retained on the load ports, is a small set. This makes it more difficult for malicious actors to cause their desired data to be forwarded to an MLPDS faulting operation, making LVI using vector MLPDS even harder to exploit than other variants.

A faulting/assisting/aborting load which spans a 64-byte boundary may also enable the conditions for MLPDS. As with vector MLPDS LVI, the set of data that can be forwarded is small, and fewer of the victim code's loads are likely to split a 64-byte boundary.
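Both MLPDS causes are easy to test for on a given load. The sketch below assumes 64-byte cache lines, as the text does; the function names are hypothetical, and whether a particular load actually forwards data still depends on the other conditions described in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Split-line condition: the access touches bytes in two different 64-byte lines. */
static bool spans_64b_boundary(uint64_t addr, unsigned size_bytes) {
    return size_bytes > 0 && (addr % 64) + size_bytes > 64;
}

/* The two MLPDS causes listed above, for a faulting/assisting/aborting load. */
static bool mlpds_candidate(uint64_t addr, unsigned size_bytes, bool is_vector) {
    bool wide_vector = is_vector && size_bytes > 8;  /* more than 64 bits */
    return wide_vector || spans_64b_boundary(addr, size_bytes);
}
```

For instance, an 8-byte load at offset 60 within a line (`0x103C`) splits the line, while the same load at offset 56 (`0x1038`) ends exactly on the boundary and does not.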

Unlike MSBDS, MLPDS may be caused by not-present faults or EPT violations on the Intel® Core™ processor family. However, loads that take not-present faults or EPT violations are not forwarded data transiently, only when they are ready to retire. This creates a much smaller window of time for the disclosure gadget to execute and create a covert channel, and it requires a much more specific set of conditions to create an exploitable gadget (loads that split cache lines and have a disclosure gadget immediately following the load). Both factors make split MLPDS LVI even more difficult to exploit than other variants.

#### L1TF & E2E

As with the MFBDS LVI method, an L1TF or E2E method can only inject data into a victim load from the same physical address. Because system software has direct control of the page tables, it may be able to put secrets or an attacker-desired linear address at the exact physical address of the faulting loads. In general, non-system software does not have direct control of page tables that map the victim and thus cannot do that. Thus, L1TF and E2E methods are primarily of concern with respect to system software attackers (for example, against Intel SGX enclaves).

### Injection of Zero

#### Load Value Injection Zero Data (LVI zero data)

Many processors may forward a fixed value of 0 to a faulting/assisting load's dependent instructions, for example when the targeted address is not present in the L1D cache. Some processors mitigate general cases of RDCL, L1TF, MDS, or TAA by forwarding a value of 0 to dependent operations of the load (instead of forwarding other data values that may contain secret data or be controlled by a malicious actor). Although this mitigation reduces the risk of an LVI method in typical OS environments, there are certain situations where an adversary injecting a value of 0 to dependent operations may lead to a victim transiently creating a covert channel desired by the adversary. Since mainstream OSes mark the low page containing address 0 as not present, this LVI zero data method is primarily relevant to Intel SGX enclaves with a system software adversary.

### Zero-at-ret

Some processors will generally forward a value of 0 to dependent operations, but only when the faulting load is the next operation to retire. This behavior is called Zero-at-ret. Such behavior ensures the processor will not transiently forward 0 to dependent operations before previous instructions have resolved (for example, before an older jump mispredicts). This significantly constrains speculation: only a few dependent operations will execute in the transient execution window.

Using Zero-at-ret to target and leak memory contents would require a dependency chain longer than the at-ret cancellation window allows, and is therefore impractical on processors with Zero-at-ret behavior. Accordingly, there are enormous difficulties in finding and exploiting a Zero-at-ret vulnerability in real-world production software [9].

## LVI Impact

Unlike domain-bypass attacks like MDS or L1TF, where the attacker has direct control over the instructions executed, LVI is a cross-domain method and thus requires manipulating the victim code's behavior. As described in the Steps and elements to cause LVI section, the malicious actor needs to:

• Find existing gadgets in the victim software that meet all of the attack requirements.
• Influence the behavior of the victim's environment to cause execution of the gadget inside the victim.
• Influence the victim's execution so that a specific load inside the gadget takes a fault, assist or abort.
• Cause the transient execution to last long enough that the gadget puts the attacker-desired data into the covert channel.
• Look for the signal in the covert channel emitted by the LVI gadget through the background noise created by the system.

Needing to perform all these steps increases the complexity of the attack beyond the already significant complexities present in other transient execution vulnerabilities.

We describe the potential impacts for Intel SGX, OS/VMM and applications separately below:

### Intel® SGX

Intel SGX's threat model identifies all software running outside of an Intel SGX enclave as untrusted, including privileged OS (or hypervisor) software. In the context of LVI, a malicious OS can cause arbitrary loads to fault or assist during enclave execution by marking an enclave page as not present, and then resuming the enclave. The next time the enclave code attempts to load from any address within the page marked not present, the memory access will fault, and stale data or a value of 0 may be forwarded to dependent instructions.

As explained in the previous sections, on processors affected by L1TF or MDS, stale data might be forwarded to the faulting/assisting instruction if the specific conditions for stale data forwarding are met. On these processors, with the microcode mitigations for L1TF and MDS applied, any interrupt or exception that the attacker uses to modify the page tables at a specific moment (including the single-stepping timer interrupt generated by, for example, the SGX-Step tool [10]) flushes the L1D and the microarchitectural buffers that can be exploited by MDS. Therefore, malicious actors would need uninterrupted enclave execution between the instruction that created the stale data and the load instruction that might fault/assist for stale data forwarding to succeed.
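The timing requirement above can be modeled as a scan over a hypothetical trace of enclave events: primed data survives only if no interrupt or exception occurs between the priming instruction and the faulting load, since with the L1TF and MDS microcode mitigations each such event flushes the affected buffers. The event names and function are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event trace of enclave execution. */
typedef enum { EV_PRIME, EV_OTHER, EV_INTERRUPT, EV_FAULTING_LOAD } enclave_event;

/* Returns true if stale data primed earlier is still available when the
   first faulting load in the trace executes. */
static bool stale_data_survives(const enclave_event *ev, size_t n) {
    bool primed = false;
    for (size_t i = 0; i < n; i++) {
        switch (ev[i]) {
        case EV_PRIME:         primed = true;  break;
        case EV_INTERRUPT:     primed = false; break;  /* buffers flushed */
        case EV_FAULTING_LOAD: return primed;
        default:               break;
        }
    }
    return false;  /* no faulting load in the trace */
}
```

A trace of prime, ordinary work, then the faulting load keeps the stale data; inserting a single interrupt between the prime and the load (as a single-stepping tool would) discards it.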

On processors that mitigate L1TF and MDS by forwarding a value of 0 to dependent operations, the value 0 is forwarded to the faulting/assisting load instruction instead of any stale data. This limits the scope of what the attacker may be able to achieve to the LVI zero data variant.

To construct an LVI exploit, forwarding stale data or a value of 0 to the faulting/assisting instruction is a necessary but insufficient requirement. The exploit must also make sure the dependent instructions inside the enclave access secret data and transmit the secret data through a covert channel, all within the transient execution window.

It is worth clarifying that in an environment where a malicious OS (or hypervisor) is not involved (for example, the platform owner does not intentionally load a malicious OS to attack an Intel SGX enclave, but instead the system was infested by unprivileged malware), it is much harder for an unprivileged attacker than for a malicious OS to mount an LVI attack on the Intel SGX enclave. The scenario of unprivileged malware attacking an Intel SGX enclave should be considered a special case of unprivileged malware attacking another application, discussed in the Between Different Applications section later.

### OS/VMM

An unprivileged adversary has few points of leverage to induce faults or assists into code executing at a higher privilege level. OSes and VMMs that have already been mitigated against Spectre and L1TF/MDS will significantly reduce the risk of LVI attacks against the OS or VMM.

#### Impact to OS from Application (Including When Virtualized)

##### Kernel load from user page

Refer to the OS access using application pointer section. The values of user-supplied parameters are not trusted by the kernel, and hence the ability to transiently inject arbitrary values does not supply the current process with any additional control of the kernel's speculative execution. Existing kernels should already be hardened against transient execution attacks on user application interfaces for Spectre variant 1.

If the OS enables Supervisor Mode Access Prevention (SMAP) on processors that support it, then LVI on kernel loads from user pages will be mitigated. This is because the CLAC and STAC instructions have LFENCE semantics on processors affected by LVI, so they serve as a speculation fence around kernel loads from user pages.

##### Paging of kernel

An OS that pages its own memory may provide more opportunities for malicious actors to find a gadget that follows code that takes an assist on a kernel page where the OS has cleared the accessed bit. But malicious actors have no control over when the OS may clear accessed bits, and the rate at which the OS does so is low.

##### Sandboxed kernel code

When executing sandboxed code in a kernel that relies on language-based security, mitigations against other transient execution attacks (for example, bounds check bypass, branch target injection, and speculative store bypass) would greatly increase the difficulty of LVI methods, since LVI relies on code patterns similar to those of these methods.

#### Impact to VMM from VM

Similar to the impact an application has on an OS, a VMM responding to VM calls by a guest can access guest-controlled addresses. Typically VMMs walk page tables in software, which doesn't allow faults while accessing the guest's memory. This makes it difficult for guests to cause faults in the hypervisor.

#### Impact Between Guests in Virtualized Environments

OSes that do not page their own memory may be theoretically vulnerable to taking faults and assists while executing if they are running as a guest of a VMM that is clearing EPT accessed/present bits due to memory pressure. Pinned VMs, or VMs running with separated LRU lists in containers, are not impacted. Even for non-pinned VMs, the necessary attack scenario is very complex and is highly unlikely to be practical. Malicious actors would need to take steps similar to those described in the Applications section below.

### Applications

#### Between Different Applications

Malicious applications may attempt to use LVI stale data11 with the attacker directly injecting data into internal CPU buffers to infer data of other applications. However, with the specified MDS mitigation applied on affected CPUs, internal CPU buffers are cleared on MD_CLEAR operations (including when switching to an application) and may be protected through appropriate SMT scheduling for sibling hyperthreads.

This implies that attackers cannot directly inject values for the prime data step in the Load+Load+Transmit and Load+Branch LVI variants on systems that already mitigate MDS. Malicious actors would instead need to rely on values already present in the victim process. These values might be present in the victim process because it is interacting with the attacker (for example, if the attacker is passing data to the victim process). If the attacker cannot inject data values into the victim's data, the attacker will not be able to accomplish the prime data step and thus cannot perform those LVI variants.

As discussed in the Speculative Microarchitectural Data Sources section, an LVI mechanism that avoids some of these restrictions for a non-system-software adversary (for example, a malicious application) is MSBDS on a paging accessed bit update assist. The methods to cause such an assist are detailed in the Triggering a Fault, Assist or Abort section.

A successful LVI stale data method on another application using paging accessed bits requires the following preconditions:

• The value the adversary wishes to inject is being used in the victim process.
• The victim executes code to inject the attacker-desired values shortly before the method.
• If the victim executes this code too long before the method then the value in the MDS-affected buffer will be overwritten by more recent victim code or cleared by VERW (for example, on context switch or system call).
• The page used by the victim load has its accessed bit cleared (which may be difficult if previous accesses to the same page cause the accessed bit to be set).
• The victim load's results are passed to a disclosure gadget that is shortly after the victim load.
• The attacker needs transient execution to last long enough for the disclosure gadget to execute transiently before the paging accessed bit update assist occurs.

When an application's memory pressure increases, the OS uses the page accessed bit to help identify candidate pages for swapping, that is, the least recently used pages. The OS does this by periodically clearing the accessed bits and then checking which pages have had their accessed bit set again. Those pages were recently used and are not candidates for swapping.
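The aging scheme described above can be pictured with a deliberately simplified model. This is a sketch under stated assumptions, not any particular OS's reclaim algorithm. The type `toy_pte` and the functions `toy_touch` and `toy_age_pass` are hypothetical, and real kernels combine accessed bits with LRU lists and other heuristics.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy page-table entry: only the accessed bit is modeled. */
typedef struct {
    bool accessed;
} toy_pte;

/* Any retiring load or store to the page sets its accessed bit
   (in hardware, via a paging accessed-bit update assist). */
void toy_touch(toy_pte *pte) {
    pte->accessed = true;
}

/* One aging pass: pages whose accessed bit is still clear since the last
   pass are reported as swap candidates, then every bit is cleared so the
   next pass can observe fresh accesses. Returns the number of candidates. */
size_t toy_age_pass(toy_pte *ptes, size_t n, bool *is_candidate) {
    size_t candidates = 0;
    for (size_t i = 0; i < n; i++) {
        is_candidate[i] = !ptes[i].accessed;
        if (is_candidate[i]) {
            candidates++;
        }
        ptes[i].accessed = false; /* clear for the next observation window */
    }
    return candidates;
}
```

A single `toy_touch` call takes a page out of the candidate set again. This mirrors why an adversary who needs a cleared accessed bit on a victim page has to keep re-clearing it.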

There are a number of challenges for an application to influence the memory pressure of another application, and in many cases it is not possible at all. Refer to the Clearing of Accessed or Present bits Attacker causes memory pressure section for more details. Malicious adversaries would need to influence the OS to clear accessed bits on the correct victim process pages. This requires generating memory pressure, which is possible, but generally requires touching a lot of memory and is quite slow and noisy. The exact timing of accessed bit clearing is hard to control.

Even if a malicious adversary is able to clear the accessed bit, any retiring load or store to a page that the malicious adversary selects to cause an assist will trigger a hardware paging table update assist, which will set the accessed bit on that page. This means that the attacker would likely need to clear the accessed bit again (for example, by cycling through the LRU list again), and to repeat that necessary step with each attempt to infer data.

Lastly, the gadget in the victim must leak values via a covert channel (for example, cache) and the attacker will need to infer the values leaked by the covert channel (for example, through monitoring the cache), before system noise (like normal cache traffic) obscures the signal.

Although the usage of MSBDS with paging accessed assists avoids some of the restrictions that limit other techniques, nevertheless lining up all these conditions to successfully execute this method in a non-contrived scenario, with the necessary precision to extract meaningful data, is extremely complex. Accordingly, software developers should carefully evaluate their environment and workloads before choosing to mitigate this method in actual applications.

#### Inside an Application

It is also possible for LVI to be used as an in-domain method. In this situation, untrusted code running within a sandbox could employ the same steps described earlier in the application-to-application section in order to infer data values in the same process.

To mount an LVI method against higher privileged code in the same process, the sandboxed code would need to trigger accessed bit assists, which may require creating significant memory pressure. Because the sandbox is within the same process as the higher privileged code, memory pressure can have a more direct effect on the higher privileged code. However, existing resource limits should help mitigate the issue, and lining up all of the required steps will still increase the difficulty of a practical method.

In general, in-domain transient execution attacks are able to leverage the fact that the adversary has more control over code generation and can more easily generate the desired gadgets instead of needing to find them in victim code. This applies to previously disclosed in-domain transient execution attacks like bounds check bypass (Spectre variant 1), branch target injection (Spectre variant 2), and speculative store bypass (Spectre variant 4), as well as to in-domain LVI. For unmitigated runtimes, the risk of in-domain LVI methods is less than the risk of existing in-domain transient execution attacks due to the complexity of LVI methods. Intel has already published technical documentation for managed runtimes, and the primary recommended mitigation discussed there is also effective against in-domain LVI.

## LVI Mitigations for Intel SGX

The threat model for Intel SGX assumes that a malicious OS/hypervisor may arbitrarily manipulate an Intel SGX enclave's page tables. This allows the attacker to cause arbitrary loads to fault or assist during enclave execution.

Because any load may fault or assist, and because it is difficult to determine at compile time whether adversary-desired data may be forwarded by a faulting/assisting load, mitigation techniques may need to consider all possible gadgets, even if many of them might not be exploitable.

The following are summary characterizations of LVI exploits:

1. If the injected value is a secret, transmit that value through a covert channel [Load+Transmit].
2. If the injected value is attacker-controlled, use that value to:
   1. Load a secret and transmit it through a covert channel [Load+Load+Transmit].
   2. Branch to code that can load and transmit a secret [Load+Branch].

This section will describe software mitigation techniques that can be applied to enclaves in order to mitigate LVI attacks against those enclaves. Additionally, updates to the Intel SGX SDK will be released that apply these software mitigations. There is no additional microcode update needed to mitigate LVI (either for Intel SGX or in general).

The Load+Transmit LVI variant requires a faulting/assisting load from memory and a subsequent operation that may transmit the loaded value over a covert channel. For example:

```asm
MOV rbx, QWORD PTR [rdi]  # Load
MOV rcx, QWORD PTR [rbx]  # Transmit
```


If the first load faults/assists, then a stale value may be forwarded to the second load's memory operand. Hence the stale value will be used as an address to access memory, potentially disclosing that value through a covert channel (for example, the last level cache (LLC)).

Note that this is only a valid LVI exploit if the stale value is a program secret.

In general, it is not possible to statically determine whether any given load may forward a secret. Therefore, a comprehensive mitigation strategy must consider all Load+Transmit "gadgets" (even if not all of them are exploitable). For each Load+Transmit gadget, the developer should ensure that at least one LFENCE instruction will be executed in between the load and the transmit, along all viable control flow paths. The LFENCE ensures that if the load faults/assists, then the load will retire before a stale value can be transiently forwarded to the transmit instruction.
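The same rule can be applied by hand in C. The sketch below is illustrative only and assumes an x86 compiler. `load_then_index` is a hypothetical function, and `_mm_lfence()` is the SSE2 intrinsic that emits an LFENCE. Whether a C compiler keeps surrounding plain loads on the intended side of the intrinsic depends on the compiler, so an LVI-aware toolchain remains the more reliable way to obtain the property.

```c
#include <emmintrin.h> /* _mm_lfence (SSE2) */
#include <stddef.h>
#include <stdint.h>

/* Load+Transmit gadget, hardened by hand. The first load may fault/assist
   and transiently forward an injected value. Using that value as an index
   is the "transmit", because it selects which cache line is touched. */
uint8_t load_then_index(const uint64_t *slot, const uint8_t *table,
                        size_t table_len) {
    uint64_t idx = *slot;          /* Load */
    _mm_lfence();                  /* the load must retire before idx is used */
    return table[idx % table_len]; /* Transmit; % keeps the access in bounds */
}
```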

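The Load+Load+Transmit LVI variant chains two loads. A faulting/assisting load injects an attacker-controlled value, a second load uses that value as an address to fetch a secret, and a subsequent operation transmits the secret. For example:

```asm
MOV rbx, QWORD PTR [rdi]  # Load may fault and inject stale value into rbx
MOV rcx, QWORD PTR [rbx]  # Attacker uses stale value to load secret into rcx
MOV QWORD PTR [rcx], r8   # Transmit secret over cache based covert channel
```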


Notice that the second and third instructions fit the Load+Transmit pattern, and so do the first and second instructions. Hence applying the Load+Transmit mitigation described above to both pairs yields:

```asm
MOV rbx, QWORD PTR [rdi]  # Load
LFENCE                    # Forces prior Load to retire
MOV rcx, QWORD PTR [rbx]  # Load -- rbx guaranteed to be non-stale
LFENCE                    # Forces prior Load to retire
MOV QWORD PTR [rcx], r8   # Store -- rcx guaranteed to be non-stale
```


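The Load+Branch LVI variant uses an injected value as the target of an indirect branch. For example:

```asm
MOV rcx, QWORD PTR [rsi]  # Load
JMP rcx                   # Branch/Transmit
```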


If the MOV instruction is used to inject stale data into rcx, then the JMP instruction can be used to either branch to an attacker-chosen instruction sequence, or to transmit the stale data over a covert channel (by fetching instructions into caches from the jump target). The latter case is analogous to Jump Oriented Programming-style methods. Either way, the gadget can be mitigated by inserting an LFENCE after the load.
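In C, the same fix means loading the code pointer into a local variable, fencing, and only then making the indirect call. This is a sketch assuming an x86 compiler. `handler_fn`, `dispatch`, `add_one`, and `twice` are hypothetical names. A compiler is free to fold the load back into a `CALL` with a memory operand, which is exactly the pattern discussed under Instructions that Require Special Treatment. Inspect the generated code or rely on an LVI-aware toolchain.

```c
#include <emmintrin.h> /* _mm_lfence (SSE2) */

typedef int (*handler_fn)(int);

/* Two example branch targets. */
static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }

/* Load+Branch gadget, hardened by hand: the code pointer is loaded from
   memory, the load is forced to retire, and only then is the indirect
   call made. */
int dispatch(const handler_fn *slot, int arg) {
    handler_fn fn = *slot; /* Load of the branch target */
    _mm_lfence();          /* the load retires before the target is used */
    return fn(arg);        /* indirect call (Branch) */
}
```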

Depending on the execution properties of the Intel SGX enclave workload (for example, CPU-bound vs. I/O-bound, cache locality, etc.), the performance impact of mitigating all potential Load+Transmit, Load+Load+Transmit, and Load+Branch gadgets will vary and may be significant in some cases. If the overhead imposed by mitigating all loads is unacceptable and their threat model allows for it, independent software vendors (ISVs) may opt to apply only partial mitigations.

### Tooling Support to Automate LVI Mitigation

Intel and industry partners provide toolchain support for compiler and assembler tools that yield object files that satisfy the following property:

For all Load+Transmit gadgets in each procedure/function, every path in the control flow graph from Load to Transmit is "cut" by at least one LFENCE instruction.
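The property can be restated as a graph question. The Transmit node must be unreachable from the Load node once every LFENCE node is deleted from the control flow graph. The checker below is a toy sketch of that restatement. The types and names (`toy_cfg`, `every_path_fenced`, `MAX_NODES`) are hypothetical, and this is not how the compilers mentioned below are implemented.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical per-instruction CFG: succ[i][j] is true if instruction j
   can execute directly after instruction i. */
typedef struct {
    size_t n;
    bool succ[MAX_NODES][MAX_NODES];
    bool is_lfence[MAX_NODES];
} toy_cfg;

/* Returns true if every path from `load` to `transmit` passes through at
   least one LFENCE, i.e. `transmit` is unreachable from `load` once all
   LFENCE nodes are removed. Iterative depth-first search. */
bool every_path_fenced(const toy_cfg *g, size_t load, size_t transmit) {
    bool seen[MAX_NODES] = { false };
    size_t stack[MAX_NODES];
    size_t top = 0;

    seen[load] = true;
    stack[top++] = load;
    while (top > 0) {
        size_t cur = stack[--top];
        for (size_t next = 0; next < g->n; next++) {
            if (!g->succ[cur][next] || seen[next] || g->is_lfence[next]) {
                continue; /* no edge, already visited, or path cut by LFENCE */
            }
            if (next == transmit) {
                return false; /* found a path with no LFENCE on it */
            }
            seen[next] = true; /* each node is pushed at most once */
            stack[top++] = next;
        }
    }
    return true;
}
```

For a diamond where the Load can reach the Transmit through two branches, fencing only one branch leaves the property violated. Both branches need an LFENCE.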

In general, it is difficult to analyze assembly code to discover data dependency chains that can form LVI gadgets. Therefore, Intel is making a patched GNU assembler available that trivially achieves the above property by inserting an LFENCE instruction after each instruction that performs a load (the Instructions that Require Special Treatment section discusses instructions requiring special handling). Microsoft* is also releasing an update to the Visual C/C++ compiler with similar capability. The C and C++ languages have semantics that are more amenable to static analysis. To take advantage of this, Intel is collaborating with industry partners to develop an extension to the clang compiler (a part of the LLVM framework) that optimally inserts LFENCE instructions to achieve the property stated above. This optimization approach is further explained in this article.
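The "fence after every load" strategy used by the patched assembler can be sketched as a simple pass over an instruction list. The record type and function names (`toy_insn`, `fence_after_loads`, `is_lfence`) are hypothetical. A real assembler decodes operands to decide whether an instruction reads memory, and it also needs the special cases described in the next section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified instruction record. */
typedef struct {
    const char *text;
    bool loads_from_memory;
} toy_insn;

/* Copies `in` to `out`, appending an LFENCE after each instruction that
   loads from memory. `out` must have room for 2 * n entries.
   Returns the number of instructions written. */
size_t fence_after_loads(const toy_insn *in, size_t n, toy_insn *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k++] = in[i];
        if (in[i].loads_from_memory) {
            out[k++] = (toy_insn){ "LFENCE", false };
        }
    }
    return k;
}

/* True if the record is an LFENCE (such as one inserted above). */
bool is_lfence(const toy_insn *insn) {
    return strcmp(insn->text, "LFENCE") == 0;
}
```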

### Instructions that Require Special Treatment

There are several x86 instructions that combine both a load and a dependent memory access or branch. For these instructions, the mitigation is more complicated than simply inserting an LFENCE instruction. The first special case is the handling of function returns (for example, RET instructions). A compiler can replace all RET instructions with a safe alternative. Specifically, it can identify an available scratch register, and replace each RET with the following:

```asm
POP <scratch register>
LFENCE                  # Forces the pop to retire
JMP <scratch register>
```


This sequence has the same semantics as a RET instruction but is not vulnerable to LVI. Unlike the compiler for C/C++ source code, the assembler is not able to infer liveness for registers, and thus it cannot reliably identify a scratch register. Instead, the assembler replaces each RET instruction with the following sequence:

```asm
SHL QWORD PTR [rsp], 0
LFENCE
RET
```


The rationale behind this sequence is explained in the Elaboration on ad-hoc Load+Branch mitigations section12.

The second special case is related to indirect call and indirect branch instructions with a memory operand. For example:

```asm
JMP QWORD PTR [rsi]
```

For the example above, a compiler can instead generate:

```asm
MOV <scratch register>, QWORD PTR [rsi]
LFENCE                    # Forces the prior MOV to retire
JMP <scratch register>
```


If a scratch register is not available, a compiler might instead replace the indirect jump/call from memory with the following instruction sequence that uses a general purpose register (GPR):

```asm
XOR <some GPR>, QWORD PTR [rsi]
XOR <some GPR>, QWORD PTR [rsi]
LFENCE
JMP QWORD PTR [rsi]
```


The XOR instructions do not alter the GPR contents, but do change flags. The compiler should only use this sequence if the changes to flags are acceptable.

Some compilers have options that prevent the compiler from generating indirect calls or branches through memory, which is clearly helpful in mitigating LVI. The Intel SGX SDK takes advantage of this and facilitates Intel SGX developers doing likewise.

Unlike a compiler, an assembler is not able to infer liveness for registers or flags, and thus cannot use either sequence. If the assembly source code contains indirect calls or branches through memory, manual inspection and modification are required to apply the LVI mitigation, considering whether a scratch register is available or flags can be changed. The updated GNU assembler discussed in the Tooling Support to Automate LVI Mitigation section will output a warning if it encounters indirect calls or branches through memory.

Note that the above mitigations which use indirect JMPs or CALLs are incompatible with retpoline (which replaces all such indirect JMP or CALL instructions with RET instructions).

Retpoline is intended to mitigate branch target injection. Intel SGX-enabled processors with recent microcode updates will enumerate IBRS support and thus already mitigate branch target injection inside enclaves by ensuring that the predicted targets of near indirect branches executed inside an enclave cannot be controlled by software that executes outside the enclave. More details on this are in the guidance on Branch Target Injection.

There are also two REP string instructions that require special treatment. Specifically, the compare string (CMPS) and scan string (SCAS) instructions set EFLAGS in a manner that depends on the data being compared/scanned. Therefore, when used with a REP prefix, the number of iterations may vary depending on this data. If the data is a program secret chosen by the adversary using an LVI method, then this data-dependent behavior may leak some aspect of the secret. The solution is to unfold any REP CMPS and REP SCAS operations into a loop, and insert an LFENCE after the CMPS/SCAS instruction. For example, REPNZ SCAS can be unfolded to:

```asm
.RepLoop:
    JRCXZ .ExitRepLoop  # or JECXZ (see next line)
    DEC rcx             # or ecx if the REPNZ SCAS uses a 32-bit address size
    SCAS
    LFENCE
    JNZ .RepLoop
.ExitRepLoop:
    ...
```
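To make the control flow of the unfolded loop explicit, here is a C model of the byte-sized case (REPNZ SCASB). It is a behavioral sketch assuming an x86 compiler, not a replacement for the instruction-level sequence, because a C compiler may reorder or restructure the loop. `scan_bytes_fenced` is a hypothetical name, and the local `rcx` and the `*next` output stand in for the RCX and RDI registers.

```c
#include <emmintrin.h> /* _mm_lfence (SSE2) */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scan at most `count` bytes starting at `p` for `value`, with an LFENCE
   after every compare so the loop-exit decision never runs ahead of the
   load it depends on. Returns the remaining count (what RCX holds
   afterwards) and stores the pointer one past the last byte examined
   (what RDI holds) in *next. */
size_t scan_bytes_fenced(const uint8_t *p, uint8_t value, size_t count,
                         const uint8_t **next) {
    size_t rcx = count;
    while (rcx != 0) {                /* JRCXZ .ExitRepLoop */
        rcx--;                        /* DEC rcx */
        bool match = (*p++ == value); /* SCASB: compare and advance */
        _mm_lfence();                 /* LFENCE */
        if (match) {                  /* JNZ .RepLoop not taken: stop */
            break;
        }
    }
    *next = p;
    return rcx;
}
```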


## Applying Mitigation for Intel SGX

For Intel SGX, enclave developers should evaluate the risk of a potential LVI attack and the performance implications of the mitigation, and decide whether to apply mitigations to their enclaves. For LVI-affected processors, the Intel SGX Attestation Service will report a new status code, SW_HARDENING_NEEDED, to indicate that the platform is affected by a security advisory for which software hardening is recommended.

### Applying Mitigations to Enclaves

The Intel SGX SDK will support building enclaves with different levels of software hardening against potential LVI attacks. In particular:

• No-Auto-Mitigation: No compiler/assembler-inserted LFENCE instructions in developers' code, nor in the linked enclave libraries provided by the SDK. Developers can manually modify the code to apply LFENCE protection.
• Control-Flow-Mitigation: Compiler/assembler configuration that replaces RET and indirect CALL/JMP instructions with an LFENCE-protected instruction stream in developers' C/C++/assembly source code and linker configuration that selects the set of SDK-provided enclave libraries with the same mitigation.
• All-Loads-Mitigation: Compiler/assembler configuration that inserts an LFENCE instruction after each instruction that performs a load and replaces RET and indirect CALL/JMP instructions with an LFENCE-protected instruction stream in developers' C/C++/assembly source code, and linker configuration that selects the set of SDK-provided enclave libraries with the same mitigation.

Both the Control-Flow-Mitigation and the All-Loads-Mitigation options have performance impacts that vary depending on the specific enclave code over which the mitigation is applied. The effect of the mitigations will vary by workload and in some cases may be significant, especially for the All-Loads-Mitigation. As the Intel SGX application includes both enclave code and non-enclave code and the mitigation is only applicable to the enclave code, the overall overhead at the Intel SGX application level is determined not only by the mitigation overhead introduced to the enclave, but also by the amount of time the code executes inside the enclave compared to execution outside of the enclave before the mitigation is applied.
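The dependence of application-level overhead on time spent inside the enclave can be made concrete with a simple model. This is an assumption for illustration only: it ignores enclave transition costs and any change in non-enclave time. Let $f$ be the fraction of the original (unmitigated) run time spent executing inside the enclave, and $s$ the factor by which the mitigation slows enclave code:

$$
\frac{T_{\text{mitigated}}}{T_{\text{unmitigated}}} = (1 - f) + f\,s
$$

For example, with $f = 0.2$ and $s = 2$, the normalized run time is $0.8 + 0.4 = 1.2$. That is a 20% application-level overhead even though the enclave code itself runs twice as slowly. These numbers are illustrative, not measurements.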

Developers who choose to mitigate LVI can use LVI mitigation-enabled compilers and assemblers discussed in the Tooling Support to Automate LVI Mitigation section to apply the selected level of mitigation to their C/C++ and assembly source code for the enclave. The Intel SGX SDK simplifies this by letting developers choose the mitigation level rather than requiring developers to understand the tools' specific command line options. The SDK documentation has been updated to reflect these changes. It is worth noting that any library binary that is not recompiled or reassembled using the tool chain and configuration recommended by the Intel SGX SDK might not include the desired mitigations; neither will any dynamically generated code within the enclave at enclave runtime, if supported by the enclave.

Developers using third party SGX SDKs should consult their SGX SDK provider for mitigation plans and release timelines.

### When to Apply Mitigation

Enclave developers who want to support Intel SGX-enabled platforms should determine the level of software hardening that their environment requires, based on risk analysis and an evaluation of the performance impacts of mitigation.

If none of the supported platforms are affected by LVI, including LVI zero data, no additional action is required. If only some of the supported platforms are affected by LVI, developers could choose to release one version of the enclave with the selected level of mitigations enabled for all platforms. Alternatively, developers could release multiple versions of the enclave: one version without mitigations for platforms that are not affected by LVI, and another version with mitigations for platforms that are affected by LVI. As in all usages of Intel SGX that utilize Intel SGX remote attestation, developers should provide the identities of the enclaves (MRSIGNER, ISVPRODID, ISVSVN, and other relevant fields in the enclave SIGSTRUCT) to the relying parties (verifiers of the Intel SGX remote attestation data) so the relying parties can determine which enclave, or which version of an enclave, they are communicating with.

Developers who choose to support multiple versions of enclaves should sign the enclaves to identify which enclaves include software mitigations against LVI. Furthermore, data sealed by an enclave that includes software hardening should not be unsealable by alternative versions of the enclave that do not include software hardening. One way to achieve this is to assign a higher enclave ISVSVN value to the enclave version with software hardening than you do to the enclave version without software hardening.
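Raising ISVSVN works because of the sealing-key rule: an enclave can only request a key for an ISVSVN less than or equal to its own, and EGETKEY rejects higher values. The sketch below models just that comparison. It assumes an MRSIGNER-based key policy with the same signer and ISVPRODID, and `can_unseal` is a hypothetical helper rather than an SDK function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data sealed under `sealed_isvsvn` can only be unsealed by an enclave
   whose own ISVSVN is at least that high, because a lower-SVN enclave
   cannot derive the corresponding sealing key. */
bool can_unseal(uint16_t enclave_isvsvn, uint16_t sealed_isvsvn) {
    return sealed_isvsvn <= enclave_isvsvn;
}
```

With the hardened build at ISVSVN 2 and the unhardened build at ISVSVN 1, data sealed by the hardened enclave is out of reach of the unhardened one. The reverse direction, where a newer enclave reads older data, still works.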

Intel has not been able to identify and successfully exploit any Load+Transmit or Load+Load+Transmit code gadgets inside the Intel enclaves involved in Intel SGX remote attestation. Nonetheless, out of an abundance of caution, Intel will release updates to those enclaves with All-Loads-Mitigation applied and conduct an Intel SGX trusted computing base (TCB) Recovery event to enable relying parties to tell whether the updated Intel SGX attestation enclaves were utilized.

### Relying Parties

Intel SGX Attestation services will indicate whether the platform the attestation request originated from is affected by LVI (LVI-stale-data and/or LVI zero data), through a new status code, SW_HARDENING_NEEDED. A platform with the required version of microcode and Intel SGX attestation software stack, that is properly configured according to the relevant Intel SGX security advisories (for example, INTEL-SA-00233 and INTEL-SA-00219), will receive one of the two following status codes:

• OK or UP-TO-DATE: The platform is not affected by LVI
• SW_HARDENING_NEEDED: The platform is affected by LVI.

The relying party should evaluate the potential risk of an attack on platforms affected by LVI and whether the attesting enclave employs adequate software hardening to mitigate the risk, which is reflected in the enclave identity (MRSIGNER, ISVPRODID, ISVSVN, and other relevant fields in the attestation data). The relying party might reject attestations from enclaves without appropriate LVI mitigations.
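A relying party's decision can be sketched as a small policy function. Everything here is hypothetical: `accept_attestation` and `min_hardened_isvsvn` are illustrative names, MRSIGNER and ISVPRODID checks are assumed to have already passed, and rejecting every other status code is a simplification of this sketch, not a recommendation for all deployments.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* `status` is the attestation status code described above; `isvsvn` comes
   from the attested enclave identity; `min_hardened_isvsvn` is the first
   ISVSVN this relying party knows to include LVI software hardening. */
bool accept_attestation(const char *status, uint16_t isvsvn,
                        uint16_t min_hardened_isvsvn) {
    if (strcmp(status, "OK") == 0 || strcmp(status, "UP-TO-DATE") == 0) {
        return true; /* platform not affected by LVI */
    }
    if (strcmp(status, "SW_HARDENING_NEEDED") == 0) {
        return isvsvn >= min_hardened_isvsvn; /* affected: require hardening */
    }
    return false; /* any other status: reject in this sketch */
}
```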

## LVI Mitigation for Non-SGX Environments

Because malicious adversaries have limited ability to influence the paging behavior of victim processes to cause faults or assists in non-SGX environments, LVI is not a practical exploit in real-world non-SGX environments. Developers can mitigate potentially vulnerable code by inserting additional LFENCE instructions to block speculative activity or techniques like array index masking to prevent leaking data via covert channels. But because of the complexity of lining up all these conditions to successfully execute this method in a non-contrived scenario, software developers should carefully evaluate their environments and workloads before choosing to mitigate this.
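Array index masking clamps an index with arithmetic instead of relying only on a bounds-check branch, so even a transiently executed access stays inside the array. The sketch below is a generic illustration and the names are hypothetical. Production code, such as the Linux kernel's `array_index_nospec`, uses architecture-specific sequences so the compiler cannot turn the comparison back into a branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Yields `idx` when idx < size and 0 otherwise, without a branch. */
size_t mask_index(size_t idx, size_t size) {
    size_t mask = (size_t)0 - (size_t)(idx < size); /* all ones or zero */
    return idx & mask;
}

/* Bounds-checked read that also masks the index used for the access. */
uint8_t read_masked(const uint8_t *array, size_t size, size_t idx) {
    if (idx >= size) {
        return 0;
    }
    return array[mask_index(idx, size)];
}
```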

## Enumeration

An OS or VMM can discover its potential susceptibility to LVI stale data by determining whether the processor is affected by L1TF, MDS, or TAA. In particular, processors with all three of the following properties are not affected by LVI stale data:

• Enumerates RDCL_NO
• Enumerates MDS_NO
• Either enumerates TAA_NO, or does not support Intel TSX, or has disabled TSX/RTM using IA32_TSX_CTRL

On processors that are affected by TAA but not by MDS, software that does not use loads within an Intel TSX region cannot be impacted by LVI stale data.
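The three conditions can be evaluated from the IA32_ARCH_CAPABILITIES MSR (0x10A), where RDCL_NO is bit 0, MDS_NO is bit 5, and TAA_NO is bit 8. The sketch below only applies the rule. `not_affected_by_lvi_stale_data` is a hypothetical name, and reading MSRs (ring 0, and only when CPUID.(EAX=7,ECX=0):EDX[29] enumerates the MSR) is left to the caller. A processor that does not enumerate the MSR should be treated as reporting 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_ARCH_CAPABILITIES (MSR 0x10A) bit positions. */
#define ARCH_CAP_RDCL_NO (1ULL << 0)
#define ARCH_CAP_MDS_NO  (1ULL << 5)
#define ARCH_CAP_TAA_NO  (1ULL << 8)

/* Applies the three conditions listed above. The caller supplies the MSR
   value, whether Intel TSX is supported, and whether TSX/RTM has been
   disabled through IA32_TSX_CTRL. */
bool not_affected_by_lvi_stale_data(uint64_t arch_caps, bool tsx_supported,
                                    bool rtm_disabled_by_tsx_ctrl) {
    bool rdcl_no = (arch_caps & ARCH_CAP_RDCL_NO) != 0;
    bool mds_no = (arch_caps & ARCH_CAP_MDS_NO) != 0;
    bool taa_ok = (arch_caps & ARCH_CAP_TAA_NO) != 0 || !tsx_supported ||
                  rtm_disabled_by_tsx_ctrl;
    return rdcl_no && mds_no && taa_ok;
}
```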

Intel SGX usage may need an alternative mechanism to detect whether the CPU is affected by LVI, as it does not trust the OS. Through Intel SGX remote attestation, a relying party can examine the remote attestation evaluation status code and tell whether the remote attestation request is from a platform affected by LVI (LVI stale data and/or LVI zero data). Refer to the Relying Parties section for details.

## Appendix

### Elaboration on ad-hoc Load+Branch Mitigations

The effect of the SHL instruction is to assert that the stack pointer refers to a valid page, without changing the contents of memory or clobbering any registers (including flags). The LFENCE then ensures that the SHL retires before finally issuing the RET. This ensures that instructions dependent on the RET will not transiently execute if the SHL instruction signals a fault/assist. Enclave entry will clear out buffers affected by MDS or L1TF, and thus an attacker cannot inject non-zero data to the RET if they enter the enclave between the SHL and RET instructions.

The soundness of the SHL+LFENCE+RET sequence does not depend solely on the length of the transient window. For (non-NULL) LVI, microarchitectural buffers and the data cache unit are cleared at the end of ERESUME, so if the enclave is resumed with ERESUME after the SHL, malicious actors would have nothing to inject. For LVI zero data, it is possible to inject 0 as the return address, which will cause transient execution to jump to LIP 0. If LIP 0 is (architecturally) outside of the enclave, the RET instruction will fault and an AEX will be delivered; the microarchitecture will not allow instructions to be transiently fetched and executed in this case. If LIP 0 is (architecturally) inside the enclave, is executable, and contains valid instructions, then these instructions may be transiently executed.

Note however that the Intel SGX SDK will not build enclaves with instructions or execute permission at the beginning of the enclave.

For other SGX SDKs, a trivial mitigation is to similarly build enclaves so that they do not have instructions at the beginning of the enclave address space. Then, even if a malicious OS maps the beginning of the enclave at LIP 0, there will be no executable instructions at LIP 0.
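A build or signing tool for another SDK could verify this layout property before signing. The sketch below is hypothetical throughout (`enclave_region`, `no_code_at_base`, `TOY_PAGE_SIZE`). It assumes the tool has a list of enclave regions with offsets relative to the enclave base, and it treats "the beginning of the enclave" as the first 4 KiB page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical description of one region of an enclave image. */
typedef struct {
    uint64_t offset; /* relative to the enclave base */
    uint64_t size;
    bool executable;
} enclave_region;

/* Returns true if no executable region overlaps the first page, so that
   even if the enclave base is mapped at linear address 0 there is nothing
   there to fetch. */
bool no_code_at_base(const enclave_region *regions, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (regions[i].executable && regions[i].size > 0 &&
            regions[i].offset < TOY_PAGE_SIZE) {
            return false;
        }
    }
    return true;
}
```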

### Elaboration on Control-Flow-Mitigation Effectiveness on LVI Zero Data Stack Hijacking Attack

An LVI zero data attack can be used to hijack the Intel SGX enclave stack pointer during transient execution. In the following example, on a processor affected by LVI zero data, the attacker is able to cause the load from [rsp+58h] to fault and the dependent instructions in the code gadget to forward the value 0 to the rsp register. At that point, all subsequent POP and RET instructions dereference the malicious memory page mapped at virtual address 0, outside of the enclave. By filling the memory at virtual address 0 with specially crafted content, the attacker is able to cause transient execution to branch to any desired code gadget within the enclave. The attacker might also be able to mount a transient ROP attack by chaining together multiple subsequent POP-RET sequences.

```asm
MOV rbp, QWORD PTR [rsp+58h]  # Fault, rbp <- 0
...
MOV rsp, rbp                  # rsp <- 0
POP rbp                       # rbp <- *(0)
RET                           # rip <- *(0 + 8)
```


When the Control-Flow-Mitigation (here, the SHL-LFENCE sequence inserted before the RET instruction) is applied to the code above, the LFENCE before the RET instruction ensures that the faulting load retires before the next instruction can execute. As a result, the fault is signaled and transient execution cannot reach the RET instruction to branch to the attacker's desired code gadget or to any subsequent POP-RET instructions.

```asm
MOV rbp, QWORD PTR [rsp+58h]  # Fault, rbp <- 0
...
MOV rsp, rbp                  # rsp <- 0
POP rbp                       # rbp <- *(0)
SHL QWORD PTR [rsp], 0        # *(8) <- *(8)
LFENCE                        # Forces prior Load from [rsp+58h] to
                              # retire. Fault signaled.
RET
```

## Footnotes

1. Assists are conditions that are handled internally by the processor and thus do not require software involvement. While both faults and assists may cause the results of a μop to be discarded, assists restart and complete the instruction without needing software involvement, whereas faults do need software involvement (for example, an exception handler). For example, setting the Dirty bit in a page table entry may be done using an assist.
2. Some processors may transiently receive incorrect data from the store buffer, load ports, fill buffer, or L1D cache. For more details, please see Intel Analysis of Microarchitectural Data Sampling and Intel Analysis of L1 Terminal Fault.
3. Refer to Refined Speculative Execution Terminology.
4. Detailed in Steps and elements for attackers to cause LVI section.
5. A list of microarchitectural structures that could potentially be primed with malicious data is given in Speculative Microarchitectural Data Sources section.
6. Many pieces of software do not deal with data from malicious applications (for example because they do not directly take input data from sources which may be malicious) and may be more difficult or impossible for the attacker to prime data.
7. The attacker's value is injected into the transient instruction stream via the load, hence the term load value injection.
8. A list of microarchitectural structures that could potentially be primed with malicious data is given in Speculative Microarchitectural Data Sources section.
9. For this reason, parts generally exhibiting Zero-at-ret behavior and not LVI-stale-data or LVI zero data (with the possible exception of masked loads) will be documented as "not affected".
10. SGX-Step: A Practical Attack Framework for Precise Enclave Execution Control.
11. LVI zero data can normally only attack the lower pages, which is not a useful method. OSes protect the low parts of the address space as part of other mitigations.
12. This sequence had previously been described as NOT-NOT-LFENCE-RET. The SHL-LFENCE-RET sequence has the same security properties, but requires one less instruction.

#### Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.