Hardware Features and Behavior Related to Speculative Execution

ID 823144
Updated 5/16/2024
Version 1.0
Public

Key Takeaways

  • Modern processors make predictions about the program’s future execution to improve performance. Processors implement various forms of predictions and speculation which may result in instructions being speculatively executed. If a prediction was wrong, the instructions which were speculatively executed based on the misprediction must be squashed and do not affect architectural states. However, a malicious actor may be able to use mispredictions to perform transient execution attacks.

  • This article consolidates prior guidance on speculative execution and brings together relevant information to help readers navigate this topic. This consolidated document explains how to effectively address speculation in Intel processors for secure code execution, limit the performance impact of mitigations, and avoid mitigation redundancies.

  • Intel plans to update this document periodically to incorporate new guidance documents as they are released; for example, reflecting speculation control mechanisms that may be added on future Intel products.

Modern processors use speculative execution to provide higher performance, more efficient resource utilization, and better user experiences. The speculation mechanisms may use various forms of predictors to anticipate future program execution and improve performance by having instructions execute earlier than their program order. While these predictors are designed to have high accuracy, wrong predictions can occur and result in mis-speculation, where a processor first executes instructions based on a prediction, and later squashes them to return to correct program execution. An attacker can potentially exploit such mis-speculation to reveal sensitive data in a transient execution attack.

While previous documentation described specific speculative execution vulnerabilities and their mitigations, this article consolidates the prior guidance on speculative execution with better organization. It continues to refer to the per-vulnerability guidance documents for more details. The first version of this article does not change existing guidance for transient execution attacks over what has previously been published, but rather brings together relevant information to help readers navigate this topic. Later versions of this article may include additional guidance. 

This consolidated document explains how to effectively address speculation in Intel processors for secure code execution, limit the performance impact of mitigations, and avoid mitigation redundancies for the features and behaviors included. It also provides an overview of the different types of speculation on current Intel processors and describes the hardware controls and software-based techniques that developers can use to restrict speculation and reduce the ability of potential adversaries to infer secret data due to speculation. Intel plans to update this article periodically to incorporate new guidance documents as they are released; for example, reflecting speculation mechanisms that may be added on future Intel processors.

This document is organized as follows:

  • The Speculative Execution section starts with an introduction to speculative execution, describes control-flow and data speculation, and outlines options to restrict speculation.
  • The Control-Flow Speculation section details control-flow speculation due to indirect and conditional branches and techniques to restrict control-flow speculation on Intel processors.
  • The Data Speculation section describes variants of data speculation, such as memory disambiguation, and options to manage data speculation.
  • The Data-Dependent Prefetchers section outlines data-dependent prefetches.
  • The Additional Software Guidance section summarizes the recommendations for restricting speculation in common use cases, such as after a processor enters a higher privilege level.
  • The Related Intel Security Features and Technologies section describes security features and technologies which reduce the effectiveness of the malicious attacks described in the previous sections.
  • Finally, the CPUID Enumerations and Architectural MSRs section describes the processor enumerations and model-specific registers that provide the hardware features and mechanisms described in this article.

Speculative Execution

In order to improve performance, modern processors make predictions about the program’s future execution. Processors use these predictions to speculatively execute younger instructions ahead of the current instruction pointer. As the processor advances in program execution, it resolves all conditions required to determine the correctness of the prediction. If the original predictions were correct, the speculatively executed instructions can retire, and their state becomes architecturally visible. If a prediction was wrong, the instructions which were speculatively executed based on the misprediction must be squashed and do not affect architectural states. These squashed instructions, which were only executed speculatively, are called transient instructions. Based on the resolved conditions, the processor then resumes with the correct program execution. A more detailed description of speculative execution is available in the Refined Speculative Execution Terminology article.

Processors implement various forms of predictions and speculation which may result in instructions being speculatively executed, including: 

  • Control-flow speculation involves speculatively executing instructions based on a prediction of the program’s control flow.
    • Indirect branch predictors predict the target address of indirect branch instructions1 to allow instructions at the predicted target address to be speculatively executed before the target address has been resolved. 
    • Conditional branch predictors predict the direction of conditional branches to allow instructions on the predicted path to be speculatively executed before the condition has been resolved. 
  • Data speculation involves speculatively executing instructions which depend on the values from previous instructions before the previous instructions have been executed. For example, the processor may speculatively forward data from a previous load to younger dependent instructions before the addresses of all intervening stores are known. 

While speculative execution predictors strive to have high accuracy, predictions can be wrong. A malicious actor may be able to use mispredictions to perform transient execution attacks, in which case a malicious actor may attempt to retrieve secret information from transiently executed instructions through an incidental channel. 

Multiple sources of speculation may affect the same instruction. For example, an indirect branch may be affected by both control-flow speculation and data speculation. Control-flow speculation may cause the indirect branch to be predicted with a target based on past behavior. If a malicious actor controlled the predicted branch target, this would be called attacker-controlled prediction. Data speculation could later affect the indirect branch’s source data and cause it to transiently go to an incorrectly predicted location before later redirecting to the correct location. If a malicious actor controlled the branch target through data speculation, this would be called attacker-controlled jump redirection. 

The following sections discuss the various types of speculation as well as the configurations Intel processors provide to control speculation.

Incidental Channels

There are several sources of incidental channels that may be used to retrieve information from transiently executed instructions. An overview of possible incidental channels is provided in the incidental channel taxonomy.

Using such incidental channels, a malicious actor may be able to gain information through observing certain states of the system, such as by measuring the microarchitectural properties of the system. Unlike buffer overflows and other vulnerability classes, incidental channels do not directly influence the execution of the program, nor allow data to be modified or deleted.  

For instance, a cache timing side channel involves an agent detecting whether a piece of data is present in any or a specific level of the processor’s caches, which may be used to infer some other related information. One common method to detect whether the data of interest is present in a cache is to use timers to measure the latency of accessing memory at the corresponding address and to compare that latency against baseline timings of accesses that hit in the cache and accesses that must be served from memory.
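
As a concrete illustration of the measurement step described above, the following is a minimal sketch of timing a single memory access with the time-stamp counter. It is not taken from Intel guidance; it assumes a GCC- or Clang-compatible compiler on a 64-bit x86 target, and calibration of the hit/miss threshold is omitted.

#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp(), _mm_mfence() */

/* Time one load from addr in time-stamp counter ticks. A caller would compare
 * the result against previously calibrated cache-hit and memory-access
 * latencies to decide whether the corresponding cache line was present. */
static uint64_t time_access(const volatile uint8_t *addr)
{
	unsigned int aux;
	uint64_t start, end;

	_mm_mfence();            /* keep earlier memory traffic out of the measurement */
	start = __rdtscp(&aux);
	(void)*addr;             /* the access being timed */
	end = __rdtscp(&aux);
	_mm_mfence();
	return end - start;
}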

Restricting Speculative Execution

System operators have a range of options available to restrict speculation in Intel processors and reduce the risk of transient execution attacks. Intel processors provide several controls, such as enhanced Indirect Branch Restricted Speculation (IBRS) and Speculative Store Bypass Disable (SSBD), to restrict control-flow speculation of indirect branches and to restrict data speculation, respectively. The Indirect Branch Speculation Control Mechanisms section details the available indirect branch speculation controls and their usage, and the Speculative Store Bypass Control Mechanisms section describes controls to restrict data speculation.

Speculation can also be restricted through software-based techniques. For example, software can use a technique called retpoline (see the Software Techniques for Indirect Speculation Control section) to restrict indirect branch speculation and use bounds clipping to prevent speculative out-of-bounds array accesses following conditional branches (refer to the Overview of Bounds Check Bypass section).

More generally, software can insert speculation-stopping barriers at the proper locations as needed to prevent a speculative side channel. The LFENCE instruction, or any serializing instruction, can serve as such a barrier. The LFENCE instruction and serializing instructions ensure that no later instruction will execute, even speculatively, until all prior instructions have completed locally. The LFENCE instruction has lower latency than the serializing instructions and thus is recommended when a speculation-stopping barrier is needed. 

Certain security features with architectural effect can also be effective with respect to speculative execution. For example, when Supervisor Mode Access Prevention (SMAP) is enabled, supervisor-mode loads executed from CPL0 with the AC flag clear will not access memory in user mode pages, even transiently. This may prevent an attacker from using user memory for an incidental channel.

Control-Flow Speculation

As highlighted in the Speculative Execution section, control-flow speculation occurs when the processor speculatively executes instructions based on control flow prediction. The two main sources of transient execution related to control-flow speculation on Intel processors are indirect branches and conditional branches. 

Besides control-flow speculation from branch predictions, there is also implicit sequential control-flow speculation due to out-of-order execution: instructions can be speculatively executed on a sequential control-flow path ahead of the architecturally committed instruction pointer. In case of architectural or microarchitectural events (for example, exceptions or assists), instructions on the sequential path following the event may be transiently executed and squashed later by the processor as part of the event handling mechanism. This is a common behavior in modern processors with out-of-order execution and not considered a security issue by itself. However, when combined with specific vulnerabilities such as Rogue Data Cache Load, Rogue System Register Read, L1 Terminal Fault and Lazy FP, malicious actors may be able to leverage speculative execution to bypass existing security restrictions and infer secret data on some processors. This paper does not discuss the behavior of those specific vulnerabilities; refer to the respective technical papers for more details.

The following section of this article describes control-flow speculation due to indirect branches and conditional branches, as well as the hardware and software mechanisms that can be used to restrict such control-flow speculation.

Indirect Branches

Overview of Indirect Branch Predictors

Intel processors use indirect branch predictors to determine the target address of instructions that are to be speculatively executed after a near indirect branch instruction, as enumerated in the table below.

Table 1: Instructions that use Indirect Branch Predictors

Branch Type        | Instruction                          | Opcode
Near Call Indirect | CALL r/m16, CALL r/m32, CALL r/m64   | FF /2
Near Jump Indirect | JMP r/m16, JMP r/m32, JMP r/m64      | FF /4
Near Return        | RET, RET Imm16                       | C3, C2 Iw

References in this document to indirect branches are only to near call indirect, near jump indirect and near return instructions. 

To make accurate predictions, indirect branch predictors are trained through program execution. Specifically, indirect branch predictors learn the target addresses of indirect branch instructions when they execute and use them for target prediction of subsequent executions of indirect branch instructions. While these predictions are accurate in most cases, mispredictions can happen: the indirect branch predictor may predict the wrong target address, which can result in instructions at an incorrect code location being speculatively executed and later squashed.

Intel processors implement different forms of indirect branch predictors, such as: 

  • Branch Target Buffer (BTB) predicts indirect branch target address based on the branch instruction’s address. 
  • Other branch predictors predict indirect branch target address based on the history of previously executed branch instructions. This allows the processor to predict different targets for the same indirect branch depending upon the previous code leading up to the indirect branch. The Branch History Buffer (BHB) holds the history which is used to select branch targets in these predictors.
  • The Return Stack Buffer (RSB) is a microarchitectural structure that predicts the targets of near RET instructions based on previous corresponding CALL instructions. Each execution of a near CALL instruction with a non-zero displacement adds an entry to the RSB that contains the address of the instruction sequentially following that CALL instruction. The RSB is not used or updated by far CALL, far RET, or IRET instructions.

Note that besides control-flow speculation, such as in indirect branch predictions, data speculation can also be the origin of speculative execution in the context of indirect branch instructions. For instance, due to memory disambiguation, an indirect jump instruction may load the target address from a memory location and speculatively jump to this target address before an older store instruction has stored a different target address to that memory location2.

Branch Target Injection (BTI), Branch History Injection (BHI), and Intra-mode BTI are all microarchitectural transient execution attack techniques which involve an adversary influencing the target of an indirect branch by training the indirect branch predictors. Intel processors support indirect branch speculation control mechanisms which can be used to mitigate such attacks.

Indirect Branch Prediction and Intel® Hyper-Threading Technology (Intel® HT Technology) 

In a processor supporting Intel® Hyper-Threading Technology, a core (or physical processor) may include multiple logical processors. On such processors, the logical processors sharing a core may share indirect branch predictors. As a result of this sharing, software on one of a core’s logical processors may be able to control the predicted target of an indirect branch executed on another logical processor on the same core. 

This sharing occurs only within a core. Software executing on a logical processor of one core cannot control the predicted target of an indirect branch by a logical processor of a different core. 

This sharing also occurs only when STIBP is not enabled and only on processors without support for enhanced IBRS.

Indirect Branch Speculation Control Mechanisms

Intel has developed indirect branch predictor controls, which are interfaces between the processor and system software to manage the state of indirect branch predictors. 

All supported Intel processors provide three indirect branch control mechanisms:

  • Indirect Branch Restricted Speculation (IBRS): Restricts indirect branch predictions, which can be used by virtual machine manager (VMM) or operating system code to prevent the use of predictions from another security domain. Recent processors support enhanced IBRS, which can be enabled once and never disabled (always on mode).
  • Single Thread Indirect Branch Predictors (STIBP): Prevents indirect branch predictions from being controlled by a sibling hyperthread. Processors which support enhanced IBRS always have this behavior, regardless of the setting of STIBP.
  • Indirect Branch Predictor Barrier (IBPB): Prevents indirect branch predictions after the barrier from being controlled by software executed before the barrier. IBPB also acts as a barrier for the Fast Store Forwarding Predictor and Data Dependent Prefetchers (refer to the Overview of Data Speculation section), where relevant. This allows VMM and operating system code to provide isolation when switching between guests or userspace applications which execute in different security domains.

Some recent Intel processors also support additional indirect branch control mechanisms which focus on specific indirect branch predictors or behaviors. Some examples include the IPRED_DIS_U, IPRED_DIS_S, RRSBA_DIS_U, RRSBA_DIS_S, and BHI_DIS_S bits in the IA32_SPEC_CTRL MSR.
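
For reference, the sketch below shows the MSR interface behind the basic controls, using the MSR indices and bit positions defined in the Intel Software Developer's Manual; the C macro names themselves are illustrative. These MSRs can only be written from CPL0 (operating system or VMM code).

/* IA32_SPEC_CTRL (MSR 0x48): per-logical-processor speculation controls. */
#define MSR_IA32_SPEC_CTRL	0x48
#define SPEC_CTRL_IBRS		(1ULL << 0)	/* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP		(1ULL << 1)	/* Single Thread Indirect Branch Predictors */
#define SPEC_CTRL_SSBD		(1ULL << 2)	/* Speculative Store Bypass Disable */
/* The additional control bits named above (IPRED_DIS_U, IPRED_DIS_S,
 * RRSBA_DIS_U, RRSBA_DIS_S, BHI_DIS_S) are enumerated by CPUID; their bit
 * positions are given in the CPUID Enumerations and Architectural MSRs
 * section referenced by this article. */

/* IA32_PRED_CMD (MSR 0x49): write-only command MSR. */
#define MSR_IA32_PRED_CMD	0x49
#define PRED_CMD_IBPB		(1ULL << 0)	/* Indirect Branch Predictor Barrier */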

System software can use these indirect branch control mechanisms to defend against branch target injection attacks.

Predictor Mode

Intel processors support different modes of operation corresponding to different levels of privilege. VMX root operation (for a virtual-machine monitor, or host) is more privileged than VMX non-root operation (for a virtual machine, or guest). Within either VMX root operation or VMX non-root operation, supervisor mode (CPL < 3) is more privileged than user mode (CPL = 3).

To prevent inter-mode attacks based on branch target injection, it is important to ensure that less privileged software cannot control the branch target prediction in more privileged software. For this reason, it is useful to introduce the concept of predictor mode associated with different modes of operation as mentioned above. There are four predictor modes: host-supervisor, host-user, guest-supervisor, and guest-user.

The guest predictor modes are considered less privileged than the host predictor modes. Similarly, the user predictor modes are considered less privileged than the supervisor predictor modes.

There are operations that may be used to transition between unrelated software components but do not change CPL or cause a VMX transition. These operations do not change predictor mode.  Examples include MOV to CR3, VMPTRLD, EPTP switching (using VM function 0), and GETSEC[SENTER].

Indirect Branch Restricted Speculation (IBRS)

Indirect branch restricted speculation (IBRS) is an indirect branch control mechanism that restricts speculation of indirect branches. A processor supports IBRS if it enumerates CPUID.(EAX=7H,ECX=0):EDX[26] as 1.

IBRS: Basic Support

Processors that support IBRS provide the following guarantees without any enabling by software:

  • The predicted targets of near indirect branches executed in an enclave (a protected container defined by Intel® SGX) cannot be controlled by software executing outside the enclave.
  • If the default treatment of system-management interrupts (SMIs) and system-management mode (SMM) is active, software executed before an SMI cannot control the predicted targets of indirect branches executed in SMM after the SMI.
  • The predicted targets of near indirect branches executed inside a Trust Domain (TD), a virtual machine managed by Intel® Trust Domain Extensions (Intel® TDX) module, cannot be controlled by software executing outside the TD.

IBRS: Support Based on Software Enabling

IBRS provides a method for critical software to protect its indirect branch predictions.

If software sets IA32_SPEC_CTRL.IBRS to 1 after a transition to a more privileged predictor mode, predicted targets of indirect branches executed in that predictor mode with IA32_SPEC_CTRL.IBRS = 1 cannot be controlled by software that was executed in a less privileged predictor mode3. Additionally, when IA32_SPEC_CTRL.IBRS is set to 1 on any logical processors of that core, the predicted targets of indirect branches cannot be controlled by software that executes (or has executed previously) on another logical processor of the same core. Therefore, it is not necessary to set bit 1 (STIBP) of the IA32_SPEC_CTRL MSR when IBRS is set to 1.

If IA32_SPEC_CTRL.IBRS is already 1 before a transition to a more privileged predictor mode, some processors may allow the predicted targets of indirect branches executed in that predictor mode to be controlled by software that executed before the transition. Software can avoid this by using WRMSR on the IA32_SPEC_CTRL MSR to write the IBRS bit as 1 after any such transition, regardless of the bit’s previous value; it is not necessary to clear the bit first.
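
As an illustration of this usage model, the sketch below sets the IBRS bit on entry to a more privileged predictor mode. It is not taken from any particular operating system; it assumes CPL0 code on a processor that enumerates IBRS, and the helper and function names are illustrative.

#define MSR_IA32_SPEC_CTRL	0x48
#define SPEC_CTRL_IBRS		(1ULL << 0)

/* Write a 64-bit value to an MSR (CPL0 only). WRMSR takes ECX = MSR index
 * and EDX:EAX = value. */
static inline void wrmsr(unsigned int msr, unsigned long long value)
{
	asm volatile("wrmsr"
		     : /* no outputs */
		     : "c"(msr), "a"((unsigned int)value),
		       "d"((unsigned int)(value >> 32))
		     : "memory");
}

/* Called after the transition to the more privileged predictor mode (for
 * example, at the start of a syscall or interrupt entry path). Writing the
 * bit as 1 is sufficient; it does not need to be cleared first. A real OS
 * would also preserve any other IA32_SPEC_CTRL bits it manages (such as
 * STIBP or SSBD) rather than writing only the IBRS bit. */
static inline void ibrs_on_privileged_entry(void)
{
	wrmsr(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);
}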

Setting IA32_SPEC_CTRL.IBRS to 1 does not suffice to prevent the predicted target of a near return from using an RSB entry created in a less privileged predictor mode. Software can avoid this by using an RSB overwrite sequence4 following a transition to a more privileged predictor mode. It is not necessary to use such a sequence following a transition from user mode to supervisor mode if supervisor-mode execution prevention (SMEP) is enabled. SMEP prevents execution of code on user mode pages, even speculatively, when in supervisor mode. User mode code can only insert its own return addresses into the RSB, not the return addresses of targets on supervisor mode code pages. On processors without SMEP where separate page tables are used for the OS and applications, the OS page tables can map user code as no-execute. The processor will not speculatively execute instructions from a translation marked no-execute.

Enabling IBRS does not prevent software from controlling the predicted targets of indirect branches of unrelated software executed later at the same predictor mode (for example, between two different user applications, or two different virtual machines). Such isolation can be ensured through use of IBPB, described in the Indirect Branch Predictor Barrier (IBPB) section.

Enabling IBRS on one logical processor of a core with Intel HT Technology may affect branch prediction on other logical processors of the same core. For this reason, software should disable IBRS (by clearing IA32_SPEC_CTRL.IBRS) prior to entering a sleep state (for example, by executing HLT or MWAIT) and re-enable IBRS upon wakeup and prior to executing any indirect branch.

Enhanced IBRS

Some processors may enhance IBRS by simplifying software enabling and improving performance.  A processor supports enhanced IBRS if RDMSR returns a value of 1 for bit 1 of the IA32_ARCH_CAPABILITIES MSR.

Enhanced IBRS supports an always on model in which IBRS is enabled once (by setting IA32_SPEC_CTRL.IBRS) and never disabled. If IA32_SPEC_CTRL.IBRS = 1 on a processor with enhanced IBRS, the predicted targets of indirect branches executed cannot be controlled by software executed in a less privileged predictor mode or on another logical processor.

As a result, software operating on a processor with enhanced IBRS need not use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more privileged predictor mode. Software can isolate predictor modes effectively simply by setting the bit once. Software need not disable enhanced IBRS prior to entering a sleep state such as MWAIT or HLT.

On processors with enhanced IBRS, an RSB overwrite sequence may not suffice to prevent the predicted target of a near return from using an RSB entry created in a less privileged predictor mode.  Software can prevent this by enabling SMEP (for transitions from user mode to supervisor mode) and by having IA32_SPEC_CTRL.IBRS set during VM exits. Processors with enhanced IBRS still support the usage model where IBRS is set only in the OS/VMM for OSes that enable SMEP. To do this, such processors will manage guest behavior such that it cannot control the RSB after a VM exit once IBRS is set, even if IBRS was not set at the time of the VM exit. If the guest has cleared IBRS, the hypervisor should set IBRS after the VM exit, just as it would do on processors supporting IBRS but not enhanced IBRS. As with IBRS, enhanced IBRS does not prevent software from affecting the predicted target of an indirect branch executed at the same predictor mode. For such cases, software should use the IBPB command, described in the Indirect Branch Predictor Barrier (IBPB) section.

On processors with enhanced IBRS support, Intel recommends that IBRS be set to 1 and left set. The traditional IBRS model of setting IBRS only during ring 0 execution is just as secure on processors with enhanced IBRS support as it is on processors with basic IBRS, but the WRMSRs on ring transitions and/or VM exit/entry will cost performance compared to just leaving IBRS set. Again, there is no need to use STIBP when IBRS is set. However, IBPB should still be used when switching to a different application/guest that does not trust the last application/guest that ran on a particular hardware thread. 

Guests in a VM migration pool that includes hardware without enhanced IBRS may not have IA32_ARCH_CAPABILITIES.IBRS_ALL (enhanced IBRS) enumerated to them, and thus may use the traditional IBRS usage model of setting IBRS only in ring 0. For performance reasons, once a guest has been shown to frequently write IA32_SPEC_CTRL, we do not recommend that the VMM cause a VM exit on such WRMSRs. The VMM running on processors that support enhanced IBRS should allow the IA32_SPEC_CTRL-writing guest to control guest IA32_SPEC_CTRL. The VMM should thus set IBRS after VM exits from such guests to protect itself (or use alternative techniques like retpoline, secret removal, or indirect branch removal).

On processors without enhanced IBRS, Intel recommends using retpoline or setting IBRS only during ring 0 and VMM modes. IBPB should be used when switching to a different process/guest that does not trust the last process/guest that ran on a particular hardware thread. For performance reasons, IBRS should not be left set during application execution.

Single Thread Indirect Branch Predictors (STIBP)

As noted in the Indirect Branch Prediction and Intel® Hyper-Threading Technology (Intel® HT Technology) section, the logical processors sharing a core may share indirect branch predictors, allowing one logical processor to control the predicted targets of indirect branches by another logical processor of the same core. 

Single thread indirect branch predictors (STIBP) is an indirect branch control mechanism that restricts the sharing of indirect branch prediction between logical processors on a core. A processor supports STIBP if it enumerates CPUID.(EAX=7H,ECX=0):EDX[27] as 1. Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor prevents the predicted targets of indirect branches on any logical processor of that core from being controlled by software that executes (or executed previously) on another logical processor of the same core.

Unlike IBRS and IBPB, STIBP does not affect all branch predictors that contain indirect branch predictions. STIBP only affects those branch predictors where software on one hardware thread can create a prediction that can then be used by the other hardware thread for indirect branches. This is part of what makes STIBP have lower performance overhead than IBRS on some implementations.

It is not necessary to use IBPB after setting STIBP in order to make STIBP effective. STIBP provides isolation of indirect branch prediction between logical processors on the same core only while it is set; it is not a branch prediction barrier between execution before and after it is set, whether on the same logical processor or on logical processors of the same core.

Processes that are particularly security-sensitive may wish to have STIBP be set when they execute to prevent their indirect branch predictions from being controlled by another hardware thread on the same physical core. On some older Intel Core-family processors, this comes at significant performance cost to both hardware threads due to disabling some indirect branch predictors (as described earlier). Because of this, we do not recommend that STIBP be set during all application execution on processors that only support basic IBRS.

Indirect branch predictors are never shared across cores. Thus, the predicted target of an indirect branch executed on one core can never be affected by software operating on a different core. It is not necessary to set IA32_SPEC_CTRL.STIBP to isolate indirect branch predictions from software operating on other cores.

Many processors do not allow the predicted targets of indirect branches to be controlled by software operating on another logical processor, regardless of STIBP. These include processors on which Intel Hyper-Threading Technology is not enabled and those that do not share indirect branch predictor entries between logical processors. To simplify software enabling and enhance workload migration, STIBP may be enumerated (and setting IA32_SPEC_CTRL.STIBP allowed) on such processors. 

A processor may enumerate support for the IA32_SPEC_CTRL MSR (e.g., by enumerating CPUID.(EAX=7H,ECX=0):EDX[26] as 1) but not for STIBP (CPUID.(EAX=7H,ECX=0):EDX[27] is enumerated as 0). On such processors, execution of WRMSR to IA32_SPEC_CTRL ignores the value of bit 1 (STIBP) and does not cause a general-protection exception (#GP) if bit 1 of the source operand is set. It is expected that this fact will simplify virtualization in some cases.

As noted in the Indirect Branch Restricted Speculation (IBRS) section, enabling IBRS prevents software operating on one logical processor from controlling the predicted targets of indirect branches executed on another logical processor. For that reason, it is not necessary to enable STIBP when IBRS is enabled. 

Recent Intel processors, including all processors which support enhanced IBRS, provide this isolation for indirect branch predictions between logical processors without the need to set STIBP.

Enabling STIBP on one logical processor of a core with Intel Hyper-Threading Technology may affect branch prediction on other logical processors of the same core. For this reason, software should disable STIBP (by clearing IA32_SPEC_CTRL.STIBP) prior to entering a sleep state (for example, by executing HLT or MWAIT) and re-enable STIBP upon wakeup and prior to executing any indirect branch.

Indirect Branch Predictor Barrier (IBPB)

The indirect branch predictor barrier (IBPB) is an indirect branch control mechanism that establishes a barrier, preventing software that executed before the barrier from controlling the predicted targets of indirect branches5  executed after the barrier on the same logical processor. A processor supports IBPB if it enumerates CPUID.(EAX=7H,ECX=0):EDX[26] as 1. IBPB can be used to help mitigate Branch Target Injection.

The IBPB also provides other domain isolation properties regarding speculative execution, such as for the Fast Store Forwarding Predictor and Data Dependent Prefetchers where relevant.

Unlike IBRS and STIBP, IBPB does not define a new mode of processor operation that controls the branch predictors. As a result, it is not enabled by setting a bit in the IA32_SPEC_CTRL MSR. Instead, IBPB is an operation that software executes when necessary.

Software executes an IBPB command by writing the IA32_PRED_CMD MSR to set bit 0 (IBPB). This can be done either using the WRMSR instruction or as part of a VMX transition that loads the MSR from an MSR-load area. Software that executed before the IBPB command cannot control the predicted targets of indirect branches executed after the command on the same logical processor. The IA32_PRED_CMD MSR is write-only, and it is not necessary to clear the IBPB bit before writing it with a value of 1.
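
A sketch of issuing the IBPB command from CPL0 is shown below, for example on a context switch between mutually distrusting applications; the function name and switching policy are illustrative.

#define MSR_IA32_PRED_CMD	0x49
#define PRED_CMD_IBPB		(1ULL << 0)

/* IA32_PRED_CMD is write-only; writing bit 0 as 1 issues the barrier. */
static inline void issue_ibpb(void)
{
	unsigned long long value = PRED_CMD_IBPB;

	asm volatile("wrmsr"
		     : /* no outputs */
		     : "c"(MSR_IA32_PRED_CMD), "a"((unsigned int)value),
		       "d"((unsigned int)(value >> 32))
		     : "memory");
}

/* Example policy: a scheduler might call issue_ibpb() when switching to a
 * process or guest that does not trust the one that last ran on this
 * logical processor. */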

IBPB can be used in conjunction with IBRS to account for cases that IBRS does not cover:

  • As noted in the Indirect Branch Restricted Speculation (IBRS) section, IBRS does not prevent software from controlling the predicted target of an indirect branch of unrelated software (for example, a different user application or a different virtual machine) executed at the same predictor mode. Software can aim to prevent such control by executing an IBPB command when changing the identity of software operating at a particular predictor mode (for example, when changing user applications or virtual machines).
  • Software may choose to clear IA32_SPEC_CTRL.IBRS in certain situations for example, for execution with CPL = 3 in VMX root operation). In such cases, software can use an IBPB command on certain transitions (for example, after running an untrusted virtual machine) to prevent software that executed earlier from controlling the predicted targets of indirect branches executed subsequently with IBRS disabled.

Note that, on some processors that do not enumerate PBRSB_NO, there is an exception to the IBPB-established barrier for RSB-based predictions. On these processors, a RET instruction that follows VM exit or IBPB without a corresponding CALL instruction may use the linear address following the most recent CALL instruction executed prior to the VM exit or IBPB as the RSB prediction (refer to the Post-barrier Return Stack Buffer Predictions guidance). In these cases, software can use special code sequences (refer to the Return Stack Buffer Control section) to steer RSB predictions to benign code regions that restrict speculation. 

Other Indirect Branch Predictor Controls

The BHI_DIS_S indirect predictor control prevents predicted targets of indirect branches executed in CPL0, CPL1, or CPL2 from being selected based on branch history from branches executed in CPL3. While set in the VMX root (host), it also prevents predicted targets executed in CPL0 (ring 0/root) from being selected based on branch history from branches executed in a VMX non-root (guest). It may not prevent predicted targets executed in CPL3 of VMX root from being based on branch history for branches executed in a VMX non-root (guest). Future processors may have the behavior described above for BHI_DIS_S by default; software can determine whether this is the case by checking whether BHI_NO is enumerated by the processor.

The IPRED_DIS_U (affecting CPL3) and IPRED_DIS_S (affecting CPL < 3) controls, when active, prevent transient execution at predicted targets of an indirect near JMP/CALL before the target is resolved. This includes transient execution at past targets of that same branch. Transient execution at predicted targets of a near RET prediction will only occur for RSB-based return predictions, or for linear address 0. Note that, as previously documented, fall-through speculation to instruction bytes following an indirect JMP/CALL or speculation to linear address 0 may still occur.

When the RRSBA_DIS_S (affecting CPL < 3) and RRSBA_DIS_U (affecting CPL3) indirect predictor controls are set, transient execution at predicted targets of a near RET prediction will only occur for RSB-based return predictions, or for linear address 0.

Software Techniques for Indirect Speculation Control

Besides the hardware-based mechanisms described above, software mechanisms can also be used to limit indirect branch speculation.

For example, indirect branch prediction can be suppressed in some cases by using a software-based approach called retpoline, which was developed by Google*. Details of retpoline are described in Retpoline: A Branch Target Injection Mitigation.
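
For illustration, the sketch below shows a retpoline-style thunk for an indirect jump through RAX, following the published pattern (a GNU-toolchain, 64-bit sketch; the thunk name is illustrative). Compilers can emit such thunks automatically, for example GCC with -mindirect-branch=thunk or Clang with -mretpoline.

/* Callers replace "jmp *%rax" with "jmp retpoline_thunk_rax". The RET's RSB
 * prediction is captured by the pause/lfence loop, while the architectural
 * return address on the stack is overwritten with the real target so that
 * execution continues correctly once the RET resolves. */
asm(
	".text\n"
	".globl retpoline_thunk_rax\n"
	"retpoline_thunk_rax:\n"
	"	call 1f\n"
	"2:	pause\n"		/* speculation trap */
	"	lfence\n"
	"	jmp 2b\n"
	"1:	mov %rax, (%rsp)\n"	/* replace return address with the real target */
	"	ret\n"
);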

Return Stack Buffer Control

Some software techniques to control speculation, such as retpoline, require return address speculation to have predictable behavior to work properly. Under some circumstances (for example, a deep call stack or imbalanced CALL and RET instructions), the RSB may underflow, and alternative predictors may be used to predict the return address of RET instructions.

RSB stuffing is a software technique to fill the RSB with trusted-software-controlled return targets to avoid RSB underflow. On older processors, RSB stuffing with 32 return targets is sufficient. On these processors, RSB stuffing may be used in conjunction with retpoline to restrict return address mis-speculation to controlled targets. On processors without enhanced IBRS, RSB stuffing may also be used by a VMM after VM exit to protect against RSB underflow.
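
A minimal RSB-overwrite sketch is shown below, modeled on published RSB-stuffing sequences (GNU toolchain, 64-bit; the function name is illustrative). The exact sequence and number of entries recommended for a given processor are defined in Intel's guidance; kernels typically build such code with -mno-red-zone, since the CALLs temporarily write below the stack pointer.

/* Execute 32 CALLs whose return addresses point at benign speculation traps,
 * then discard the 32 return addresses from the architectural stack. */
static inline void rsb_overwrite(void)
{
	unsigned long iterations = 16;	/* two CALLs per iteration = 32 RSB entries */

	asm volatile(
		"1:\n\t"
		"call 2f\n\t"
		"3:	pause\n\t"	/* speculation trap for the first CALL's RSB entry */
		"	lfence\n\t"
		"	jmp 3b\n\t"
		"2:\n\t"
		"call 4f\n\t"
		"5:	pause\n\t"	/* speculation trap for the second CALL's RSB entry */
		"	lfence\n\t"
		"	jmp 5b\n\t"
		"4:\n\t"
		"dec %0\n\t"
		"jnz 1b\n\t"
		"add $256, %%rsp\n\t"	/* pop the 32 return addresses (32 * 8 bytes) */
		: "+r"(iterations)
		:
		: "memory", "cc");
}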

Branch History Buffer Control

To address Branch History Injection, software can use a code sequence to control speculation that arises from collisions in the Branch History Buffer (BHB). This code sequence overwrites the branch history after domain transitions to prevent the previous domain from influencing BHB-based indirect branch prediction in the current domain. As microarchitectural details of the BHB may change in future processors, Intel recommends using hardware-based controls, such as BHI_DIS_S, where available. 

Conditional Branches

Intel processors use conditional branch predictors to predict the direction of conditional branch instructions before their actual execution. This allows the processor to fetch and speculatively execute instructions on the predicted execution path after the conditional branch. Speculative execution side channels (also known as transient execution attacks) that are based around conditional branch prediction are classified as Spectre Variant 1.

Overview of Bounds Check Bypass

Bounds check bypass is a side channel method that takes advantage of the speculative execution that may occur following a conditional branch instruction. Specifically, the method is used in situations in which the processor is checking whether an input is in bounds (for example, while checking whether the index of an array element being read is within acceptable values). The processor may issue operations speculatively before the bounds check resolves. If the attacker contrives for these operations to access out-of-bounds memory, information may be inferred by the attacker in certain circumstances.

Bounds Check Bypass Store

One subvariant of this technique, known as bounds check bypass store, is to use speculative stores to overwrite younger speculative loads in a way that creates a side channel controlled by a malicious actor. 

Refer to the example bounds check bypass store sequence below:

int function(unsigned bound, unsigned long user_key) {
     unsigned long data[8];

     /* bound is trusted and is never more than 8 */
     for (int i = 0; i < bound; i++) {
          data[i] = user_key;
     }

     return 0;
}

The example above does not by itself allow a bounds check bypass attack. However, it does allow the attack to speculatively modify memory, and therefore could potentially be used to chain attacks. For example, it is possible that the above sequence might speculatively overwrite the return address on the stack with user_key. This may allow a malicious actor to specify a user_key that is actually the instruction pointer of a disclosure gadget that they wish to be speculatively executed.

The steps below describe how an example attack using this method might occur:

  1. The CPU conditional branch predictor predicts that the loop will run for 10 iterations, when in reality the loop should only execute 8 times. After the 10th iteration, the branch condition resolves, execution falls through, and the following instructions execute. However, the 9th iteration of the loop may speculatively overwrite the return address on the stack.
  2. The CPU decodes the RET and speculatively fetches instructions based on the prediction in the return stack buffer (RSB). The CPU may speculatively execute those instructions.
  3. RET loads the value that it believes is at the top of the stack (but which came from the speculative store of user_key in step 1) and redirects the instruction pointer to that value. The results of any operations speculatively executed in step 2 are discarded.
  4. The disclosure gadget at the instruction pointer of user_key (which was specified by the malicious actor) speculatively executes and creates a side channel that can be used to reveal data specified by the malicious actor.
  5. The conditional jump that should have ended the loop then executes and redirects the instruction pointer to the next instruction after the loop. This discards the speculative store of user_key that overwrote the return address on the stack, as well as all other operations between step 1 and step 4.
  6. The CPU executes the RET again, and the program continues.

Where the compiler has spilled variables to the stack, the store can also be used to target those spilled values and speculatively modify them to enable another attack to follow. An example of this would be by targeting the base address of an array dereference or the limit value.

SMEP will prevent the attack described above from causing a supervisor RET to speculatively execute code in user mode page. Intel® Control-flow Enforcement Technology (Intel® CET) can also help prevent speculative execution of instructions at incorrect indirect branch targets.

This example can be mitigated by applying LFENCE before the RET (after the loop ends), by using bounds clipping to ensure that store operations do not occur outside of the array’s bounds even speculatively, or by ensuring that an incorrect return pointer is detected and that the return does not speculatively use the incorrect value.

A second variant of this method can occur where a user value is being copied into an array, either on the stack or adjacent to function pointers. As discussed previously, the processor may speculatively execute a loop more times than is actually needed. If this loop moves through memory writing malicious actor-controlled values, then the malicious actor may be able to speculatively perform a buffer overrun attack.

int filltable(uint16_t *from)
{
	uint16_t buffer[64];
	int i;

	for (i = 0; i < 64; i++)
		buffer[i] = *from++;

	return 0;
}

In some cases, the example above might speculatively copy more than 64 elements into the array, changing the return address speculatively used by the processor so that it instead returns to a user-controlled gadget.

As the execution is speculative, some processors will allow speculative writes to read-only memory, and will reuse that data speculatively. Therefore, while placing function pointers into write-protected space is a good general security mitigation, doing so is not sufficient mitigation in this case.

Identifying Bounds Check Bypass Vulnerabilities

The following section examines common instances of bounds check bypass, including the bounds check bypass store variant, but should not be considered a comprehensive list. It describes how to analyze potential bounds check bypass and bounds check bypass store vulnerabilities found by static analysis tools or manual code inspection and presents mitigation techniques that may be used. This document does not include any actual code from any real product or open source release, nor does it discuss or recommend any specific analysis tools.

Common Attributes for Bounds Check Bypass Vulnerabilities

Bounds check bypass code sequences have some common features: they generally operate on data that is controlled or influenced by a malicious actor, and they all have some kind of side-effect that can be observed by the malicious actor. In addition, the processor’s speculative execution sequence executes in a way which would be thrown away in a normally retired execution sequence. In bounds check bypass store variants, data is speculatively written at locations that would be out of bounds under normal execution. That data is later speculatively used to execute code and cause observable side-effects, creating a side channel.

Loads and Stores

A vulnerable code fragment forming a disclosure gadget is made up of two elements. The first is an array or pointer dereference that depends upon an untrusted value, for example, a value from a potentially malicious application. The second element is usually a load or store to an address that is dependent upon the value loaded by the first element. Refer to Microsoft*’s blog for further details.

As bounds check bypass is based upon speculation, code can be vulnerable even if that untrusted value is correctly tested for bounds before use.

The classic general example of such a sequence in C is:

if (user_value >= 0 && user_value < LIMIT) {
       x = table[user_value];
       node = entry[x];
} else
       return ERROR;

For such a code sequence to be vulnerable, both elements must be present. Furthermore, the untrusted value must be under the malicious actor’s control.

When the code executes, the processor has to decide if the user_value < LIMIT conditional is true or false. It remembers the processor register state at this point and speculates (makes a guess) that user_value is below LIMIT and begins executing instructions as if this were true. Once the processor realizes it guessed incorrectly, it throws away the computation and returns an error. The attack relies upon the fact that before it realizes the guess was incorrect, the processor has read both table[user_value], pointing into memory beyond the intended limit, and has read entry[x]. When the processor reads entry[x], it may bring in the corresponding cache line from memory into the L1 cache. Later, the malicious actor can time accesses to this address to determine whether the corresponding cache line is in the L1 data cache. The malicious actor can use this timing to discover the value x, which was loaded from a malicious actor-specified location.

The two components that make up this vulnerable code sequence can be stretched out over a considerable distance and through multiple layers of function calls. The processor can speculatively execute many instructions—a number sufficient to pass between functions, compilation units, or even software exception handlers such as longjmp or throw. The processor may speculate through locked operations, and use of volatile will not change the vulnerability of the code being exploited.

There are several other sequences that may be used to infer information. Anything that tests some property of a value and loads or stores according to the result may leak information. Depending upon the location of foo and bar, the example below might be able to leak bit 0 of arbitrary data.

if (user_value >= LIMIT)
	return ERROR;
x = table[user_value];
if (x & 1)
	foo++;
else
	bar++;

When evaluating code sequences for vulnerability to bounds check bypass, the critical question is whether different behavior could be observed as a property of x.

This question can be very challenging to answer from code inspection, especially when looking for any specific code pattern. For instance, if a value is passed to a function call, then that function call must be inspected to ensure it does not create any observable interactions. Consider the following example:

if (user_value >= LIMIT)
	return ERROR;
x = lengths[user_value];
if (x)
	memset(buffer, 0, 64 * x);

Here, x influences how much memory is cleared by memset() and might allow the malicious actor to discern something about the value of x from which cache lines the speculatively executed memset touches.

Remember that conditional execution is not just if, but may also include for and while as well as the C ternary (?:) operator and situations where one of the values is used to index an array of function pointers.

Typecasting and Indirect Calls

Typecasting can be a problematic area to analyze and often conceals real examples that can be exploited. This is especially challenging in C++ because you are more likely to have function pointers embedded in objects and overloaded operators that might behave in type-dependent fashion.

Two classes of typecasting problems are relevant to bounds check bypass attacks:

  1. Code/data mismatches. Speculation causes “class Foo” code to be speculatively executed on “class Bar” data using gadgets supplied with Foo to leak information about Bar.
  2. The type confusion is combined with some observable effect, like the load/store effects discussed above. For example, if Foo and Bar are different sizes, a malicious actor might be able to learn something about memory past the end of objects[] using something like the example below.

if (index >= len)
	return -EINVAL;
type = objects[index];
if (type == TYPE_FOO)
	memset(ptr, 0, sizeof(Foo));
else
	memset(ptr, 0, sizeof(Bar));

Take care when considering any code where a typecast occurs based upon a speculated value. The processor might guess the type incorrectly and speculatively execute instructions based on that incorrect type. Newer processors that enable Intel® OS Guard, also known as Supervisor-Mode Execution Prevention (SMEP), will prevent ring 0 code from speculatively executing ring 3 code. All major operating systems (OSes) enable SMEP support by default if the hardware supports it. Older processors, however, might speculate the type incorrectly, load data that the processor thinks are function pointers, or speculate into lower addresses that might be directly controlled by a malicious actor.

For example:

if (flag & 4)
	((Foo *)ptr)->process(x);
else
	((Bar *)ptr)->process(x);

If the Foo and Bar objects are different types with different memory layouts, then the processor may speculatively fetch a function pointer from the wrong offset within the object referenced by ptr and branch to it.

Consider the following example:

int call; /* from user */
if (call >= 0 && call < MAX_FUNCTION)
	function_table[call](a,b,c);

On first analysis this code might seem safe. We reference function_table[call], but call is the user’s own, known value. However, during speculative execution, the processor might incorrectly speculate through the if statement and speculatively execute invalid addresses. Some of these addresses might be mapped to user pages in memory, or might contain values that match suitable gadgets for ROP attacks.

A less obvious variant of this case is switch statements. Many compilers will convert some classes of switch statement into jump tables. Refer to the following example code:

switch(x) {
case 0: return y;
case 1: return z;
...
default: return -1;
}

Code similar to this will often be implemented by the compiler as shown:

if (x < 0 || x > 2) return -1;
goto case[x];

Therefore when using switch() with an untrusted input, it might be appropriate to place an lfence before the switch so that x has been fully resolved before the implicit bounds check.
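
For example, assuming an lfence() intrinsic such as the one described later in the LFENCE section, the fence can be placed immediately before the switch:

/* Ensure x is resolved before the compiler-generated bounds check and
 * indirect jump through the jump table. */
lfence();
switch (x) {
case 0: return y;
case 1: return z;
...
default: return -1;
}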

Speculative Loops

A final case to consider is loops that speculatively overrun. Consider the following example:

while (++x < limit) {
	y = u[x];
	thing(y);
}

The processor will speculate the loop condition, and often speculatively execute the next iteration of the loop. This is usually fine, but if the loop contains code that reveals the contents of data, then you might need to apply mitigations to avoid exposing data beyond the intended location of the loop. This means that even if the loop limit is properly protected before the processor enters the loop, unless the loop itself is protected, the loop might leak a small amount of data beyond the intended buffer on the speculative path.
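
One hedged way to protect the loop body itself is to apply bounds clipping (described later in the Bounds Clipping section) on every iteration, assuming the buffer u has a power-of-two number of elements (here 64) and that the compiler does not optimize the mask away:

while (++x < limit) {
	y = u[x & 63];	/* even a mispredicted extra iteration stays within the buffer */
	thing(y);
}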

Disclosure Gadgets

In addition to the load and store disclosure gadget referenced above, there may be additional gadgets based on the microarchitectural state. For example, using certain functional blocks, such as Intel® Advanced Vector Extensions (Intel® AVX), during speculative execution may affect the time it takes to subsequently use the block due to factors like the time required to power-up the block. Malicious actors can use a disclosure primitive to measure the time it takes to use the block. An example of such a gadget is shown below:

if (x > sizeof(table))
	return ERROR;
if (a[x].op == OP_VECTOR)
	avx_operation(a[x]);
else
	integer_operation(a[x]);

Conditional Branch Speculation Analysis

Controlling conditional branch speculation, such as bounds check bypass, is not generally relevant if your code doesn’t have secrets that the user shouldn’t be able to access. For example, a simple image viewer probably contains no meaningful secrets that should be inaccessible to software it interacts with. The user of the software could potentially use bounds check bypass attacks to access the image, but they could also just hit the save button.

On the other hand, an image viewer with support for secure, encrypted content with access authorized from a central system might need to care about bounds check bypass because a user may not be allowed to save the document in normal ways. While the user can’t save such an image, they can trivially photograph the image and send the photo to someone, so protecting the image may be less important. However, any keys are likely to be far more sensitive.

There are also clear cases like operating system kernels, firmware (refer to the Host Firmware Speculative Execution Side Channel Mitigations technical paper) and managed runtimes (for example, Javascript* in web browsers) where there is both a significant interaction surface between differently trusted code, and there are secrets to protect. 

Deciding whether to apply mitigations, and which areas to target, should be part of your general security analysis and risk modeling, along with conventional security techniques and, where appropriate, resistance to timing and other non-speculative side channel attacks. Bounds check bypass mitigations have performance impacts, so they should only be used where appropriate.

Software Techniques for Conditional Speculation Control

LFENCE

The main mitigation for bounds check bypass is through use of the LFENCE instruction. The LFENCE instruction does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE completes. Most vulnerabilities identified in the Identifying Bounds Check Bypass Vulnerabilities section can be protected by inserting an LFENCE instruction; for example:

if (user_value >= LIMIT)
	return ERROR;
lfence();
x = table[user_value];
node = entry[x];

Here, lfence() is a compiler intrinsic or inline-assembly helper that issues an LFENCE instruction and also tells the compiler that memory references may not be moved across that boundary. The LFENCE ensures that the loads do not occur until the condition has actually been checked, and the compiler-level memory barrier prevents the compiler from reordering references around the LFENCE and thus breaking the protection.
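
One possible definition of such a helper for GCC- or Clang-compatible compilers is sketched below; compilers also provide built-in intrinsics such as _mm_lfence() from <emmintrin.h>, but the inline-assembly form makes the compiler-level memory barrier explicit.

/* Issue LFENCE and act as a compiler memory barrier so that memory
 * references are not reordered across the fence at compile time. */
static inline void lfence(void)
{
	asm volatile("lfence" ::: "memory");
}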

Placement of LFENCE

To protect against speculative timing attacks, place the LFENCE instruction after the range check and branch, before any code that consumes the checked value, and before the data can be used in a gadget that might allow measurement. 

For example:

if (x > sizeof(table))
	return ERROR;
lfence();
if (a[x].op == OP_VECTOR)
	avx_operation(a[x]);
else
	integer_operation(a[x]);

Unless there are specific reasons otherwise, and the code has been carefully analyzed, Intel recommends that the LFENCE always be placed after the range check and before the range-checked value is consumed by other code, particularly if the code involves conditional branches.

Bounds Clipping

Software can use instructions, such as CMOVcc, AND, ADC, SBB, and SETcc, to constrain speculative execution and prevent bounds check bypass on current family 6 processors (Intel® Core™, Intel® Atom™, Intel® Xeon® and Intel® Xeon Phi™ processors). However, these instructions may not be guaranteed to do so on future Intel processors. Intel intends to release further guidance on the usage of instructions to constrain speculation in the future before processors with different behavior are released. Unlike LFENCE, this approach can avoid stalling the pipeline.

At the simplest:

unsigned int user_value;

if (user_value > 255)
	return ERROR;
x = table[user_value];

Can be made safe by instead using the following logic:

volatile unsigned int user_value;

if (user_value > 255)
	return ERROR;
x = table[user_value & 255];

This masking approach works only when the array length or bound is a power of two. In the example above, the table array length is 256 (2^8), so the valid index must be <= 255. Take care that the compiler does not optimize away the & 255 operation. For other ranges, CMOVcc, ADC, SBB, SETcc, and similar instructions can be used to clip the index, as in the sketch below.
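As an illustrative sketch only (LIMIT, table, x, and ERROR are assumed to be declared by the surrounding code), a non-power-of-two bound can be clipped with a branchless select. Compilers commonly lower the ternary below to a CMOVcc, but this is not guaranteed, so inspect the generated code:

volatile unsigned int user_value;

if (user_value >= LIMIT)
	return ERROR;
/* Redirect any out-of-bounds speculative value to index 0 instead of
 * letting it select an attacker-controlled offset. */
x = table[(user_value < LIMIT) ? user_value : 0];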

Although this mitigation approach can be faster than other approaches, its behavior is not guaranteed on future processors. Developers who cannot control which CPUs their software will run on (such as general application, library, and SDK developers) should not use this mitigation technique. Intel intends to release further guidance on using instructions to constrain speculation before future processors with different behavior are released.

Both the LFENCE approach and bounds clipping can be applied to function call tables, while the LFENCE approach is generally the only technique that can be used when typecasting.

Interaction with Memory Disambiguation

Memory disambiguation (as described in the Overview of Data Speculation section) can theoretically impact bounds clipping techniques when they involve a load from memory. In the following example, a CMOVG instruction is inserted to prevent a side channel from being created with data from any locations beyond the array bounds.

CMP RDX, [array_bounds]
JG out_of_bounds_input
MOV RCX, 0
MOV RAX, [RDX + 0x400000]
CMOVG RAX, RCX
<Further code that causes cache movement based on RAX value>

As an example, assume the value at array_bounds is 0x20, but that value was only just stored to array_bounds and that the prior value at array_bounds was significantly higher, such as 0xFFFF. The processor can speculatively execute the CMP instruction using a value of 0xFFFF for the loaded value due to the memory disambiguation mechanism. The instruction will eventually be re-executed with the intended array_bounds value of 0x20. This can theoretically cause the above sequence to support the creation of a side channel that reveals information about the memory at addresses up to 0xFFFF instead of constraining it to addresses below 0x20.

Multiple Branches

When using mitigations, particularly the bounds clipping mitigations, it is important to remember that the processor will speculate through multiple branches. Thus, the following code is not safe:

int *key;
int valid = 0;

if (input < NUM_ENTRIES) {
	lfence();
	key = &table[input];
	valid = 1;
}
/* ... */
if (valid)
	*key = data;

In this example, although the mitigation is applied correctly when the processor speculates that the first condition is valid, no protection is applied if the processor takes the out-of-range value and then speculates that valid is true on the other path. In this case it will probably expose the contents of a random register, although not in an easy-to-measure fashion.

Preinitializing key to NULL or another safe address will also not reliably work, because the compiler can eliminate the NULL assignment since it is never used non-speculatively. In such cases it may be more appropriate to merge the two conditional code sections and move the code between them into a separate function that is called on both paths. Alternatively, declare key as volatile and assign it to NULL so that the assignment cannot be eliminated, or add an lfence() before the final assignment, as sketched below.
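A minimal sketch of the last option, reusing the example above (NUM_ENTRIES, table, input, data, and the lfence() helper are assumed to come from the surrounding code):

int *key;
int valid = 0;

if (input < NUM_ENTRIES) {
	lfence();
	key = &table[input];
	valid = 1;
}
/* ... */
if (valid) {
	lfence();	/* ensure valid (and therefore key) has been resolved
			   before the store below can execute */
	*key = data;
}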

Compiler-Based Approaches

Note that there are also compiler-based approaches that automatically augment software with instructions to constrain speculation and can help prevent Bounds Check Bypass, such as Speculative Load Hardening (clang) and the /Qspectre option (MSVC). 

Compiler protections against buffer overwrites of return addresses, such as stack canaries, also provide some resistance to speculative buffer overruns. In situations where a loop speculatively overwrites the return address, it will also speculatively trigger the stack protection, diverting the speculative flow. However, stack canaries alone are not sufficient to protect against bounds check bypass attacks.

Microsoft Visual Studio* 2017 Mitigations

The Microsoft Visual Studio* 2017 Visual C++ compiler toolchain includes support for the /Qspectre flag, which may automatically add mitigations for some bounds check bypass vulnerabilities. For more information and usage guidelines, refer to Microsoft’s public blog and the Visual C++ /Qspectre option page.

LFENCE in Intel Fortran Compiler

You can insert an LFENCE instruction in Fortran applications as shown in the example below. Implement the following subroutine, which calls _mm_lfence() intrinsics:

    interface
        subroutine for_lfence() bind (C, name = "_mm_lfence")
            !DIR$ attributes known_intrinsic, default :: for_lfence
        end subroutine for_lfence
    end interface

    if (untrusted_index_from_user .le. iarr1%length) then
        call for_lfence()
        ival = iarr1%data(untrusted_index_from_user)
        index2 = (IAND(ival,1)*z'100') + z'200'
        if (index2 .le. iarr2%length) then
            ival2 = iarr2%data(index2)
        endif
    endif

The LFENCE intrinsic is supported in the following Intel compilers:

  • Intel C++ Compiler 8.0 and later for Windows*, Linux*, and macOS*.
  • Intel Fortran Compiler 14.0 and later for Windows, Linux, and macOS.
Compiler-driven Automatic Mitigations

Across the industry, there is interest in mitigations for bounds check bypass vulnerabilities that are provided automatically by compilers. Developers are continuing to evaluate the efficacy, reliability, and robustness of these mitigations and to determine whether they are best used in combination with, or in lieu of, the more explicit mitigations discussed above.

Operating System Mitigations

Where possible, dedicated operating system programming APIs should be used to mitigate bounds check bypass instead of using open-coded mitigations. Using the OS-provided APIs will help ensure that code can take advantage of new mitigation techniques or optimizations as they become available. 

Linux* Kernel

The current Linux* kernel mitigation approach to bounds check bypass is described in the speculation.txt file in the Linux kernel documentation. This file is subject to change as developers and multiple processor vendors determine their preferred approaches.

ifence(): on the x86 architecture, this issues an LFENCE and provides the compiler with the memory barrier needed to perform the mitigation. It can be used in the same way as lfence() in the examples above. On non-Intel processors, ifence() either generates the correct barrier code for that processor or does nothing if the processor does not speculate.

array_ptr(array, index, max): this is an inline that, irrespective of the processor, provides a method to safely dereference an array element. Additionally, it returns NULL if the lookup is invalid. This allows you to take the many cases where you range check and then check that an entry is present, and fold those cases into a single conditional test.

Thus, we can turn:

if (handle < 32) {
	x = handle_table[handle];
	if (x) {
		function(x);
		return 0;
	}
}
return -EINVAL;

Into:

x = array_ptr(handle_table, handle, 32);
if (x == NULL)
	return -EINVAL;
function(*x);
return 0;
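
For illustration only, one way a helper in this spirit can avoid out-of-bounds speculation without an LFENCE is to compute a branch-free mask that is all ones when the index is in bounds and zero otherwise. The helper name index_mask() below is hypothetical, and this is a sketch rather than the kernel's actual implementation:

/* Hypothetical sketch: returns an all-ones mask when index < size, zero otherwise.
 * Relies on arithmetic right shift of signed values (GCC/Clang behavior) and on
 * size being no larger than LONG_MAX. */
static inline unsigned long index_mask(unsigned long index, unsigned long size)
{
	/* If index >= size, either index or (size - 1 - index) has its top bit
	 * set, so the shifted result is 0; otherwise it is ~0UL. */
	return ~(long)(index | (size - 1 - index)) >> (sizeof(long) * 8 - 1);
}

/* Usage: clamp the index before the dependent load. */
if (handle < 32)
	x = &handle_table[handle & index_mask(handle, 32)];

The masked index stays in bounds even if the bounds-checking branch is mispredicted, so no fence is required on the load.
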
Microsoft Windows*

Windows C/C++ developers have a variety of options to assist in mitigating bounds check bypass (Spectre variant 1). The best option will depend on the compiler and code generation toolchains you are using. Mitigation options include both manual insertion of LFENCE and compiler-assisted approaches.

In mixed-mode compiler environments, where object files for the same project are built with different toolchains, there are varying degrees of mitigation options available. Developers need to be aware of and apply the appropriate mitigations depending on their code composition and appropriate toolchain support dependencies.

As described in the Operating System Mitigations section, we recommend inserting LFENCE instructions (either manually or with compiler assistance) for mitigating bounds check bypass on Windows. The following sections provide details on how to insert the LFENCE instruction using currently available compiler toolchain mechanisms. These mechanisms are (from lowest level to highest level):

  • Inline/external assembly
  • _mm_lfence() compiler intrinsic
  • Compiler automatic LFENCE insertion
Inline/External Assembly

The Intel® C Compiler and Intel® C++ Compiler provide inline assembly support for 32- and 64-bit targets, whereas Microsoft Visual C++* only provides inline assembly support for 32-bit targets. Microsoft Macro Assembler* (MASM) or other external, third-party assemblers may also be used to insert LFENCE in assembly code.

_mm_lfence() Compiler Intrinsic

The Intel C Compiler, the Intel C++ Compiler, and the Microsoft Visual C++ compiler all support generating LFENCE instructions for 32- and 64-bit targets using the _mm_lfence() intrinsic.

The easiest way for Windows developers to gain access to the intrinsic is by including the intrin.h header file that is provided by the compilers. Some Windows SDK/WDK headers (for example, winnt.h and wdm.h) define the _mm_lfence() intrinsic to avoid inclusion of the compiler intrin.h. It is possible that you already have code that locally defines _mm_lfence() as well, or uses an already existing definition for the intrinsic. 

LFENCE in C/C++

You can insert LFENCE instructions in a C/C++ program as shown in the example below:

#include <intrin.h>
#pragma intrinsic(_mm_lfence)
 
    if (user_value >= LIMIT)
    {
        return STATUS_INSUFFICIENT_RESOURCES;
    }
    else
    {   
        _mm_lfence();   /* manually inserted by developer */
        x = table[user_value];
        node = entry[x];
    }

Data Speculation

Overview of Data Speculation

Intel processors implement performance features that allow instructions that depend on the behavior of older instructions to speculatively execute before these older instructions have executed:

  • Memory disambiguation predicts whether the address of a memory load overlaps with the yet-unknown address of a preceding memory store to allow speculative execution of the memory load. Misprediction of memory disambiguation can allow for Speculative Store Bypass attacks that transiently access and infer stale data in memory (as described in the Speculative Store Bypass section).
  • Fast store forwarding predictor allows a memory load to speculatively use the data of a preceding memory store before all store-to-load forwarding conditions are resolved; for example, before the load and store addresses have been confirmed to match.
  • The floating-point unit statically predicts that floating-point results will be normal so that it can speculatively execute dependent floating-point operations. A microcode assist is triggered to handle denormal/subnormal floating-point results. Floating Point Value Injection is a technique to infer information from the transiently computed floating-point result before the subnormal-result microcode assist is triggered and the transient result is cleaned up.

Speculative Store Bypass

Many Intel processors use memory disambiguation predictors that allow loads to be executed speculatively before it is known whether the load’s address overlaps with a preceding store’s address. This may happen if a store’s address is unknown when the load is ready to execute. If the processor predicts that the load address will not overlap with the unknown store address, the load may execute speculatively. However, if there is indeed an overlap, then the load may consume stale data. When this occurs, the processor will re-execute the load to ensure a correct result.

Through the memory disambiguation predictors, an attacker can cause certain instructions to be executed speculatively and then use the effects for side channel analysis. For example, consider the following scenario:

K is a secret asset (for example, a cryptographic key) inside the victim code. The attacker is allowed to know the value of M, but not the value of K. X is a variable in memory. Assuming an attacker can find the following code in a victim application:

  1. X = &K;             // Attacker manages to get variable with address of K stored into pointer X
    <at some later point>

  2. X = &M;     // Does a store of address of M to pointer X

  3. Y = Array[*X & 0xFFFF]; // Dereferences address of M which is in pointer X in order to
    // load from array at index specified by M[15:0]

When the above code runs, the load from address X that occurs as part of step 3 may execute speculatively and, due to memory disambiguation, initially receive a value of address of K instead of the address of M. When this value of address of K is dereferenced, the array is speculatively accessed with an index of K[15:0] instead of M[15:0]. The CPU will later re-execute the load from address X and use M[15:0] as the index into the array. However, the cache movement caused by the earlier speculative access to the array may be analyzed by the attacker to infer information about K[15:0].

As in the previous example, an attacker may be able to discover confused deputy code which may allow them to use speculative execution to reveal the value of memory that is not normally accessible to them. In a language-based security environment (for example, a managed runtime), where an attacker is able to influence the generation of code, an attacker may be able to create such a confused deputy. Intel has not currently observed this method being used in situations where the attacker must first discover an exploitable confused deputy in existing code.

Speculative Store Bypass Control Mechanisms

Intel has developed mitigation techniques for speculative store bypass. It can be mitigated by software modifications or, if those are not feasible, by the use of Speculative Store Bypass Disable (SSBD), which prevents a load from executing speculatively until the addresses of all older stores are known. Intel recommends using the below mitigations only for managed runtimes or other situations that use language-based security to guard against attacks within an address space.

Software-Based Mitigations

Speculative store bypass can be mitigated through numerous software-based approaches. This section describes two such software-based mitigations: process isolation and the selective use of LFENCE.

Process Isolation

One approach is to move all secrets into a separate address space from untrusted code. For example, creating separate processes for different websites so that secrets of one website are not mapped into the same address space as code from a different, possibly malicious, website. Similar techniques can be used for other runtime environments that rely on language-based security to run trusted and untrusted code within the same process. This may also be useful as part of a defense-in-depth strategy to prevent trusted code from being manipulated to create a side channel. Protection Keys can also be valuable in providing such isolation. Refer to the Protection Keys section for more information.

Using LFENCE to Control Speculative Load Execution

Software can insert an LFENCE between a store (for example, the store of address of M in step 2 of the Speculative Store Bypass section) and the subsequent load (for example, the load that dereferences X in step 3 of the Speculative Store Bypass section) to prevent the load from executing before the previous store’s address is known. The LFENCE can also be inserted between the load and any subsequent usage of the data returned which might create a side channel (for example, the access to Array in step 3 of the Speculative Store Bypass section). Software should not apply this mitigation broadly, but instead should only apply it where there is a realistic risk of an exploit; for example, if an attacker can control the old value in the memory location, there is a realistic chance of the load executing before the store address is known, and there is a disclosure gadget that reveals the contents of sensitive memory.
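Applied to the earlier scenario, a minimal sketch is shown below; X, M, and Array are the illustrative names from that scenario, and lfence() is the helper described in the conditional branch section:

X = &M;			/* step 2: store the address of M into pointer X */
lfence();		/* the dependent load below cannot begin until the
			   store above has completed locally */
Y = Array[(*X) & 0xFFFF];	/* step 3: cannot speculatively consume the stale &K */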

Other mitigations, such as inserting register dependencies between a vulnerable load address and the corresponding store address, may reduce the likelihood of speculative store bypass attacks being successful.

Speculative Store Bypass Disable (SSBD)

If the earlier software-based mitigations are not feasible, then employing Speculative Store Bypass Disable (SSBD) will mitigate speculative store bypass.

When SSBD is set, loads will not execute speculatively until the addresses of all older stores are known. This ensures that a load does not speculatively consume stale data values due to bypassing an older store on the same logical processor.

Basic Support

Software can disable speculative store bypass on a logical processor by setting IA32_SPEC_CTRL.SSBD to 1.
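For illustration, a minimal ring 0 sketch of setting this bit follows; rdmsr64() and wrmsr64() are hypothetical placeholders for the OS's own MSR accessors, and software should first confirm SSBD enumeration (CPUID.(EAX=7,ECX=0):EDX[31]) before writing the MSR:

#define MSR_IA32_SPEC_CTRL	0x48
#define SPEC_CTRL_SSBD		(1ULL << 2)	/* Speculative Store Bypass Disable */

/* Hypothetical kernel-mode helper; must run at ring 0 on the logical
 * processor that should be protected. */
static void set_ssbd(void)
{
	unsigned long long spec_ctrl = rdmsr64(MSR_IA32_SPEC_CTRL);
	wrmsr64(MSR_IA32_SPEC_CTRL, spec_ctrl | SPEC_CTRL_SSBD);
}

Because the control is per logical processor, the MSR should be written on each logical processor where the mitigation is desired.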

Both enclave and SMM code will behave as if SSBD is set regardless of the actual value of the MSR bit. The processor will ensure that a load within enclave or SMM code does not speculatively consume stale data values due to bypassing an older store on the same logical processor.

Software Usage Guidelines

Enabling SSBD can prevent exploits based on speculative store bypass. However, this may reduce performance. Intel provides the following recommendations for the use of such a mitigation.

  • Intel recommends software set SSBD for applications and/or execution runtimes relying on language-based security mechanisms. Examples include managed runtimes and just-in-time translators. If software is not relying on language-based security mechanisms, for example because it is using process isolation, then setting SSBD may not be needed.
  • Intel is currently not aware of any practical exploit for OSes or other applications that do not rely on language-based security. Intel encourages these users to consider their particular security needs in determining whether to set SSBD outside the context of language-based security mechanisms.

These recommendations may be updated in the future.

On Intel® Core™ and Intel® Xeon® processors that enable Intel® Hyper-Threading Technology and do not support enhanced IBRS, setting SSBD on a logical processor may impact the performance of a sibling logical processor on the same core. Intel recommends that the SSBD MSR bit be cleared when in an idle state on such processors.

Operating systems should provide an API through which a process can request it be protected by SSBD mitigation.
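On Linux, for example, a process can request this protection for itself through the speculation-control prctl interface. The sketch below assumes a kernel and headers recent enough to define the PR_SPEC_* constants:

#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
	/* Request that speculative store bypass be disabled for this task. */
	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
	          PR_SPEC_DISABLE, 0, 0) != 0) {
		perror("PR_SET_SPECULATION_CTRL");
		return 1;
	}
	/* ... run code that relies on language-based security ... */
	return 0;
}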

VMMs should allow a guest to determine whether to enable SSBD mitigation by providing direct guest access to IA32_SPEC_CTRL.

Data-Dependent Prefetchers

Besides control and data speculation, Intel processors implement prefetchers that prefetch cache lines from memory based on data values previously loaded or prefetched from memory; for example, data-dependent prefetchers (DDP). While such prefetchers do not create speculative execution paths, they may still allow an attacker to infer information about loaded data values via cache-based side channels.

Intel processors automatically enforce properties on these prefetchers to mitigate potential security concerns, and also expose a disable control, as described in the DDP documentation.

Additional Software Guidance

The following sections describe additional guidance for how software can effectively restrict speculation and protect against speculation-based attacks in a selection of use cases that have an increased risk of exploitation.

Operating Systems

Due to the Speculative Behavior of SWAPGS and Segment Registers, operating systems that use SWAPGS to change the GS segment register on kernel entry need additional mitigations. The recommended mitigation for when SWAPGS is speculatively skipped (that is, when speculative execution takes a path that does not contain the SWAPGS instruction) is to add an LFENCE or serializing instruction before the first memory reference that uses GS on all paths that can speculatively skip the SWAPGS instruction. The mitigation for when an extra SWAPGS instruction is speculatively executed when it should not be is to add an LFENCE or serializing instruction after the SWAPGS instruction.

System Management Mode (SMM)

On certain processors from the Skylake generation, System Management Interrupt (SMI) handlers can leave the RSB in a state that OS code does not expect. To avoid RSB underflow on return from SMI and ensure retpoline implementations in the OS and VMM work properly, on these processors, an SMI handler may implement RSB stuffing before returning from System Management Mode (SMM).

Related Intel Security Features and Technologies

There are security features and technologies, either present in existing Intel products or planned for future products, which reduce the effectiveness of the attacks mentioned in the previous sections.

Intel® OS Guard

When Intel® OS Guard, also known as Supervisor-Mode Execution Prevention (SMEP), is enabled, the operating system will not be allowed to directly execute application code, even speculatively. This makes branch target injection attacks on the OS substantially more difficult by forcing the attacker to find gadgets within the OS code. It is also more difficult for an application to train OS code to jump to an OS gadget. All major operating systems enable SMEP support by default.

Execute Disable Bit

The Execute Disable Bit is a hardware-based security feature that can help reduce system exposure to viruses and malicious code. Execute Disable Bit allows the processor to classify areas in memory where application code can or cannot execute, even speculatively. This reduces the gadget space, increasing the difficulty of branch target injection attacks. All major operating systems enable Execute Disable Bit support by default. Applications are encouraged to mark only code pages as executable.

Intel® Control-Flow Enforcement Technology (Intel® CET)

Intel Control-Flow Enforcement Technology (Intel® CET) is a feature on recent Intel products to protect control-flow integrity against Return-Oriented Programming (ROP) / Call-Oriented Programming (COP) / Jump-Oriented Programming (JOP) style attacks. It provides two main capabilities:

  • Shadow stack: A shadow stack is a second, independent stack used exclusively for control transfer operations. When shadow stacks are enabled, RET instructions require that the return address on the data stack match the return address on the shadow stack, which can be used to mitigate ROP attacks.
  • Indirect branch tracking (IBT): When IBT is enabled, the processor requires that the instruction at the target of indirect JMP or CALL instructions is an ENDBRANCH. Software must be compiled to place the ENDBRANCH instruction at valid targets.

Intel CET also applies restrictions to transient execution to constrain speculative control flow. These restrictions may be relevant for both control-flow speculation and attacker-controlled jump redirection. More details can be found in the “Control-flow Enforcement Technology (CET)” chapter of the Intel® 64 and IA-32 Architectures Software Developer’s Manual.

Intel CET Shadow Stack Speculation Limitations

When CET Shadow Stack is enabled, the processor will not execute instructions, even speculatively, at the loaded target of the return address of a RET instruction if that target differs from the predicted target (for example, that predicted by the Return Stack Buffer), and:

  • The RET address values on the data stack and shadow stack do not match; or
  • Those address values may be transient (for example, the values may have been modified by an older speculative store).

Intel CET Indirect Branch Tracking (CET IBT) Speculation Limitations

When CET IBT is enabled and an indirect JMP or CALL sets the IBT tracker state to WAIT_FOR_ENDBRANCH, instruction execution at the branch target will be limited or blocked, even speculatively, if the next instruction is not an ENDBRANCH. The Tiger Lake implementation of Intel CET limits speculative execution to a small number of instructions (fewer than 8, with no more than 5 loads) after a missing ENDBRANCH. On Alder Lake, Sapphire Rapids, Raptor Lake, and some future processors, the potential speculation window at a target that does not start with ENDBRANCH is limited to two instructions (and typically fewer) with no more than 1 load.

The intended long-term direction, and behavior on some current implementations (including E-core only products like Alder Lake-N and Arizona Beach), is to completely block the speculative execution of instructions after a missing ENDBRANCH.

Protection Keys

On Intel processors that have both hardware support for mitigating Rogue Data Cache Load (IA32_ARCH_CAPABILITIES[RDCL_NO]) and protection keys support (CPUID.7.0.ECX[3]), protection keys can limit the data accessible to a piece of software. This can be used to limit the memory addresses that could be revealed by a branch target injection or bounds check bypass attack.
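As an illustration of how software might apply this on Linux (glibc 2.27 or later), the following sketch tags a page holding a secret with a protection key that is kept access-disabled except around legitimate use; secret_page and the 4096-byte page size are illustrative assumptions:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	void *secret_page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
	                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (secret_page == MAP_FAILED) { perror("mmap"); return 1; }

	int pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);	/* access disabled by default */
	if (pkey < 0) { perror("pkey_alloc"); return 1; }

	/* Tag the secret's page with the key; accesses now fault unless this
	 * thread's PKRU register grants access for the key. */
	if (pkey_mprotect(secret_page, 4096, PROT_READ | PROT_WRITE, pkey) != 0) {
		perror("pkey_mprotect");
		return 1;
	}

	pkey_set(pkey, 0);			/* enable access around legitimate use */
	/* ... read or write the secret here ... */
	pkey_set(pkey, PKEY_DISABLE_ACCESS);	/* revoke access again */
	return 0;
}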

Supervisor-Mode Access Prevention (SMAP)

SMAP can be used to limit which memory addresses can be used for a cache-based side channel by preventing supervisor-mode accesses that violate SMAP from allocating application cache lines. This may make it more difficult for an application to perform the attack on the kernel, as it is more challenging for an application to determine whether a kernel line is cached than an application line. On Intel processors that have both hardware support for mitigating Rogue Data Cache Load (IA32_ARCH_CAPABILITIES[RDCL_NO]) and SMAP support, loads that cause a page fault due to SMAP will not speculatively return the loaded data, even on an L1D cache hit, nor fill or evict any caches for that address. On processors that have SMAP support but do not enumerate RDCL_NO, loads that cause a page fault due to SMAP may speculatively return the loaded data on L1D cache hits but will not fill or evict any caches for that address.

CPUID Enumeration and Architectural MSRs

The CPUID Enumeration and Architectural MSRs document describes processor support for mitigation mechanisms as enumerated using the CPUID instruction and several architectural MSRs.

 


Footnotes

  1. The specific instructions are described in the Overview of Indirect Branch Predictors section. Note that the target address of direct branch instructions is also predicted, but Intel processors do not allow speculative execution at incorrect target addresses that are due to direct branches.
  2. This is an example of attacker-controlled jump redirection.
  3. A transition to a more privileged predictor mode through an INIT# is an exception to this and may not be sufficient to prevent the predicted targets of indirect branches executed in the new predictor mode from being controlled by software operating in a less privileged predictor mode.
  4. An RSB overwrite sequence is a sequence of instructions that includes 32 more near CALL instructions with non-zero displacements than it has near RETs. 
  5. Note that indirect branches include near call indirect, near jump indirect and near return instructions; as documented by the speculative execution side channel mitigations guidance. Because it includes near returns, it follows that RSB entries created before an IBPB command cannot control the predicted targets of returns executed after the command on the same logical processor.