Intel Research on Disclosure Gadgets at Indirect Branch Targets in the Linux* Kernel

Published: 11/03/2022

Last Updated: 11/10/2022

Blog by Scott Constable with contributions from the IPAS security team

About the author: Scott is a security researcher in Intel Labs. He received his PhD in computer science from Syracuse University in 2018. Scott’s current research covers instruction set architecture security and transient execution attack mitigation. He recently developed an optimized software mitigation technique for Load Value Injection (LVI); this approach has been adopted by LLVM/clang.

In March 2022, researchers at VU Amsterdam disclosed Branch History Injection (BHI) and Intra-Mode Branch Target Injection (IMBTI). Both are transient execution attacks that, as their names suggest, exploit the behavior of the hardware indirect branch predictor. The researchers demonstrated both attacks against a commodity Linux* kernel, though the attacks do require an adversary to be able to execute code locally on a victim machine. Intel has already provided guidance to mitigate BHI and IMBTI. One of our recommendations is to apply spot mitigations to any specific disclosure gadgets in privileged software (such as a kernel) that are found to be exploitable. We didn’t want to wait for someone else to trawl for exploitable gadgets, so we immediately set out to do our own analysis on the Linux kernel.

We did find some gadgets that initially looked promising—scroll down to see some examples. But we were not able to put together a working exploit with any of these gadgets. This is because BHI and IMBTI are complex attacks that require more than just a disclosure gadget, and in our experience it wasn’t feasible to combine all of the essential ingredients.

Intel defines a disclosure gadget as an instruction sequence that can execute transiently¹ and both:

1. Access a victim’s secret
2. Transmit that secret over a covert channel

BHI and IMBTI can occur when an adversary is able to cause an indirect branch to predict to an indirect branch target that was previously reached within the same predictor mode. The branch target must contain a disclosure gadget that can execute before the processor detects the branch misprediction and squashes the pipeline². The branch target may have been reached either architecturally or transiently. However, this analysis assumes that the branch targets must be reached architecturally³.

Disclosure gadgets can be constructed using managed runtimes, including those that allow unprivileged users to generate and execute code in the kernel. This is precisely how the VU Amsterdam researchers demonstrated their BHI attack against the Linux kernel. Until recently, some Linux distributions would enable a feature called unprivileged eBPF by default, which allows unprivileged user programs to generate and execute code in a sandboxed environment within the Linux kernel. In their paper, the researchers describe how they were able to cause unprivileged eBPF’s just-in-time (JIT) compiler to generate disclosure gadgets that can be used to mount a BHI or IMBTI attack.

The VU Amsterdam researchers also conjectured that “potential disclosure gadgets” [1] (gadgets that are “not conclusively exploitable”) may also exist incidentally and unintentionally within the Linux kernel itself. They built a tool on the angr framework to find these potential disclosure gadgets and claimed to have identified 1,177 disclosure gadget candidates within the Linux kernel. At the time of this writing the list of candidates has not been published, and the researchers have not indicated that they have used any of the candidates to achieve a demonstrable exploit.

The analysis in this blog focuses on disclosure gadgets that use a cache covert channel to transmit data. In x86-64 assembly, such a disclosure gadget might look like this:

example_masked_disclosure_gadget:
movzx eax, BYTE PTR [rdi]             # Load byte and zero-extend it (masks upper bits)
shl   rax, 6                          # Shift by cache-line granularity
mov   ebx, DWORD PTR [rax-0xdeadbeef] # Transmit over covert cache channel


In this example, an adversary that controls the entire value in rdi can transiently load a byte, shift the byte by the CPU’s cache-line granularity (2^6 = 64 bytes), and then use the shifted byte as an index to access memory relative to an adversary-known location, such as -0xdeadbeef. This last operation, commonly referred to as a transmitter, encodes the loaded byte into the CPU’s caches by triggering a cache fill that can later be detected by a cache analysis technique such as Prime+Probe⁴. If the loaded byte is not zero/sign extended by movzx/movsx, or the upper bytes are not masked by other operations (such as a logical AND), then the address formed by the transmitter may be non-canonical and the transmitter will not transmit the loaded byte. When a disclosure gadget does mask the upper register bits, we refer to the gadget as masked; all other disclosure gadgets are unmasked.

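example_unmasked_disclosure_gadget:
mov   al, BYTE PTR [rdi]              # Load byte without zero/sign extension
shl   rax, 6                          # Shift by cache-line granularity
mov   ebx, DWORD PTR [rax-0xdeadbeef] # Transmission only succeeds if
                                      # the resulting address is canonical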
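
The difference between the masked and unmasked cases can be modeled with plain integer arithmetic. The sketch below is illustrative only: it assumes 48-bit canonical virtual addresses (4-level paging), and the stale upper bits chosen for rax in the unmasked case are a made-up value, not something taken from a real kernel.

```python
CACHE_LINE_SHIFT = 6           # 2**6 = 64-byte cache lines
BASE = -0xdeadbeef             # adversary-known base from the example gadgets
MASK64 = (1 << 64) - 1


def is_canonical(addr: int) -> bool:
    """True if bits 63:47 are all equal (assumes 48-bit virtual addresses)."""
    top = (addr & MASK64) >> 47
    return top == 0 or top == (1 << 17) - 1


def transmit_address(rax: int) -> int:
    """Address formed by `shl rax, 6` followed by an access to [rax-0xdeadbeef]."""
    return (((rax << CACHE_LINE_SHIFT) & MASK64) + BASE) & MASK64


secret = 0x41

# Masked gadget: movzx leaves only the loaded byte in rax.
masked = transmit_address(secret)

# Unmasked gadget: writing al keeps whatever was in the upper bits of rax.
# The stale value below is hypothetical.
stale_rax = 0x0000_1234_5678_9A00 | secret
unmasked = transmit_address(stale_rax)

print(hex(masked), is_canonical(masked))
print(hex(unmasked), is_canonical(unmasked))
```

With the byte isolated, every possible secret value selects its own cache line relative to the base, which is what makes the masked form usable as a transmitter.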


Linear Disclosure Gadgets: The Focus of This Analysis

When a masked gadget is not interposed by any conditional branch or procedure call, we refer to the disclosure gadget as linear. Of all the potential disclosure gadgets that may exist unintentionally within a given codebase, linear disclosure gadgets are the ones which are most like disclosure gadgets that could be intentionally constructed by an adversary.

Intel’s Analysis Approach

Concurrent with the VU Amsterdam research and prior to the March 2022 disclosure of BHI and IMBTI, we developed an LLVM-based⁵ static analysis tool that can scan indirect branch targets to identify potential disclosure gadgets that consist of an adversary-controlled load and a dependent cache covert channel transmitter with an adversary-controlled/knowable base address. Intel has open-sourced the analysis tool. Our objective was to investigate whether exploitable gadgets may exist in the Linux kernel, independent from code generated by unprivileged eBPF.

Our analysis tool uses LLVM’s Register Dataflow Framework to reconstruct a static single assignment form for each function that will be emitted into the final binary. At each indirect branch target, the tool identifies the set of live registers and then traverses the def-use chain beginning at each live register. The tool records dataflows consisting of:

<ipv4_conntrack_local>:
movzx  ecx,WORD PTR [rsi+0xb4]      # Load Secret
movzx  eax,WORD PTR [rcx+rax*1+0x6] # Transmit Secret
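
The idea of following a dataflow from a live register to a dependent load can be illustrated with a toy forward scan over a straight-line instruction list. This is only a sketch and not the LLVM Register Dataflow Framework pass itself: the instruction encoding, the function name `find_transmitters`, and the choice of which registers are adversary-controlled are all assumptions made for illustration.

```python
def find_transmitters(controlled_live_in, instructions):
    """Return indices of loads whose address depends on a transiently loaded secret.

    instructions: list of (kind, dest, sources) where kind is "load" or "alu"
    and sources are the registers used to form the address or the result.
    """
    controlled = set(controlled_live_in)  # registers the adversary is assumed to control
    secret = set()                        # registers derived from an adversary-addressed load
    transmitters = []
    for index, (kind, dest, sources) in enumerate(instructions):
        uses_secret = any(reg in secret for reg in sources)
        uses_controlled = any(reg in controlled for reg in sources)
        if kind == "load" and uses_secret:
            transmitters.append(index)    # secret-dependent address: potential transmitter
        # The destination is redefined, so drop whatever it held before.
        controlled.discard(dest)
        secret.discard(dest)
        if kind == "load" and uses_controlled:
            secret.add(dest)              # adversary picks the address, so dest may hold a secret
        elif kind == "alu" and uses_secret:
            secret.add(dest)              # arithmetic on a secret stays secret
        elif kind == "alu" and uses_controlled:
            controlled.add(dest)          # arithmetic on controlled values stays controlled
    return transmitters


# The two-instruction example above, with 64-bit register names.
ipv4_conntrack_local = [
    ("load", "rcx", ["rsi"]),         # movzx ecx, WORD PTR [rsi+0xb4]
    ("load", "rax", ["rcx", "rax"]),  # movzx eax, WORD PTR [rcx+rax*1+0x6]
]

found = find_transmitters({"rsi", "rax"}, ipv4_conntrack_local)
print(found)
```

If no live register is adversary-controlled, the same scan reports nothing, which mirrors why the analysis starts from the set of live registers at each branch target.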


Analysis Results

We applied our analysis to the Linux kernel (v5.16, defconfig) built with link-time optimization (LTO) and LLVM’s default inlining threshold. We also analyzed several kernel modules. We scanned all indirect call targets, indirect jump targets, and return targets. The table below summarizes our results.

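Table 1: Disclosure Gadget Candidate Analysis (values in parentheses are the number of branch targets with at least one such gadget)

| Branch Target Type | Targets Scanned | Gadgets (Targets) | Masked Gadgets (Targets) | Linear Gadgets (Targets) |
| --- | --- | --- | --- | --- |
| Indirect Call | 20,615 | 27,762 (9,334) | 284 (145) | 8 (8) |
| Indirect Jump | 5,030 | 3,813 (1,767) | 41 (134) | 13 (13) |
| Return | 262,401 | 63,888 (81,298) | 928 (2,313) | 55 (49) |
| Total | 288,046 | 95,463 (92,399) | 1,253 (2,592) | 76 (70) |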

First, let’s discuss the indirect call and jump targets. In total, the scanning tool identified 25,645 indirect call/jump targets within the Linux kernel and its modules. Not all targets are reachable via system calls or other user-facing utilities. Of these 25,645 targets, 11,101 have at least one potential disclosure gadget, and there are a total of 31,575 potential disclosure gadgets. Among these, the tool found 325 potential masked disclosure gadgets reachable from 179 indirect jump/call targets. Many of the masked gadgets pass through one or more conditional branches, and/or pass across one or more procedure calls. Either of these conditions implies that the gadget is non-linear. There are 21 potential linear gadgets at 21 indirect call/jump targets. Remember, the linear gadgets do not pass through any conditional branches or across any procedure calls. The 13 linear gadgets at indirect jump targets are all reachable from a single jump table within the ___bpf_prog_run() eBPF function. Since unprivileged eBPF is disabled by default in the Linux kernel, as long as unprivileged eBPF remains disabled, it seems unlikely that malicious user-mode software could steer the Linux kernel to reach these targets.
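
The combined indirect call/jump figures quoted above follow directly from the first two rows of Table 1. A short arithmetic check, using only numbers from the table (the field names are our own labels):

```python
# Figures copied from the indirect call and indirect jump rows of Table 1.
rows = {
    "indirect call": dict(targets=20_615, gadgets=27_762, gadget_targets=9_334,
                          masked=284, linear=8, linear_targets=8),
    "indirect jump": dict(targets=5_030, gadgets=3_813, gadget_targets=1_767,
                          masked=41, linear=13, linear_targets=13),
}


def total(field: str) -> int:
    """Sum one column over the indirect call and indirect jump rows."""
    return sum(row[field] for row in rows.values())


for field in ("targets", "gadget_targets", "gadgets", "masked", "linear", "linear_targets"):
    print(f"{field}: {total(field):,}")
```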

Now, let's consider the return targets, which in some situations can also be relevant for these styles of attacks. Predictions for RET instructions are typically made using the return stack buffer (RSB). However, when the RSB underflows, some processors may use the indirect branch predictor to predict the return target, as described in Retpoline: A Branch Target Injection Mitigation. If that prediction is incorrect, then the correct return target may be “remembered” by the indirect branch predictor and could later be used to predict the target of another indirect branch. Our tool found a total of 63,888 potential disclosure gadgets at 81,298 return targets. Among these gadgets, 55 are linear. At the time of this writing, it isn’t clear that any of these return target gadgets could be exploited in a BHI or IMBTI attack. Whereas indirect call/jump targets enter the indirect branch predictor when an indirect call/jump to the target is executed, return targets only enter the indirect branch predictor when the RSB underflows. Hence, in addition to identifying a suitable disclosure gadget at a return target, the adversary must also be able to steer the kernel to that return target in a call stack deep enough to underflow the RSB.
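
The "deep enough call stack" requirement can be pictured with a toy model of the RSB. This is a conceptual sketch only, not a description of any particular processor: the 16-entry depth is an assumption, and the model simply treats every underflowing return as one whose target gets remembered by the indirect branch predictor.

```python
class ToyReturnPredictor:
    """Toy model: a bounded RSB with a fallback to an indirect branch predictor."""

    def __init__(self, rsb_depth: int = 16):   # depth is an assumption
        self.rsb = []
        self.rsb_depth = rsb_depth
        self.remembered_targets = set()        # stands in for the indirect branch predictor

    def call(self, return_address: int) -> None:
        self.rsb.append(return_address)
        if len(self.rsb) > self.rsb_depth:
            self.rsb.pop(0)                    # the oldest entry is overwritten

    def ret(self, actual_return_address: int) -> str:
        if self.rsb:
            self.rsb.pop()
            return "rsb"
        # Underflow: the return target may now enter the indirect branch predictor.
        self.remembered_targets.add(actual_return_address)
        return "indirect-predictor"


predictor = ToyReturnPredictor()
call_depth = 20                                # deeper than the assumed RSB depth
for frame in range(call_depth):
    predictor.call(0x1000 + frame)
sources = [predictor.ret(0x1000 + frame) for frame in reversed(range(call_depth))]

print(sources.count("rsb"), sources.count("indirect-predictor"))
print(sorted(hex(target) for target in predictor.remembered_targets))
```

In this model only the outermost return sites of a sufficiently deep call chain are ever remembered, which is why a return-target gadget must sit at such a site to be relevant.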

What follows next is a deep dive into some of the potential gadgets that our tool found. We describe all 8 of the linear gadgets that we found at indirect call targets. For brevity, we describe 1 of the 55 linear gadgets that we found at return sites. For the reasons discussed above, we don’t think that any of the indirect jump target linear gadgets are exploitable, so we don’t describe those here.

It’s worth noting first that none of the indirect call targets with a linear gadget are reachable for a malicious user-mode adversary with a default Linux kernel configuration. We haven’t applied a similar analysis to the potential gadgets at return targets, though it’s worth re-emphasizing that they must also be reached in a deep call stack.

And as a note for readers who aren’t as familiar with assembly code, we have written all of the code snippets using Intel assembly syntax, where the destination operand precedes the source operand. For example, mov rax, rbx means “move from rbx to rax.”

Suppose that the adversary controls the value in rdi at the indirect branch that mispredicts to hctx_type_show. Then the adversary can load a word at a chosen memory address and transmit bits 3:15 of the word over a cache covert channel with a fixed base address at -0x7dbd7bb0:

<hctx_type_show>:
movzx  eax, WORD PTR [rdi+0xfc]           # Load Secret
mov    rdx, QWORD PTR [rax*8-0x7dbd7bb0]  # Transmit Secret


The corresponding source code is shown below, where the shape of the disclosure gadget is visually apparent. The operation hctx_types[hctx->type] loads the unsigned 16-bit word hctx->type from memory, and then uses this value to index into the pointer array hctx_types:

static const char *const hctx_types[] = {
[HCTX_TYPE_DEFAULT]	= "default",
[HCTX_TYPE_POLL]	= "poll",
};

static int hctx_type_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;

BUILD_BUG_ON(ARRAY_SIZE(hctx_types) != HCTX_MAX_TYPES);
seq_printf(m, "%s\n", hctx_types[hctx->type]);
return 0;
}


This function is part of Linux’s debugfs. A user typically requires root/sudo privilege to invoke hctx_type_show. We confirmed that this function can be reached as a root/sudo user by invoking the following command:

$ cat /sys/kernel/debug/block/sda/hctx0/type
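
Which bits of the loaded word are visible through the cache depends on how the word is scaled before it is added to the base address. The helper below is a simplified sketch: the function name is ours, and it assumes a cache-line-aligned base, so it ignores carries from a misaligned base into the cache-line index.

```python
CACHE_LINE_BITS = 6   # 64-byte cache lines


def observable_secret_bits(secret_width: int, scale_shift: int) -> range:
    """Secret bits that reach address bit 6 or above after scaling by 2**scale_shift."""
    lowest = max(0, CACHE_LINE_BITS - scale_shift)
    return range(lowest, secret_width)


# hctx_type_show indexes with rax*8, i.e. a shift of 3 applied to a 16-bit word.
bits = observable_secret_bits(16, 3)
print(f"bits {bits[0]}:{bits[-1]}")
```

The same reasoning gives bits 6:15 for a 16-bit word used as an unscaled index, and all 16 bits once the scale reaches a full cache line.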

Assume that the adversary controls the value in rdx at the indirect branch site. Then this linear gadget can potentially transmit 3 bits out of each adversary-chosen byte. However, two of these bits are transmitted at sub-cache-line granularity, and therefore cannot be inferred by an adversary with a cache analysis technique such as Prime+Probe⁴. Another issue with this gadget is that it can only be reached when Linux’s mq-deadline I/O scheduler is enabled.

<dd_merged_requests>:
push   r15
push   r14
push   r12
push   rbx
mov    rbx, rdx
mov    r14, rdi
mov    rax, QWORD PTR [rdi+0x8]
mov    r15, QWORD PTR [rax+0x8]
movzx  eax, WORD PTR [rdx+0x7e]        # Load Secret
shr    rax, 0xb                        # Loses bits 0:10 of Secret
and    eax, 0x1c                       # Loses bits 11:12 of Secret
mov    eax, DWORD PTR [rax-0x7dc20e50] # Transmit bits 13:14,
                                       # but within a cache line;
                                       # bit 15 determines whether the
                                       # cache line at -0x7dc20e80 or
                                       # at -0x7dc20e40 is accessed
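
The effect of the shift and mask can be checked by enumerating every possible 16-bit secret and recording which cache line the final load touches. This is a minimal sketch of the arithmetic in the listing above, assuming 64-byte cache lines; the helper names are ours.

```python
BASE = -0x7dc20e50
MASK64 = (1 << 64) - 1


def transmit_address(secret_word: int) -> int:
    """Model `shr rax, 0xb`, `and eax, 0x1c`, then the access to [rax-0x7dc20e50]."""
    rax = (secret_word >> 0xb) & 0x1c
    return (BASE + rax) & MASK64


def cache_line(addr: int) -> int:
    return addr >> 6          # assumes 64-byte cache lines


lines_to_bit15 = {}
for secret in range(1 << 16):
    line = cache_line(transmit_address(secret))
    lines_to_bit15.setdefault(line, set()).add(secret >> 15)

distinct_addresses = {transmit_address(secret) for secret in range(1 << 16)}
print(len(distinct_addresses), "distinct addresses in", len(lines_to_bit15), "cache lines")
```

Each cache line corresponds to exactly one value of bit 15, so a cache-line-granular observer learns at most that single bit.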


Assume that the adversary controls the value in rdi at the indirect branch site and the kernel memory to which rdi points. Specifically, [rdi-0x70] should contain a pointer P to adversary-controlled kernel memory, wherein [P+0x20] contains another pointer Q to 26 bytes below the adversary-chosen secret, and [P+0x1c]=0 and [P] contains the base address for the transmitter. Hence, the adversary can load a chosen word and transmit bits 6:15 of the word over a cache covert channel relative to an adversary-chosen base address.

This gadget can only be reached if the e1000 driver is in use (that is, when the machine is using an e1000 device).

<e1000_clean>:
push   rbp
mov    rbp, rsp
push   r15
push   r14
push   r13
push   r12
push   rbx
and    rsp, 0xfffffffffffffff0
sub    rsp, 0x100
mov    rax, QWORD PTR gs:0x28
mov    QWORD PTR [rsp+0xf0], rax
mov    DWORD PTR [rsp+0x4c], 0x0
mov    r13, QWORD PTR [rdi-0x70]      # Load P into r13
mov    r12, QWORD PTR [rdi+0x1c8]
mov    eax, DWORD PTR [r13+0x1c]      # Load 0 into eax
mov    rcx, QWORD PTR [r13+0x20]      # Load Q into rcx
lea    rdx, [rax+rax*4]               # rdx=0
movzx  edx, WORD PTR [rcx+rdx*8+0x1a] # Load Secret into edx
mov    rbx, rdx                       # Copy Secret into rbx
shl    rbx, 0x4                       # Shift Secret by 4 bits
lea    r8, [rcx+rbx*1]
xor    r14d, r14d
test   BYTE PTR [rcx+rbx*1+0xc], 0x1  # Transmit Secret


In this linear gadget, the base address and secret are loaded from two distinct adversary-controlled registers that alias to the same memory address. Therefore, the adversary may not be able to control the transmitter.

Similar to the prior gadget, this gadget is tied to a specific network driver: it can only be reached when the e1000e driver is loaded and being used to drive newer “packet-split”-enabled e1000e devices.

<e1000_clean_rx_irq_ps>:
push   rbp
push   r15
push   r14
push   r13
push   r12
push   rbx
sub    rsp, 0x70
mov    rbp, rdi                 # rbp and rdi point to the same address
mov    rax, QWORD PTR [rdi]
mov    QWORD PTR [rsp+0x10], rax
movzx  r13d, WORD PTR [rbp+0x22]      # Load Secret
mov    rcx, r13
shl    rcx, 0x5
mov    ebx, DWORD PTR [rdi+rcx*1+0x8] # Transmitter is neither
                                      # controllable nor knowable

Assume that the adversary controls the value in rdi at the indirect branch site. The adversary must be able to form an address that will serve as both the address from which to load the secret, and as the base address for the transmitter. Another issue with this gadget is that it can only be reached when Linux’s kyber I/O scheduler is enabled.

<kyber_bio_merge>:
push   rbp
push   r15
push   r14
push   r12
push   rbx
mov    r14, rsi
mov    eax, DWORD PTR gs:[rip+0x7eb7b15e]        # 15560 <cpu_number>
mov    eax, eax
mov    rcx, QWORD PTR [rax*8-0x7db05830] # Adversary must be able to
                                         # deduce rcx
add    rcx, QWORD PTR [rdi+0x38]         # rcx will become the base address
                                         # and the address of the secret,
                                         # so the Adversary should choose
                                         # X=[rdi+0x38] such that the desired
                                         # address is equal to X plus the
                                         # previous value of rcx
mov    eax, DWORD PTR [rsi+0x10]
xor    esi, esi
test   al, al
sete   sil
test   eax, 0x1000000
mov    edi, 0x2
cmove  rdi, rsi
mov    r15d, edx
mov    rbp, QWORD PTR [rcx+rdi*8+0x50] # Load (&Secret - 0xfc)
mov    rdx, QWORD PTR [rbp+0xb0]
mov    rbx, QWORD PTR [rdx+0x50]
movzx  edx, WORD PTR [rbp+0xfc]        # Load Secret
movzx  ecx, WORD PTR [rcx+rdx*2+0x44]  # Transmit Secret


Assume that the adversary controls the value in rdi at the indirect branch site. Then the adversary can load a word at a chosen memory address and transmit a value (using rdi as a base address) computed using the loaded word. This gadget is in the 802.11 stack and may be reachable through I/O or interrupts if the system has a corresponding 802.11 device.

The value transmitted is: (Secret >> 4) * 0xfc + (Secret & 0xf) * 0x18.

<minstrel_ht_get_expected_throughput.llvm.4422275375963575374>:
movzx  ecx, WORD PTR [rdi+0x14]        # Load Secret
mov    rsi, rcx
shr    rsi, 0x4
mov    r9d, ecx
and    r9d, 0xf
imul   rax, rsi, 0xfc
lea    rdx, [r9+r9*2]
movzx  r8d, WORD PTR [rax+rdx*8+0x104] # Transmit a value computed
                                       # using Secret
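
The expression for the transmitted value can be checked by replaying the listing step by step. The sketch below models only the register arithmetic shown above; the function name is ours, and the returned offset includes the 0x104 displacement from the final instruction.

```python
def transmitted_offset(secret: int) -> int:
    """Replay the arithmetic of the listing for a 16-bit Secret."""
    rsi = secret >> 4            # shr  rsi, 0x4
    r9 = secret & 0xf            # and  r9d, 0xf
    rax = rsi * 0xfc             # imul rax, rsi, 0xfc
    rdx = r9 + r9 * 2            # lea  rdx, [r9+r9*2]
    return rax + rdx * 8 + 0x104 # address expression [rax+rdx*8+0x104]


offsets = {transmitted_offset(secret) for secret in range(1 << 16)}
lines = {offset >> 6 for offset in offsets}
print(len(offsets), "distinct offsets,", len(lines), "distinct 64-byte lines")
```

Comparing the two counts shows how much of the word is lost when the observer can only resolve 64-byte cache lines.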


Assume that the adversary controls the value in rdi at the indirect branch site, and rdi points to adversary-controlled memory. Then the adversary can load a word at a chosen memory address and transmit bits 1:15 of the word over a cache covert channel with a fixed base address. However, this function is only executed a single time, either at boot or on PCI hot-plug events, which limits how often this gadget can be used.


Assume that the adversary controls the value in rdi at the indirect branch site, and the adversary either controls or has a way to know the value of rsi and the value of the memory to which rsi points. Then the adversary can load a word at a chosen memory address and transmit the word over a cache covert channel with an adversary-controlled/knowable base address.

This gadget can only be reached if the system has a Tigon3 network interface controller (NIC).

<tg3_start_xmit>:
push   rbp
push   r15
push   r14
push   r13
push   r12
push   rbx
sub    rsp, 0xc0
mov    r14, rsi
mov    rax, QWORD PTR gs:0x28
mov    QWORD PTR [rsp+0xb8], rax
movzx  eax, WORD PTR [rdi+0x7c]         # Load Secret
mov    r9, QWORD PTR [rsi+0x380]
lea    r10, [rax+rax*4]
shl    r10, 0x6
imul   r13, rax, 0x2c0                  # Multiply Secret by 704
lea    rbp, [rsi+r13*1]                 # rbp <- rsi + (Secret * 704)
mov    rax, QWORD PTR [rsi+0x1b10]
shr    rax, 0x3d
and    eax, 0x1
imul   r12, rax, 0x2c0
mov    ebx, DWORD PTR [r12+rbp*1+0xc40] # Transmit (Secret * 704)
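
The multiplier 0x2c0 (704) is a whole number of cache lines, which is why the entire word can be recovered here. A small sketch of the address arithmetic, measured relative to rsi and assuming 64-byte cache lines; the `flag` parameter stands for the single bit extracted from [rsi+0x1b10]:

```python
CACHE_LINE = 64
STRIDE = 0x2c0                      # 704 bytes, from `imul r13, rax, 0x2c0`


def line_offset(secret: int, flag: int = 0) -> int:
    """Cache-line index, relative to rsi, touched by [r12+rbp*1+0xc40]."""
    r12 = flag * STRIDE             # imul r12, rax, 0x2c0 with rax in {0, 1}
    rbp_minus_rsi = secret * STRIDE # lea  rbp, [rsi+r13*1]
    return (r12 + rbp_minus_rsi + 0xc40) // CACHE_LINE


lines = {line_offset(secret) for secret in range(1 << 16)}
print(STRIDE // CACHE_LINE, "cache lines per step,", len(lines), "distinct lines")
```

Because consecutive secrets are a full 11 cache lines apart, no two values of the word share a cache line for a fixed rsi and flag.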


If the user-mode adversary can steer the kernel to return from __memcpy() to the highlighted code while causing the RSB to underflow, then the address of the movzx instruction may be “remembered” by the indirect branch predictor and could be used to predict the target of a later indirect branch. If the adversary can find another indirect branch within the kernel that satisfies the remaining requirements for BHI or IMBTI (see Closing Thoughts below), and the adversary controls both the register contents of rbx and the contents of the kernel stack pointed to by rsp, then the adversary can potentially use this gadget to load and transmit bits 6:15 of the word stored at rbx+0x1d2.

<nfs4_submount>:
call   ffffffff81d4b9b0 <__memcpy>
movzx  eax, WORD PTR [rbx+0x1d2]      # Return target: Load Secret
mov    rcx, QWORD PTR [rsp]           # Load base address from the stack
mov    BYTE PTR [rax+rcx*1+0x1], 0x0  # Transmit Secret


Closing Thoughts

We began with no notion of what we might find. Would there be thousands of disclosure gadgets? None? A handful? And are they exploitable? Even with all of the analysis above, the answer is complicated because the disclosure gadget is only one of several components required to launch a successful BHI or IMBTI attack. At a minimum, to execute a BHI or IMBTI attack the adversary must be able to satisfy all of the following conditions:

• Execute code locally on the victim machine.
• Find a suitable disclosure gadget that is reachable within the current running kernel and given the privileges delegated to the adversary’s process (for example, utilities like seccomp can restrict which system calls can be invoked by a process).
• The disclosure gadget must be in code configured to be a part of the kernel. This results in code that is either built into the kernel image or present in a loadable module. In general, loadable modules are not loaded for hardware which is not present in the system. Most of the discussed disclosure gadgets are in code which is typically built as a loadable module. For example, a system not using NFS4 would be unlikely to even have the nfs4_submount gadget in memory.
• Find the location within the kernel where a secret is located. This may require the adversary to first break kernel address space layout randomization (KASLR) and/or perform other surveillance on the victim platform.
• Invoke a system call, trigger an interrupt or exception, etc. to steer the kernel to reach the disclosure gadget.
• Invoke another system call to trigger an indirect branch misprediction to the disclosure gadget with adversary-controlled register contents (or other processor context) that align with the disclosure gadget’s inputs. Steering the misprediction to the desired disclosure gadget may require the adversary to create aliasing in either the branch history buffer (for a BHI attack) or the branch target arrays (for an IMBTI attack). More details can be found in Intel’s BHI and IMBTI security guidance.
• Slow or stall the CPU pipeline at the right time to create a speculation window large enough for the entire disclosure gadget to execute transiently.
• Use a side-channel analysis technique such as Prime+Probe to recover the data transmitted by the disclosure gadget, while accounting for factors such as noise created by other workloads running on the system.

This is a lot to put together. We tried to build a PoC to exploit some of the reachable gadgets that we found, but we were unsuccessful. This doesn’t mean that BHI/IMBTI using a disclosure gadget in the Linux kernel is impossible—we can’t prove a negative. But it certainly is not trivial to execute.

A Note on How We Count Gadgets

We characterize a gadget by its access operation. For example:

• If an access propagates to two transmitters, it is counted as a single gadget.
• If an access is reachable from two different indirect jump targets, it is counted as a single gadget.

A branch target is counted if any gadget is reachable from that branch target. For example, suppose a function has a single gadget that is reachable from two different indirect jump targets. Then our tool would report for that function that two indirect jump targets have at least one reachable gadget.
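
These counting rules can be expressed as two set computations. The records below are hypothetical and exist only to illustrate the rules: one access that feeds two transmitters and is reachable from two indirect jump targets.

```python
# Hypothetical scan records: (branch target, access instruction, transmitter instruction).
records = [
    ("jump_target_A", "load@0x10", "xmit@0x20"),
    ("jump_target_A", "load@0x10", "xmit@0x28"),  # same access, second transmitter
    ("jump_target_B", "load@0x10", "xmit@0x20"),  # same access, second branch target
]

# A gadget is identified by its access operation alone.
gadgets = {access for _, access, _ in records}

# A branch target is counted if any gadget is reachable from it.
targets_with_gadget = {target for target, _, _ in records}

print(len(gadgets), "gadget,", len(targets_with_gadget), "targets")
```

This is also why a row of Table 1 can list more targets than gadgets: the same access may be reachable from several branch targets.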

References

1. E. Barberis, P. Frigo, M. Muench, H. Bos and C. Giuffrida, "Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks," in USENIX Security 22, Boston, MA, 2022.
2. J. Wikner and K. Razavi, "RETBLEED: Arbitrary Speculative Code Execution with Return Instructions," in 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, 2022.
3. A. Moghimi, J. Wichelmann, T. Eisenbarth and B. Sunar, "MemJam: A False Dependency Attack Against Constant-Time Crypto Implementations," International Journal of Parallel Programming, vol. 47, no. 4, pp. 538-570, 2019.
4. O. Kirzner and A. Morrison, "An Analysis of Speculative Type Confusion Vulnerabilities in the Wild," in 30th USENIX Security Symposium (USENIX Security 21), 2021.

Footnotes

1.  Instructions execute transiently when they execute but do not commit to architectural processor state. These transient instructions can sometimes affect microarchitectural processor state in a manner that can later be observed through timing analysis techniques such as Prime+Probe.
2. There are other variants of Branch Target Injection (BTI) that share this requirement: that the adversary must be able to find a disclosure gadget at a reachable indirect branch target within the kernel. One such example is Retbleed [2].
3.  Intel’s guidance on BHI and IMBTI states that predictor entries created in the same predictor mode “may contain targets corresponding to the targets of indirect near jump, indirect near call and/or near return instructions, even if these branches were only transiently executed” [emphasis added]. For example, suppose that debugfs is not mounted, and therefore the kernel would not be expected to call Linear Gadget #1: hctx_type_show. If a malicious user-mode adversary can cause an indirect branch in the kernel to speculatively execute hctx_type_show (for example, by using speculative type confusion [4]), then the adversary may be able to use BHI or IMBTI to subsequently cause a different indirect branch in the kernel to use this predictor entry to speculatively execute hctx_type_show. This might be useful if the adversary does not control the contents of rdi at the former indirect branch, but does control the contents at the latter indirect branch. To simplify our analysis, we focused on indirect branch targets that are architecturally reachable. Whenever we say that a gadget “can be reached” or “is reachable,” we mean that the target is architecturally reachable.
4. There are also sub-cache-line-granularity side channels, such as MemJam [3].
5. Most commodity Linux kernels are built with gcc. Since our analysis uses an LLVM-based compiler pass, the results we obtained may not be representative of a Linux kernel built using gcc.
