Przemyslaw Duda, INT31
Overview
This article introduces Intel.BIN, a tool I am creating for myself and for others, including researchers, validators, and developers. It begins with historical context to establish a common understanding, including some technical insights into modern Intel CPUs. A discussion of Intel.BIN follows, focusing on the most relevant aspects of the framework architecture. This includes key functionalities such as identifying individual instructions amid the multitude of x86 macroinstructions, and detecting buffer overruns, which frequently lead to security vulnerabilities, using the Monitor Any Address (MAA) concept. It also tackles more complex challenges, such as locating chained EDK2 function calls without prior knowledge of any function addresses. This capability significantly enhances the scalability of firmware security research by enabling automated analysis of large codebases. The blog concludes with an overview of the current state of the art.
Context
Intel.BIN is an acronym for BIOS INstrumentation tool/framework, which enables dynamic instrumentation of legitimate BIOS binary code executing on the platform. Consider it analogous to Intel Pin, but operating within the BIOS runtime environment as a standalone solution with no external dependencies. Before delving into specifics, it is important to set the stage with some background: understanding the context in which Intel.BIN was created explains much of its purpose and functionality. Let's explore the foundational elements that led to its development and then dive into how it operates.
Basic Input/Output System (BIOS)
Nowadays, the term BIOS has been replaced, or extended, by UEFI (Unified Extensible Firmware Interface). This interface is implemented in the EDK2 codebase. But everybody says BIOS, so we will do so in this blog. Depending on the phase, BIOS code executes mostly in the 64-bit submode of IA-32e mode with a flat memory model.
Typically, the BIOS is stored at rest in non-volatile memory in the form of so-called Firmware Volumes. These contain data and, typically, position-independent code that leverages RIP-relative addressing (addressing relative to the instruction pointer register), which allows the bootstrap code to relocate other modules' code to chosen addresses.
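As a worked illustration of why such code is position independent: in 64-bit mode, the effective address of a RIP-relative operand is the address of the next instruction plus a sign-extended 32-bit displacement. The helper below has a hypothetical name and is not part of EDK2. It shows that the same encoded instruction yields a target that moves together with the load address.

```cpp
#include <cstdint>

// Hypothetical helper: effective address of a RIP-relative operand in 64-bit
// mode. The signed 32-bit displacement is added to the address of the *next*
// instruction, i.e. the instruction's own address plus its length.
constexpr std::uint64_t RipRelativeTarget(std::uint64_t insnAddress,
                                          std::uint32_t insnLength,
                                          std::int32_t disp32) {
    // Converting the sign-extended displacement to uint64_t wraps modulo 2^64,
    // so negative displacements are handled correctly.
    return insnAddress + insnLength +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(disp32));
}
```

For example, `lea rax, [rip + 0x10]` is encoded in 7 bytes (48 8D 05 10 00 00 00). Placed at 0x1000 it refers to 0x1017, and placed at 0x200000 it refers to 0x200017. Only the load address changes, never the encoded bytes.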
Before the CPU core's RIP is handed over to the first line of OS kernel code through a boot loader, the BIOS initializes, configures, and even tests the platform in the EDK2-defined phases SEC, PEI, and DXE. DXE is followed by Boot Device Selection (BDS), and only then is the boot loader executed as an EFI application. The boot loader typically terminates the boot phase of the BIOS by calling ExitBootServices(). Afterwards, the CPU transitions to the OS kernel.
Available Toolset
Once the boot process reaches a certain point, the available tools and frameworks expand significantly. This is particularly true in open-source environments, where everything can be tailored to specific needs. Options such as dynamic instrumentation, emulation, symbolic and concolic execution, and fuzzing with or without coverage are readily accessible, both in user space and in kernel space. In the pre-OS environment, however, the landscape changes. This phase is responsible for establishing the System Management Mode (SMM) runtime, configuring the platform, enabling Confidential Computing technologies such as Intel® Software Guard Extensions (Intel® SGX) and Intel® Trust Domain Extensions (Intel® TDX), ensuring authenticity and integrity through Intel® Boot Guard, and enabling and configuring Intel® BIOS Guard. Although not entirely devoid of resources, the pre-OS environment offers a significantly smaller set of tools than user and kernel space.
The gap I found prompted me to create Intel.BIN.
Origins
As mentioned, the BIOS configures hardware resources. What do I mean? Components such as Caching and Home Agents, integrated Memory Controllers, UPI links, and MKTME. The address ranges of the decoders for Memory Mapped I/O (MMIO), Memory Mapped Configuration Space (MMCFG), legacy, DRAM, and remote DRAM. The location of architectural ranges such as SMRR and PRMRR. These and other resources must be configured properly, or the system might become unstable and operate under Undefined Behavior (UB) conditions. Some of these UB conditions might lead to observable spurious cold or warm resets; others might be privilege-escalation gadgets.
Here's the dilemma you may have noticed: how can we assert that the BIOS (and the entire SMM runtime) is not part of the Trusted Computing Base (TCB) of Confidential Computing Technologies (CCT), while simultaneously stating that the BIOS must configure the hardware correctly to prevent privilege-escalation issues caused by undefined behavior? There must be a mechanism to validate or ensure the correctness of this configuration, and it must be part of the TCB of Confidential Computing.
From the perspective of user-visible software within the TCB, these components include, inter alia, Authenticated Code Modules (ACMs).
ACMs have historically played a vital role in Intel's Security Architecture, and this is no different for Intel TDX and Intel SGX: see, for instance, the Secure Arbitration Mode (SEAM) Loader. Now assume you want to fuzz or reverse engineer an ACM on a real platform. Ideally, you would break Core execution at the GETSEC instruction boundary. If you are not time-constrained, you can modify the legitimate BIOS (inserting an infinite loop or using another approach), recompile, reflash, and run.
But that does not scale well. Instead, you might want to hijack the instruction pointer (EIP or RIP, depending on the mode) at the instruction that brings a given ACM to life. How do you do this without recompiling the BIOS? Meet Probe Mode Redirection.
Probe Mode is another CPU mode, alongside real mode, protected mode, virtual-8086 mode, SMM, and IA-32e mode, but it is not well documented officially. It is a debug mode present in virtually every Intel CPU since the P6 microarchitecture. In this mode, the microcode waits for macroinstructions issued via dedicated JTAG commands. To enter Probe Mode, you first unlock the CPU. Once it is unlocked, so-called Probe Mode Redirection events can be set up. These events stop the frontend from fetching subsequent instructions and redirect the Core to Probe Mode. GETSEC is one of those events, so it is relatively easy to set a breakpoint on that instruction without knowing its location. With Probe Mode access, you simply set up the dedicated event and wait for the Probe Mode Redirection while the Bootstrap Processor (BSP) boots the platform.
But ACMs might not be what you are looking for. You might want to target the WRMSR macroinstruction that writes to a specific MSR, 0x79, because this is the microcode update trigger point. Some microcode updates might establish or patch XuCode, which is part of the microcode runtime that implements the Intel SGX ISA. In recent CPUs, Intel introduced a new software component that is executed as part of the microcode update and verifies the hardware resource configuration. This component safeguards the platform configuration and enables CC technologies. Treating it as a black box is one option. Alternatively, as stated, you might want to establish a research framework at the wrmsr 0x79 boundary. However, no event redirects the Core to Probe Mode on this instruction, which means you need to find the specific wrmsr instructions on your own.
That instruction, along with the configuration of hardware, firmware, and software resources, has become a major target of security research around CC technologies. This is why we want to break code execution just before it executes.
On the other hand, EDK2, a "BIOS" implementation constrained by the UEFI Specification, is a mature project written in C that can be extended with third-party drivers, platform code, and so on. It inherits virtually all the problems of the language. Buffer overruns are classics, and manual and automated procedures can be employed to find them. Race conditions and TOCTOU issues are nontrivial and very specific. How do we find these at runtime on a real platform? Could we provide end users with a tool that lets them conduct their own research? We need something that can find the location of a specific x86 instruction and also detect abuses of the EDK2 API without knowing a single address, while executing on a real platform.
Addressing challenges at the "microscopic" level (single-instruction execution) and tracing or modifying C-specific constructs were key milestones, at varying levels of complexity, that underpinned the goals of Intel.BIN.
Granularity
Intel.BIN is a C++ framework that runs alongside the BIOS while the Bootstrap Processor (BSP) executes the BIOS macroinstructions. It can be used to create dynamic code analysis tools, called Analyzers here. End users may define three types of Analyzers: Instruction, Function, and Address. The "under the hood" execution mechanism is described in the Engine section.
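The sketch below shows one way the three Analyzer kinds might be expressed as C++ interfaces. All names here are hypothetical and may differ from the actual Intel.BIN interfaces: CpuContext, Analyzer, InstructionAnalyzer, FunctionAnalyzer, AddressAnalyzer, and every method except Execute() and PrintSummary(), which the text names.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified view of the architectural state handed to analyzers.
struct CpuContext {
    std::uint64_t rip = 0;
    std::uint64_t rcx = 0;
    std::uint64_t rsp = 0;
    std::string nextMnemonic;  // e.g. "wrmsr", as produced by an x86 decoder
};

// Common base: every analyzer may report a summary at the end of a run.
class Analyzer {
public:
    virtual ~Analyzer() = default;
    virtual void PrintSummary() const {}
};

// Invoked at every instruction boundary (Single Step Mode).
class InstructionAnalyzer : public Analyzer {
public:
    virtual void Execute(CpuContext& ctx) = 0;
};

// Invoked at CALL and RET boundaries (also Single Step Mode).
class FunctionAnalyzer : public Analyzer {
public:
    virtual void OnCall(CpuContext& ctx, std::uint64_t target) = 0;
    virtual void OnReturn(CpuContext& ctx) = 0;
};

// Invoked when execution reaches one chosen address (Address Mode).
class AddressAnalyzer : public Analyzer {
public:
    explicit AddressAnalyzer(std::uint64_t address) : address_(address) {}
    std::uint64_t Address() const { return address_; }
    virtual void OnHit(CpuContext& ctx) = 0;

private:
    std::uint64_t address_;
};
```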
Instruction Analyzers are executed at instruction boundaries. These are the most fine-grained analyzers and can be used to detect any x86 macroinstruction of the BIOS in any chosen context. In the following example, a typical C++ construct can be observed: a class inherits from an interface definition and overrides the Execute() function, which is called at every instruction boundary. The conditional statement checks whether the next instruction's mnemonic is wrmsr and whether the RCX GPR holds the value 0x79.
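A hypothetical Instruction Analyzer along the lines of that description could look like the sketch below. The interface, the context structure and the class name are assumptions. Only Execute(), the wrmsr mnemonic check and MSR 0x79 (IA32_BIOS_UPDT_TRIG) come from the text. WRMSR takes the MSR index from ECX, so the sketch compares only the low 32 bits of RCX.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct CpuContext {              // hypothetical, simplified register snapshot
    std::uint64_t rip = 0;
    std::uint64_t rcx = 0;
    std::string nextMnemonic;    // mnemonic of the instruction about to execute
};

class InstructionAnalyzer {      // hypothetical interface
public:
    virtual ~InstructionAnalyzer() = default;
    virtual void Execute(const CpuContext& ctx) = 0;  // every instruction boundary
};

// Records where the BIOS writes IA32_BIOS_UPDT_TRIG (MSR 0x79), i.e. where a
// microcode update is about to be triggered.
class UcodeUpdateTriggerAnalyzer : public InstructionAnalyzer {
public:
    static constexpr std::uint32_t kBiosUpdtTrig = 0x79;

    void Execute(const CpuContext& ctx) override {
        // WRMSR reads the MSR index from ECX; the upper half of RCX is ignored.
        if (ctx.nextMnemonic == "wrmsr" &&
            static_cast<std::uint32_t>(ctx.rcx) == kBiosUpdtTrig) {
            hits_.push_back(ctx.rip);
            // A real tool could stop here, e.g. spin or hand over to a debugger.
        }
    }

    const std::vector<std::uint64_t>& Hits() const { return hits_; }

private:
    std::vector<std::uint64_t> hits_;
};
```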
Function Analyzers are executed at CALL and RET instruction boundaries, which is where custom logic around functions can run. The following is a classic example that counts the number of function calls. It also shows an additional function of the Analyzer interfaces: PrintSummary(). At the end of Intel.BIN execution, PrintSummary() is called for every registered analyzer.
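A call-counting Function Analyzer might be sketched as follows. The interface and the OnCall()/OnReturn() hook names are assumptions; PrintSummary() is the function named in the text. std::printf stands in for whatever output channel (for example, the serial port) the pre-OS environment provides.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>

class FunctionAnalyzer {  // hypothetical interface
public:
    virtual ~FunctionAnalyzer() = default;
    virtual void OnCall(std::uint64_t callSite, std::uint64_t target) = 0;  // CALL
    virtual void OnReturn(std::uint64_t returnTo) = 0;                     // RET
    virtual void PrintSummary() const = 0;                                 // end of run
};

// Counts how many times each CALL target is invoked.
class CallCounter : public FunctionAnalyzer {
public:
    void OnCall(std::uint64_t, std::uint64_t target) override { ++calls_[target]; }
    void OnReturn(std::uint64_t) override {}

    void PrintSummary() const override {
        for (const auto& [target, count] : calls_)
            std::printf("%#llx called %zu times\n",
                        static_cast<unsigned long long>(target), count);
    }

    std::size_t CountFor(std::uint64_t target) const {
        auto it = calls_.find(target);
        return it == calls_.end() ? 0 : it->second;
    }

private:
    std::map<std::uint64_t, std::size_t> calls_;
};
```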
Function analyzers can be employed to identify the locations of functions by examining their arguments and/or return values. For example, we may seek to locate the GetVariable() function within the BIOS binary during runtime.
Function Frames
A CALL instruction saves procedure-linking information and branches to the procedure specified by its operand. Returning from the procedure to the caller is typically done with RET. Intel.BIN captures this procedure-linking information in a construct called a Function Frame. Every Function Frame has arguments, a return value, an ID, the addresses of the top and bottom of the stack (as reserved by the function prolog), and the address within the stack where the return address is stored. Combining these parameters allows the creation of analyzers with a wide range of capabilities. Knowing where the return address is stored on the stack, one can detect any read or write instruction targeting that location. Alternatively, end users may wish to detect a function's address based on its input and output parameters.
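The fields listed above could be captured in a structure like the hypothetical FunctionFrame below. It assumes the Microsoft x64 calling convention that UEFI uses on x64. It also shows the overlap test an analyzer could apply to each memory access to spot anything other than RET touching the return-address slot.

```cpp
#include <array>
#include <cstdint>

// Hypothetical Function Frame modeled on the fields listed in the text.
struct FunctionFrame {
    std::uint64_t id = 0;
    std::uint64_t functionAddress = 0;    // CALL target
    std::array<std::uint64_t, 5> args{};  // first five arguments
    std::uint64_t returnValue = 0;        // RAX observed at RET
    std::uint64_t stackTop = 0;           // filled in once the prolog has run
    std::uint64_t stackBottom = 0;        // filled in once the prolog has run
    std::uint64_t returnAddressSlot = 0;  // where CALL pushed the return address

    // Builds a frame when the CALL target is about to execute; RSP then points
    // at the return address. UEFI on x64 passes RCX, RDX, R8, R9, then stack
    // arguments; the fifth sits at [RSP + 0x28], above the return address and
    // the 32-byte shadow space.
    static FunctionFrame AtEntry(std::uint64_t id, std::uint64_t rip,
                                 std::uint64_t rsp, std::uint64_t rcx,
                                 std::uint64_t rdx, std::uint64_t r8,
                                 std::uint64_t r9, std::uint64_t stackArg5) {
        FunctionFrame f;
        f.id = id;
        f.functionAddress = rip;
        f.args = {rcx, rdx, r8, r9, stackArg5};
        f.returnAddressSlot = rsp;
        return f;
    }

    // True if a memory access [addr, addr + size) overlaps the 8-byte
    // return-address slot.
    bool TouchesReturnAddress(std::uint64_t addr, std::uint64_t size) const {
        return addr < returnAddressSlot + 8 && returnAddressSlot < addr + size;
    }
};
```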
Detecting EDK2 Get and Set Variable
At the other end of the complexity spectrum from detecting the next instruction, more complex and chained C constructs can be detected too. Let's go through the process of detecting the Get/Set EFI Variable functions that the EDK2 API exposes to BIOS developers.
Let's start with the signatures of the Get/Set Variable functions:
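For reference, here are the two function-pointer types as defined by the UEFI Specification. EDK2's IN, OUT, OPTIONAL and EFIAPI annotations and its base types are replaced by minimal stand-ins. This substitution is an assumption made only so the snippet compiles outside EDK2. In EDK2, EFIAPI selects the Microsoft x64 ABI on X64 builds.

```cpp
#include <cstdint>

// Minimal stand-ins for EDK2 annotations and types (not the EDK2 headers).
#define IN
#define OUT
#define OPTIONAL
#define EFIAPI

using CHAR16 = char16_t;
using UINT32 = std::uint32_t;
using UINTN = std::uintptr_t;  // natively sized unsigned integer
using VOID = void;
using EFI_STATUS = UINTN;
struct EFI_GUID {
    UINT32 Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t Data4[8];
};

// Runtime Services signatures from the UEFI Specification.
typedef EFI_STATUS (EFIAPI *EFI_GET_VARIABLE)(
    IN     CHAR16   *VariableName,
    IN     EFI_GUID *VendorGuid,
    OUT    UINT32   *Attributes OPTIONAL,
    IN OUT UINTN    *DataSize,
    OUT    VOID     *Data OPTIONAL);

typedef EFI_STATUS (EFIAPI *EFI_SET_VARIABLE)(
    IN CHAR16   *VariableName,
    IN EFI_GUID *VendorGuid,
    IN UINT32   Attributes,
    IN UINTN    DataSize,
    IN VOID     *Data);
```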
Both have an argument list of five entries. VariableName and VendorGuid are common to both; the remaining three are combinations of IN and OUT parameters. VariableName and VendorGuid are pointers to memory, so an initial pass filter can be built on this observation: those addresses must be mapped and at least readable according to the access rights defined by the paging structures (and control registers). Furthermore, according to the function signature, VariableName is a wide C string, typically composed of printable ASCII bytes, each followed by 0x00.
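The VariableName heuristic can be sketched as a function over a snapshot of the bytes at the candidate pointer. The function name and the 64-character scan limit are assumptions. The sketch also assumes the pointer has already been confirmed as mapped and readable through the paging structures.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical filter: does the byte snapshot look like a CHAR16 string made
// of printable ASCII characters (each byte followed by 0x00), terminated by a
// 0x0000 character? `maxChars` bounds the scan so garbage cannot run away.
bool LooksLikeAsciiWideString(const std::uint8_t* bytes, std::size_t len,
                              std::size_t maxChars = 64) {
    std::size_t chars = 0;
    for (std::size_t i = 0; i + 1 < len && chars <= maxChars; i += 2, ++chars) {
        const std::uint8_t lo = bytes[i];
        const std::uint8_t hi = bytes[i + 1];
        if (lo == 0 && hi == 0) return chars > 0;            // terminator; non-empty
        if (hi != 0 || lo < 0x20 || lo > 0x7E) return false;  // not printable ASCII
    }
    return false;  // no terminator within bounds
}
```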
The following is part of a Function Analyzer that implements the initial pass filter for the arguments. The UefiVarFunctionFrame type extends FunctionFrame, providing a restricted view of the argument types as specified by the signatures of the Get/Set Variable functions.
Note the referenceFrame object. As the name suggests, it is the reference frame that candidates are compared against. It holds a name field that corresponds to VariableName.
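Here is a simplified, hypothetical rendering of that comparison. UefiVarFunctionFrame below is a stand-in with the five arguments already interpreted according to the signatures; the real type extends FunctionFrame. The filter keeps only frames whose decoded VariableName equals the name carried by referenceFrame.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in: a frame whose arguments are interpreted according to
// the Get/SetVariable signatures.
struct UefiVarFunctionFrame {
    std::uint64_t functionAddress = 0;
    std::u16string name;             // decoded from argument 1 (VariableName)
    std::uint64_t vendorGuidPtr = 0; // argument 2
    std::uint64_t arg3 = 0;          // Attributes: value (Set) or pointer (Get)
    std::uint64_t arg4 = 0;          // DataSize:   value (Set) or pointer (Get)
    std::uint64_t arg5 = 0;          // Data pointer
};

// Initial pass filter: keep candidates whose VariableName matches the name
// held by the reference frame (e.g. a variable written by sample code).
bool MatchesReference(const UefiVarFunctionFrame& candidate,
                      const UefiVarFunctionFrame& referenceFrame) {
    return !candidate.name.empty() && candidate.name == referenceFrame.name;
}
```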
As a next step, we might want to distinguish between the Set and Get functions based on the other three arguments (Attributes, DataSize, and Data).
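The following sketch shows one way to make this split at the prologue, under stated assumptions. SetVariable() passes Attributes and DataSize by value, whereas GetVariable() passes DataSize by reference and Attributes as an optional pointer. MappedRange is a hypothetical stand-in for the paging-based "is this a readable address" test. knownDataSize is the size used by the sample code behind the reference frame.

```cpp
#include <cstdint>

// Hypothetical stand-in for "this value is a mapped, readable address".
struct MappedRange {
    std::uint64_t lo, hi;  // [lo, hi)
    bool Contains(std::uint64_t a) const { return a >= lo && a < hi; }
};

enum class VarCallKind { Unknown, SetCandidate, GetCandidate };

// Classifies a candidate call from its last three arguments.
VarCallKind ClassifyAtPrologue(std::uint64_t attributes, std::uint64_t dataSize,
                               std::uint64_t data, std::uint64_t knownDataSize,
                               const MappedRange& mapped) {
    // SetVariable: Attributes and DataSize by value, Data is a pointer.
    if (dataSize == knownDataSize && !mapped.Contains(attributes) &&
        mapped.Contains(data))
        return VarCallKind::SetCandidate;
    // GetVariable: DataSize by reference; Attributes is an optional pointer
    // and may be NULL.
    if (mapped.Contains(dataSize) && (attributes == 0 || mapped.Contains(attributes)))
        return VarCallKind::GetCandidate;
    return VarCallKind::Unknown;
}
```

For example, suppose the sample code wrote 16 bytes. A call with Attributes = 0x7 (NON_VOLATILE | BOOTSERVICE_ACCESS | RUNTIME_ACCESS), DataSize = 16 and a mapped Data pointer is then classified as a SetVariable() candidate.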
The SetVariable() function is fairly easy to detect, especially with a reference frame produced by sample code with known data. GetVariable() must be detected in the epilogue phase, because it has IN/OUT parameters and a constrained return value. The Epilog hook function below is executed within the UEFI Function Analyzer context and detects the GetVariable function address (see the uframe->functionAdresses map).
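The epilogue check can be sketched as follows. The record layout and the GetVariableLocator class are hypothetical. The logic relies only on UEFI semantics: on EFI_SUCCESS, the returned *DataSize cannot exceed the buffer size passed in, and on EFI_BUFFER_TOO_SMALL, *DataSize is updated to the larger required size. The status values are the X64 encodings, in which error codes have the top bit set.

```cpp
#include <cstdint>
#include <map>

// EFI_STATUS values on X64.
constexpr std::uint64_t kEfiSuccess = 0;
constexpr std::uint64_t kEfiBufferTooSmall = 0x8000000000000005ULL;

// Hypothetical record of one candidate call, completed at its epilogue.
struct GetVarObservation {
    std::uint64_t functionAddress;  // CALL target of the candidate
    std::uint64_t dataSizeAtEntry;  // *DataSize read at the prologue
    std::uint64_t dataSizeAtExit;   // *DataSize read again at RET
    std::uint64_t returnStatus;     // RAX at RET
};

// Hypothetical epilog hook keeping per-address hit counts, similar in spirit
// to the uframe->functionAdresses map mentioned in the text.
class GetVariableLocator {
public:
    void OnEpilog(const GetVarObservation& o) {
        const bool plausible =
            // Success: the data fit, so the reported size did not grow.
            (o.returnStatus == kEfiSuccess && o.dataSizeAtExit <= o.dataSizeAtEntry) ||
            // Buffer too small: *DataSize now holds the larger required size.
            (o.returnStatus == kEfiBufferTooSmall && o.dataSizeAtExit > o.dataSizeAtEntry);
        if (plausible) ++hits_[o.functionAddress];
    }

    unsigned HitsFor(std::uint64_t address) const {
        auto it = hits_.find(address);
        return it == hits_.end() ? 0 : it->second;
    }

    // Address with the most GetVariable-like exits, or 0 if none.
    std::uint64_t BestCandidate() const {
        std::uint64_t best = 0;
        unsigned bestHits = 0;
        for (const auto& entry : hits_)
            if (entry.second > bestHits) { best = entry.first; bestHits = entry.second; }
        return best;
    }

private:
    std::map<std::uint64_t, unsigned> hits_;
};
```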
Function Frames that pass the filtering process hold all the data necessary to start the Engine in so-called Address Mode. I will write more on the Intel.BIN Engine modes later in this blog. Once in Address Mode, Address Analyzers may execute custom logic at a chosen address boundary. In our case, we may set code fetch breakpoints on each detected function prologue and epilogue. Intel.BIN uses the DRx registers of the hardware debug subsystem to provide breakpoint support. At each GetVariable() entry point, we may analyze and store the Function Frame. At the exit point, we can detect the EFI_BUFFER_TOO_SMALL error code returned by the function, in which case GetVariable() has updated DataSize to the size actually required. Now suppose SetVariable() is then called with the same stack-based buffer and the updated DataSize, which exceeds the buffer's boundaries on the stack. This indicates a breach of confidentiality, because SetVariable() reads beyond the intended data on the stack and stores it in the UEFI Variable Store.
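x86 provides only four breakpoint address registers (DR0-DR3), so at most four addresses can be armed at once. The hypothetical helpers below compute the DR7 value for a local instruction-execution breakpoint in slot n, following the DR7 layout in the Intel SDM. The linear address itself is written to DR<n>.

```cpp
#include <cstdint>

// Returns `dr7` with slot n (0..3) configured as a local instruction-execution
// breakpoint: L<n> set, R/W<n> = 00 (execute), LEN<n> = 00 (required for
// execute breakpoints). Other slots and bits are preserved.
constexpr std::uint64_t Dr7EnableExecBreakpoint(std::uint64_t dr7, unsigned n) {
    const unsigned fieldShift = 16 + 4 * n;      // R/W<n> at 16+4n, LEN<n> at 18+4n
    dr7 &= ~(std::uint64_t{0xF} << fieldShift);  // clear R/W<n> and LEN<n>
    dr7 |= std::uint64_t{1} << (2 * n);          // L<n>: local enable
    return dr7;
}

// Disarms slot n by clearing both its local (L<n>) and global (G<n>) enables.
constexpr std::uint64_t Dr7Disable(std::uint64_t dr7, unsigned n) {
    return dr7 & ~(std::uint64_t{3} << (2 * n));
}
```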
We have seen these types of issues through the Intel Bug Bounty program. Those were not discovered by static code analyzers.
Examine the following code on your own.
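The vulnerable pattern, and the check described above, can be approximated with a self-contained simulation. Everything here is hypothetical. A mock store stands in for the EDK2 Runtime Services. No real buffer is over-read, because the detector only compares address ranges at the point where SetVariable() would be entered.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

constexpr std::uint64_t kEfiSuccess = 0;
constexpr std::uint64_t kEfiBufferTooSmall = 0x8000000000000005ULL;
constexpr std::uint64_t kEfiNotFound = 0x800000000000000EULL;

// Hypothetical mock: mirrors only GetVariable()'s size handling.
struct MockVariableStore {
    std::map<std::u16string, std::vector<std::uint8_t>> vars;

    std::uint64_t Get(const std::u16string& name, std::size_t* dataSize,
                      std::uint8_t* data) const {
        auto it = vars.find(name);
        if (it == vars.end()) return kEfiNotFound;
        if (*dataSize < it->second.size()) {
            *dataSize = it->second.size();  // report the required size
            return kEfiBufferTooSmall;
        }
        std::memcpy(data, it->second.data(), it->second.size());
        *dataSize = it->second.size();
        return kEfiSuccess;
    }
};

// Hypothetical detector: remembers the buffer handed to GetVariable() and
// flags a SetVariable() whose [Data, Data + DataSize) runs past its end.
struct OverreadDetector {
    std::uintptr_t base = 0;
    std::size_t size = 0;

    void OnGetEntry(const void* data, std::size_t dataSize) {
        base = reinterpret_cast<std::uintptr_t>(data);
        size = dataSize;
    }
    bool OnSetEntry(const void* data, std::size_t dataSize) const {
        const auto p = reinterpret_cast<std::uintptr_t>(data);
        return p >= base && p < base + size && p + dataSize > base + size;
    }
};

// Buggy caller pattern: GetVariable()'s status is ignored and the (possibly
// enlarged) DataSize is reused for SetVariable() on the same stack buffer.
// Returns true if the detector would flag the SetVariable() call.
bool RunVulnerablePattern(const MockVariableStore& store, OverreadDetector& det) {
    std::uint8_t buffer[8] = {};
    std::size_t size = sizeof buffer;
    det.OnGetEntry(buffer, size);
    store.Get(u"Demo", &size, buffer);  // status deliberately ignored
    // A real SetVariable(..., size, buffer) would now read `size` bytes.
    return det.OnSetEntry(buffer, size);
}
```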
QEMU Screenshot
A screenshot of the proof of concept (PoC) is displayed below. Here, I have two EFI applications: VulnEfiVar and IntelBinDemo. VulnEfiVar demonstrates the issue described earlier, while IntelBinDemo installs Intel.BIN into memory and configures and initializes all the analyzers needed to detect the vulnerability at runtime. In the first step, as shown in the UEFI Shell, we execute VulnEfiVar, which triggers the issue and exits quietly. See Figure 8.
In step two, Intel.BIN initially runs in Single Step Mode. Once it identifies all the required addresses, it switches to Address Mode and remains in that state even after the EFI app completes its execution. The left panel in Figure 8 displays the serial port output logs. As the EFI Shell finishes executing IntelBinDemo.efi, we can see that the BIOS code already touches two variables.
In step three, we simply re-execute the vulnerable app. This time, however, Intel.BIN detects the issue and interrupts the execution of VulnEfiVar. Logs from this event are collected via the serial interface. This step is shown in Figure 9.
MAA: Monitoring Any Address
The Function Frame class has a field holding the stack address at which the CALL instruction pushed the function's return address. This stack location should never be touched by any code other than RET. Any other access points to a buffer overrun scenario (over-read or over-write). Intel.BIN can be configured to monitor and detect these conditions at runtime, and the implementation of this feature is fairly simple.
The picture above shows a typical paging structure observed in the EDK2 DXE phase. A Page Directory Entry (PDE) points directly to a 2 MB page frame, part of which may be reserved for the stack space of EDK2 procedures. By clearing the Present bit in that particular PDE, one can "page out" the page frame. Each access to this page frame (read, write, or fetch) then triggers a Page Fault exception (#PF). The CPU pushes a #PF error code describing the fault condition, and CR2 holds the offending address. If that address is not the one we aim to monitor, the #PF handler must perform a so-called "page in". It sets the Present bit in the faulting address' PDE and resubmits the instruction to the Core's frontend. There is no need to readjust RIP, because #PF is a fault-class exception (i.e., the saved RIP points directly to the instruction that caused the fault). However, when CR2 contains the address of a return-address ("RET value") location, this suggests a buffer overflow or a similar spatial (or possibly temporal) memory corruption. The error code pushed onto the #PF handler's stack provides more information on the source of the access.
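The handler's decision logic can be modeled in plain C++ over a simulated PDE, without touching real page tables. MaaMonitor, PfVerdict, AccessKind and the faultingInsnIsRet flag are hypothetical. A real handler would derive that flag by decoding the instruction at the saved RIP, so that the function's own RET is not reported. The text does not describe how Intel.BIN re-arms the page after a page-in, so the comment marks that step as an assumption.

```cpp
#include <cstdint>

// PDE bits (IA-32e paging).
constexpr std::uint64_t kPdePresent  = 1ULL << 0;
constexpr std::uint64_t kPdePageSize = 1ULL << 7;  // PS = 1: maps a 2 MB page

// #PF error-code bits.
constexpr std::uint32_t kPfPresent = 1u << 0;  // 0: page was not present
constexpr std::uint32_t kPfWrite   = 1u << 1;  // 1: write access
constexpr std::uint32_t kPfFetch   = 1u << 4;  // 1: instruction fetch

inline const char* AccessKind(std::uint32_t errorCode) {
    return (errorCode & kPfFetch) ? "fetch" : (errorCode & kPfWrite) ? "write" : "read";
}

enum class PfVerdict { NotOurs, PageInAndRetry, ReturnAddressTouched };

// Hypothetical MAA monitor for one return-address slot inside a 2 MB page.
struct MaaMonitor {
    std::uint64_t* pde;            // PDE mapping the 2 MB stack page
    std::uint64_t watchedAddress;  // 8-byte return-address slot

    // "Page out": clear Present. Real code must also invalidate the stale TLB
    // entry (e.g. with INVLPG) for the change to take effect.
    void PageOut() { *pde &= ~kPdePresent; }

    PfVerdict OnPageFault(std::uint64_t cr2, std::uint32_t errorCode,
                          bool faultingInsnIsRet) {
        if (errorCode & kPfPresent) return PfVerdict::NotOurs;  // not from PageOut()
        if (!faultingInsnIsRet && cr2 >= watchedAddress && cr2 < watchedAddress + 8)
            return PfVerdict::ReturnAddressTouched;  // see AccessKind(errorCode)
        // Unrelated access: page in. #PF is a fault, so the saved RIP still
        // points at the faulting instruction, which re-executes on return.
        // (Assumption: the page is paged out again afterwards, e.g. from a
        // single-step #DB, so that later accesses are caught as well.)
        *pde |= kPdePresent;
        return PfVerdict::PageInAndRetry;
    }
};
```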
Intel.BIN Engine
A key goal of Intel.BIN was to ensure that the BIOS code remained independent. This means that EDK2 is not reliant on Intel.BIN, and vice versa. How was it achieved? Let’s start with interrupts.
In x86, this is typically what happens upon interrupt or exception delivery. The event either originates in the core itself (an exception) or is delivered by the Local APIC, by another (external) APIC, or over an external interrupt line. In any case, it is routed through the Interrupt Descriptor Table (IDT), which is located via the IDTR system register. IDTR has a base and a limit part, and the IDT defines how each vector is handled. In IA-32e mode, the IDT holds 16-byte descriptors. These define the handler's address (indirectly, through a code-segment selector in either the GDT or the LDT) and other flags relevant to interrupt handling. Assume the descriptor routes the interrupt through the GDT in the 64-bit submode of IA-32e mode. The resulting address must also pass through paging. If everything is configured properly, such an exception results in a code fetch from a well-defined physical address: the handler's entry point.
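The layout of such a 16-byte gate, as documented in the Intel SDM, can be expressed as a C++ structure. The helper names are hypothetical. They show that the 64-bit handler offset is split across three fields.

```cpp
#include <cstdint>

// 16-byte IDT gate descriptor used in IA-32e mode.
struct IdtGate64 {
    std::uint16_t offsetLow;   // handler offset bits 15..0
    std::uint16_t selector;    // code-segment selector (GDT/LDT)
    std::uint8_t  ist;         // bits 2..0: Interrupt Stack Table index
    std::uint8_t  typeAttr;    // bits 3..0 type, 6..5 DPL, 7 Present
    std::uint16_t offsetMid;   // handler offset bits 31..16
    std::uint32_t offsetHigh;  // handler offset bits 63..32
    std::uint32_t reserved;
};
static_assert(sizeof(IdtGate64) == 16, "IA-32e gates are 16 bytes");

inline std::uint64_t HandlerOffset(const IdtGate64& g) {
    return std::uint64_t{g.offsetLow} |
           (std::uint64_t{g.offsetMid} << 16) |
           (std::uint64_t{g.offsetHigh} << 32);
}

inline void SetHandlerOffset(IdtGate64& g, std::uint64_t offset) {
    g.offsetLow  = static_cast<std::uint16_t>(offset);
    g.offsetMid  = static_cast<std::uint16_t>(offset >> 16);
    g.offsetHigh = static_cast<std::uint32_t>(offset >> 32);
}

inline bool IsPresent(const IdtGate64& g) { return (g.typeAttr & 0x80) != 0; }

// 0xE = 64-bit interrupt gate, 0xF = 64-bit trap gate.
inline unsigned GateType(const IdtGate64& g) { return g.typeAttr & 0xF; }
```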
Intel.BIN intercepts code execution by directly modifying IDTR.BASE, the portion of IDTR that holds the base address of the IDT. If one copies the entire IDT to a new location and repoints IDTR.BASE to it, interrupts and exceptions are handled exactly as before. This allows handlers to be redefined without altering the BIOS data or code.
By default, Intel.BIN modifies just one entry in this rebased IDT, namely the second entry (vector 1), #DB.
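The rebasing step can be sketched as a pure data transformation. The helper is hypothetical. In a real implementation, the current IDTR would come from SIDT and the result would be installed with LIDT. This sketch leaves both steps out so that it can be exercised in user space.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct IdtGate64 {  // 16-byte IA-32e gate
    std::uint16_t offsetLow;
    std::uint16_t selector;
    std::uint8_t  ist;
    std::uint8_t  typeAttr;
    std::uint16_t offsetMid;
    std::uint32_t offsetHigh;
    std::uint32_t reserved;
};
static_assert(sizeof(IdtGate64) == 16, "IA-32e gates are 16 bytes");

#pragma pack(push, 1)
struct Idtr {  // SIDT/LIDT memory operand in 64-bit mode: 2 + 8 bytes
    std::uint16_t limit;
    std::uint64_t base;
};
#pragma pack(pop)

// Hypothetical helper: copies the live IDT (limit + 1 bytes) into `newTable`,
// repoints only vector 1 (#DB) at `dbHandler`, and returns the IDTR value to
// load. All other gates are byte-for-byte copies, so every other vector keeps
// its original BIOS handler.
Idtr RebaseIdtAndHookDb(const Idtr& current, IdtGate64* newTable,
                        std::uint64_t dbHandler) {
    const void* oldTable =
        reinterpret_cast<const void*>(static_cast<std::uintptr_t>(current.base));
    std::memcpy(newTable, oldTable, static_cast<std::size_t>(current.limit) + 1);
    IdtGate64& db = newTable[1];
    db.offsetLow  = static_cast<std::uint16_t>(dbHandler);
    db.offsetMid  = static_cast<std::uint16_t>(dbHandler >> 16);
    db.offsetHigh = static_cast<std::uint32_t>(dbHandler >> 32);
    return Idtr{current.limit,
                static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(newTable))};
}
```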
The #DB exception has several sources: INT1, x2APIC (LVT/IPI), the next instruction accessing DRx (general detect), I/O breakpoints, task switches, EFLAGS.TF, and others. TF in EFLAGS is the "Trap Flag". When it is set, the core raises a #DB exception at each instruction boundary, which allows so-called single stepping.
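Inside the #DB handler, DR6 distinguishes the cause, and the saved RFLAGS determine what happens after IRETQ. The sketch below uses the architectural bit positions: EFLAGS.TF is bit 8, EFLAGS.RF is bit 16, DR6.B0-B3 are bits 0-3 and DR6.BS is bit 14. The function names and overall structure are assumptions rather than Intel.BIN's actual handler.

```cpp
#include <cstdint>

constexpr std::uint64_t kEflagsTF = 1ULL << 8;   // Trap Flag: #DB after each instruction
constexpr std::uint64_t kEflagsRF = 1ULL << 16;  // Resume Flag: skip instruction breakpoint once
constexpr std::uint64_t kDr6BS    = 1ULL << 14;  // #DB caused by single step
constexpr std::uint64_t kDr6BMask = 0xF;         // B0..B3: DR0..DR3 condition matched

enum class DbCause { SingleStep, Breakpoint, Other };

DbCause ClassifyDebugException(std::uint64_t dr6) {
    if (dr6 & kDr6BS) return DbCause::SingleStep;
    if (dr6 & kDr6BMask) return DbCause::Breakpoint;
    return DbCause::Other;
}

// Adjusts the RFLAGS image saved on the handler stack before IRETQ.
std::uint64_t NextRflags(std::uint64_t savedRflags, bool keepStepping, DbCause cause) {
    if (keepStepping)
        savedRflags |= kEflagsTF;   // Single Step Mode: trap again after next insn
    else
        savedRflags &= ~kEflagsTF;  // Address Mode: run freely until a DRx hit
    // Instruction breakpoints are fault-like (#DB fires before the instruction
    // executes), so RF lets the instruction run once without re-triggering.
    if (cause == DbCause::Breakpoint) savedRflags |= kEflagsRF;
    return savedRflags;
}
```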
And this is how Intel.BIN hooks into the BIOS without modifying a single line of BIOS code. From then on, its engine may analyze the BIOS at instruction boundaries (via Instruction Analyzers and, eventually, Function Analyzers).
The picture above shows conceptually how Intel.BIN installs itself into the system. The IDT is relocated below the memory reserved for the BIOS (with IDTR.Base updated accordingly) and, by default, the #DB descriptor is modified to point to the new handler location.
Engine Modes
Intel.BIN has two operational modes: Single Step Mode and Address Mode. They differ in how frequently interrupts are taken.
Assume the Intel.BIN Function Analyzers have identified the desired function address in Single Step Mode. From then on, you can switch the Engine into Address Mode to monitor the usage of that address. This results in a notable performance improvement, because the core no longer traps at every instruction. In contrast, Instruction and Function Analyzers rely on Single Step Mode, which imposes a significant computational burden on the system by trapping on every x86 macroinstruction.
Runtime Dependencies and others
Because Intel.BIN executes in a pre-OS environment, it must not have any OS runtime dependencies: any syscall ends in immediate fatality, so to speak. C++ relies heavily on dynamic memory allocation, and malloc implementations frequently execute SYSCALL instructions under various conditions. This is why Intel.BIN has its own minimalistic allocator.
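The text does not describe the design of Intel.BIN's allocator. As an assumption, here is a bump (arena) allocator, one of the simplest designs that needs no syscalls. It carves allocations out of a fixed region reserved up front and never frees individual blocks. The class name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Minimal bump allocator over a fixed, pre-reserved region: no syscalls, no
// locking, no per-object free. `align` must be a power of two.
class BumpAllocator {
public:
    BumpAllocator(void* base, std::size_t size)
        : cur_(reinterpret_cast<std::uintptr_t>(base)), end_(cur_ + size) {}

    void* Allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::uintptr_t p = (cur_ + mask) & ~mask;  // round up to `align`
        if (p > end_ || size > end_ - p) return nullptr; // arena exhausted
        cur_ = p + size;
        return reinterpret_cast<void*>(p);
    }

    std::size_t Remaining() const { return end_ - cur_; }

private:
    std::uintptr_t cur_;  // next free byte (declared first: initialized first)
    std::uintptr_t end_;  // one past the last usable byte
};
```

A design like this suits a short-lived, single-threaded instrumentation run, where memory can be reclaimed all at once when the run ends.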
We won't find the libc Dynamic Shared Object (DSO) loaded anywhere, nor libstdc++, nor the ELF bootstrap code. The Intel.BIN installer must take care of the .BSS section and so on. Fortunately, it does not have to load link dependencies, because they are already statically linked into the ELF image. This way, Intel.BIN achieves bare-metal execution capability as self-contained code.
A noteworthy point regarding the x86 encoder/decoder: Intel.BIN uses Intel XED for this purpose. If you are interested in developing your own code that is aware of the x86 ISA, Intel XED could be an ideal choice, as it makes only six distinct calls to libc.
Current State of the Art
Currently, Intel.BIN operates only in the Driver Execution Environment (DXE) phase of EDK2-based BIOS. In this phase, the CPU operates in the 64-bit submode of IA-32e mode (long mode). This is the first limitation of the current Intel.BIN implementation: it supports only this CPU mode and this UEFI phase.
Next, for those wishing to analyze the System Management Mode Runtime (SMM RT) dynamically, I have bad news too (for now): SMM RT is not supported. It is on my wish list; however, a day has only so many hours.
Once SMM RT support is implemented and the code is further polished, Intel.BIN is expected to be released publicly, allowing for external contributions and usage.
Share Your Feedback
We want to hear from you. Send comments, questions, and feedback to the INT31 team.
About the Author
Przemyslaw Duda (0xdefacedbeef) is an Offensive Security Researcher in Intel's INT31 team. He likes to know how things work in detail, and how to make them better by breaking them first. He does not hesitate to share his knowledge with a broader audience, inside and outside Intel's walls.