Ben Gras, INT31
Introduction: The Challenge of Hardware Security Verification
Modern CPUs are incredibly complex, containing billions of transistors that must work together flawlessly while maintaining strict security boundaries. When these boundaries fail, the consequences can be severe—as demonstrated by high-profile vulnerabilities like Meltdown (CVE-2017-5754) and Microarchitectural Fill Buffer Data Sampling (MFBDS, CVE-2018-12130). These vulnerabilities allowed attackers to access sensitive data that should have remained isolated, affecting processors across the industry.
The core challenge is this: How can we detect when sensitive data inappropriately flows from secure areas to locations where attackers might access it? In hardware security verification, we need to answer a deceptively simple question: given a particular value in any hardware buffer, where does that value go and which other values does it influence? If we can precisely track this dataflow, we can identify potential security vulnerabilities before they reach production.
Our Solution: IODyne - Precise Dataflow Tracking
We developed IODyne (named after the iodine dye used to trace liquid flows) to solve this problem and to introduce some benefits over the state of the art (X injection, discussed later). The primary tool for finding vulnerabilities is pre-silicon simulation: simulating the full design of a CPU, ultimately realized as billions of transistors, before it is fabricated. On top of such a simulation, IODyne applies differential analysis, which works like this:
- Run the simulation with normal data values (baseline)
- Run again with modified data at the tracking point
- Compare results—any differences reveal true dataflow dependencies
This approach eliminates false positives caused by overly conservative simulation semantics, providing precise tracking of actual data influence.
Key Benefits and Impact
Beyond improved precision, our results show that IODyne offers automatic case minimization. When a potential security issue is detected, IODyne can automatically reduce a complex test case from hundreds of data injections down to the single injection needed to reproduce the issue, significantly reducing human diagnosis time.
In this post, we'll explore how dataflow tracking works, demonstrate IODyne's precision advantages using concrete examples, and show how this technique helps detect vulnerabilities like Meltdown and MFBDS before they can impact users.
Background
The design of chips that are eventually fabricated is typically described using a Hardware Description Language (HDL). These languages allow engineers to describe the structure and behavior of digital circuits. A commonly used HDL is SystemVerilog (also informally referred to as Verilog in this post).
To illustrate dataflow tracking, we'll use a simple example: a Linear Feedback Shift Register (LFSR). This circuit shifts bits through a register while feeding back XOR'd values from 'tap' points. This creates a sequence we can track through the design. To see it in action, see this visualization:
(Animation: bits shifting through the LFSR while the tap values are XOR'd back into the low end.)
Note: We use the LFSR as a simple teaching example to illustrate dataflow tracking concepts. In later sections, we'll apply these same principles to detect real CPU security vulnerabilities like MFBDS.
Let's implement our LFSR in Verilog. The code below shows a simple LFSR implementation; the width, tap positions, and seed are illustrative choices. Even if you're unfamiliar with Verilog syntax, we'll explain the key concepts relevant to dataflow tracking.
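```verilog
// A minimal sketch of the LFSR used in this post. The 16-bit width, the
// tap positions, and the seed are illustrative choices.
module lfsr #(
    parameter [15:0] SEED = 16'hACE1   // any non-zero seed works
) (
    input  wire        clk,
    input  wire        rst,
    output reg  [15:0] q
);
    wire feedback;

    // The feedback bit: XOR of the 4 tap positions.
    assign feedback = q[15] ^ q[13] ^ q[12] ^ q[10];

    // On each rising clock edge (when reset is inactive): shift left and
    // insert the feedback bit at the least significant position.
    always @(posedge clk) begin
        if (rst)
            q <= SEED;
        else
            q <= {q[14:0], feedback};
    end
endmodule
```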
The above Verilog design shows a hardware-level module that has a reset signal and a clock signal as inputs. The Verilog code describes a digital circuit, such that it could be synthesized to implement it in hardware (ASIC or FPGA). What everyday programming languages might call variables are in Verilog known as registers (declared with 'reg'). A register can be declared with any width; each element is either a 0 or a 1, corresponding to different voltage levels once the design is eventually synthesized and baked into a chip. During simulation, however, we allow special values that do not correspond to any voltage level. In this post, one such special value will be important: the X. It represents not a concrete value, but an unknown value. As such, its semantics are very different from those of 0 or 1. As we shall see, we will be able to use these semantics for dataflow tracking.
Special Verilog Signal Values
Verilog signals are typically 0 or 1, but there's a special 'X' value meaning "unknown."
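A few concrete examples of these 4-state semantics, as a minimal sketch (the values are chosen purely for illustration):

```verilog
// How X behaves in expressions: it usually, but not always, propagates.
module x_demo;
    reg [3:0] a;
    initial begin
        a = 4'b10x0;                  // bit 1 is unknown
        $display("%b", a ^ 4'b0010);  // 10x0: x XOR 1 is still x
        $display("%b", 1'bx & 1'b0);  // 0: AND with 0 is known despite the x
        $display("%b", 1'bx | 1'b1);  // 1: OR with 1 is known despite the x
        $display("%b", 1'bx ^ 1'b1);  // x: XOR with an unknown stays unknown
    end
endmodule
```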
Consider the LFSR design. The circuit updates on each clock edge (when reset is inactive), shifting the register and computing new feedback values.
The first key operation is the computation of the feedback bit (repeated here from the sketch above):
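```verilog
// Continuous assignment; the tap positions are the illustrative ones
// chosen in the sketch above.
assign feedback = q[15] ^ q[13] ^ q[12] ^ q[10];
```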
This continuous assignment makes the signal 'feedback' always contain the XOR of the 4 tap positions.
An LFSR shifts bits through registers while feeding back XOR'd values—creating a predictable sequence we can track. The key point is that changing one input bit affects multiple output bits over time. On every rising edge, the feedback bit is inserted at the least significant position. This creates the shift-and-feedback behavior that defines an LFSR.
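In the sketch above, this is the sequential block:

```verilog
// Shift left each cycle; the feedback bit enters at the LSB.
always @(posedge clk) begin
    if (rst)
        q <= SEED;
    else
        q <= {q[14:0], feedback};
end
```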
Now that we understand how hardware circuits are described in Verilog, we can explore how to track data as it flows through these circuits. The fundamental challenge is: how can we follow a particular data value and all the values it influences throughout a simulation? This is where dataflow tracking comes in.
Dataflow Tracking: Current Approach
To illustrate the current approach of X injection, let's ask ourselves: how can we follow a particular data value (at a particular instant), and all values that are ultimately influenced by it, throughout such a simulation? To visualize what we want to do, let's look at the output of our LFSR.
The LFSR output is a sequence of register values, one per clock cycle, which we can print with a small testbench sketch:
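```verilog
// Minimal testbench sketch: clock period, reset length, and run length
// are arbitrary choices.
module tb_lfsr;
    reg clk = 0, rst = 1;
    wire [15:0] q;

    lfsr dut (.clk(clk), .rst(rst), .q(q));

    always #5 clk = ~clk;           // free-running clock

    initial begin
        repeat (2) @(posedge clk);  // hold reset for two cycles
        rst <= 0;
        repeat (8) @(negedge clk)   // sample mid-cycle, after the update
            $display("t=%0t q=%b", $time, q);
        $finish;
    end
endmodule
```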
To see the dataflow tracking capabilities of X injection in action, consider our LFSR. At the start, we assign q[0] an X instead of its usual value. In other words, we force a bit to contain the value X where it otherwise would not, hence the name X injection. q[0] is the bit we want to track. As the LFSR continues to operate, the X value is shifted and used in new computations, and each new signal influenced by the tracked bit becomes X itself.
How X is Used to Follow the Spread of Data
X injection works by replacing a target bit with 'X' and watching it spread. Most operations involving X produce X, so we can track which signals become "infected" by our injected unknown value. This shows us the dataflow path through the circuit.
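A testbench sketch of this injection on our LFSR; the hierarchical write into dut.q is a simulation-only mechanism, and the timing is an arbitrary choice:

```verilog
// X-injection sketch: deposit an X into the tracked bit, then watch
// which bits of the register turn x over the following cycles.
module tb_xinject;
    reg clk = 0, rst = 1;
    wire [15:0] q;

    lfsr dut (.clk(clk), .rst(rst), .q(q));

    always #5 clk = ~clk;

    initial begin
        repeat (2) @(posedge clk);
        rst <= 0;
        @(negedge clk);
        dut.q[0] = 1'bx;            // the injection: q[0] is the tracked bit
        repeat (8) @(negedge clk)
            $display("t=%0t q=%b", $time, q);  // x's mark influenced bits
        $finish;
    end
endmodule
```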
While our LFSR example demonstrates the concept clearly, the real power of dataflow tracking becomes apparent when applied to security vulnerabilities. Let's return to the CPU vulnerabilities we mentioned earlier and see how X injection helps detect them.
Relationship to Meltdown, MFBDS, and other CPU Vulnerabilities
X injection is suitable for finding vulnerabilities such as Meltdown and Microarchitectural Fill Buffer Data Sampling (MFBDS). To review how MFBDS works, we first review what a Fill Buffer is, and then discuss how the contents of this fill buffer are both security sensitive (they might be owned by a different security domain, such as the previously executing process) and, in the presence of MFBDS, can be leaked to the code currently executing on the CPU:
From Intel Publication on Microarchitectural Data Sampling
A fill buffer is an internal structure used to gather data on a first level data cache miss. When a memory request misses the L1 data cache, the processor allocates a fill buffer to manage the request for the data cache line. The fill buffer also temporarily manages data that is returned or sent in response to a memory, I/O, or special register operation.
[..]
Fill buffers may retain stale data from prior memory requests until a new memory request overwrites the fill buffer. Under certain conditions, the fill buffer may speculatively forward data, including stale data, to a load operation that will cause a fault/assist. This does not result in incorrect program execution because faulting/assisting loads never retire and therefore do not modify the architectural state. However, a disclosure gadget may be able to infer the data in the forwarded fill buffer entry through a side channel timing analysis.
To illustrate this, let's look at the path that data can take when a Fill Buffer is used to service a load that misses the cache. We will use a hypothetical CPU design, not necessarily exactly the same as any Intel product, but just using the same concepts to illustrate the behavior we want to detect.
The data takes a path from where it lives (DRAM), down through the caches, through the fill buffer that remembers the data, and finally into the register into which the load was supposed to go. We frequently refer to the registers collectively as the Physical Register File (PRF), because that is where the contents of the registers are stored.
The MFBDS vulnerability comes into play when, in a new process (the attacker), a load faults (due to an unmapped page), but data for the requested address is still in the fill buffer. Under some conditions, this data can then be forwarded as if the address truly existed in the attacker's process. The load will still never be architecturally visible to the program, because the faulting load will never retire. But if these instructions are on a speculatively executed path, the result of the load (from the fill buffer) can nevertheless be briefly available in the PRF of the attacker, and thus be disclosed using a disclosure gadget.
To summarize, MFBDS exploits stale data in CPU fill buffers. These buffers temporarily hold cache data but may retain old values from previous processes. The vulnerability occurs when this stale data gets speculatively forwarded to current operations, potentially leaking sensitive information.
If we were to try to detect this vulnerability using X injection, we might do the following (a code sketch follows the list):
- Inject Xes at the moment that a Fill Buffer is deallocated (i.e., the moment its data becomes stale).
- In dataflow tracking terminology, we call this injection point a source: the starting point where we begin tracking potentially sensitive data. In this case, our source is the Fill Buffer at the moment it becomes deallocated, representing stale data from a previous security domain.
- The X values now represent data that should never be available in any way (including speculatively) to the running code, because that data has been deallocated and could easily belong to a different security domain.
- We then execute a program that attempts to access the data in this deallocated Fill Buffer, perhaps through loads that abort due to a fault, but also through many other automatically generated mechanisms.
- If, through X propagation, X values appear in a part of the microarchitecture where they could be made visible by a covert channel (a disclosure gadget), we treat this as a potential security problem. An example might be the Physical Register File (PRF) registers. This means that the executing code is likely able to disclose the contents of the Fill Buffer, as the program has control over data in its register file.
- We call these monitored locations sinks: points in the microarchitecture where data becomes (indirectly) visible to executing code. For example, the Physical Register File (PRF) is a sink because programs can (indirectly) observe register values. If X values flow from our source (deallocated Fill Buffer) to a sink (PRF), we've detected a potential information leak.
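To make the source/sink hookup concrete, here is a monitor sketch. Every hierarchical name under cpu. (fb_dealloc, fb_data, fb_dealloc_idx, prf_wr_en, prf_wr_data) is hypothetical; no real design's hierarchy is implied:

```verilog
// Hypothetical X-injection monitor for the MFBDS scenario.
module mds_xinject_monitor (input wire clk);
    // Source: on fill-buffer deallocation, overwrite the now-stale entry
    // with X so anything that later consumes it becomes x itself.
    always @(posedge clk)
        if (cpu.lsu.fb_dealloc)
            cpu.lsu.fb_data[cpu.lsu.fb_dealloc_idx] <= {512{1'bx}};

    // Sink: flag any X that reaches a physical register file write.
    always @(posedge clk)
        if (cpu.prf_wr_en && $isunknown(cpu.prf_wr_data))
            $display("POTENTIAL LEAK: X reached the PRF at t=%0t", $time);
endmodule
```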
Recap
To recap, the underlying assumption for Xes appearing in a sink is this: data from a source found its way into a sink. For our purposes, sources are data that should not be accessible to the executing program. Sinks are places that are accessible to the executing program.
Xes appearing in sinks means: the executing program has gained access to data that it should not. In this case: data has found its way from a deallocated Fill Buffer to a PRF (i.e. a register). A human must analyze where in the path the spread of data should have been stopped so a clean fix can be introduced.
Processes similar to the one described above have been used at Intel at a wide scale to pre-emptively find vulnerabilities before they could appear in products. X injection has proven valuable for catching security vulnerabilities, but like any technique, it has limitations.
X Injection Drawbacks
X injection has pros and cons. Its strong points are:
- native support in industry-standard Verilog simulators, and
- scalability.
Among its drawbacks is its lack of precision, meaning the X can be present in more bits than are truly influenced by the original dataflow tracking source. There are multiple reasons for this, but the most important ones are:
- X values in Verilog don’t always reflect true dataflow, because Verilog semantics are pessimistic but not history-aware.
For example, consider starting with the value 0, adding an X (i.e., an unknown value), and then subtracting the same X again. Mathematically, this is a no-op: the X cancels out, and the result should be 0. In simulation, however, the result will still be X, because Verilog doesn't track the history of operations; it only sees that an X was involved.
This illustrates that once Xes are introduced, they often "stick," even when the operation is logically reversible. Verilog's semantics make this a safe (conservative) choice for hardware modeling, but an imprecise one for dataflow analysis.
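A minimal sketch of this behavior (the module name and values are arbitrary):

```verilog
// Mathematically (0 + x) - x == 0, but 4-state simulation has no
// history and reports the whole result as unknown.
module x_sticky;
    reg [3:0] unknown, result;
    initial begin
        unknown = 4'bxxxx;
        result  = (4'd0 + unknown) - unknown;
        $display("%b", result);   // prints xxxx, not 0000
    end
endmodule
```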
- Some operations in Verilog over-approximate where X values should propagate. A clear example is addition. Suppose we add two 4-bit values, and only the most significant bit of one operand is unknown (X). Verilog will mark the entire result as X, even though only a few bits could possibly be influenced by the unknown input:
x010 + 0010 = xxxx
This behavior is defined by the Verilog standard and is technically correct, but from the perspective of dataflow tracking it is misleading: the X has spread too far. Ideally, we want to track only those result bits that are actually data-dependent on the unknown input. In this case, Verilog's semantics are too pessimistic to serve as precise dataflow tracking.
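In simulation, a minimal sketch of this over-approximation:

```verilog
// Only bit 3 of the sum truly depends on the unknown bit (x=0 gives
// 0100, x=1 gives 1100), yet simulation marks every bit as x.
module x_add;
    reg [3:0] a, b, s;
    initial begin
        a = 4'bx010;
        b = 4'b0010;
        s = a + b;
        $display("%b", s);   // prints xxxx
    end
endmodule
```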
As a result, diagnosing the root cause of a violation flagged by an X injection simulation can be a huge amount of work, and the violation sometimes turns out to be a false positive. These limitations led us to ask: can we be more precise?
The answer is yes. We developed a new approach that maintains the scalability of X injection while dramatically improving precision.
IODyne: New Approach
As explained in the introduction, IODyne achieves precision through differential analysis.
The key insight is this: we want to track influence, i.e., the differences caused by changing data values. Instead of tracking and propagating uncertainty (X), we track actual data differences, eliminating the false positives that come from Verilog's pessimistic semantics. Let's see this method in action using our familiar LFSR example to illustrate the difference in precision.
We change the LSB and track how this difference causes differences downstream of the LFSR operation. In animation:
(Figure 6: Tracking how changes in the LSB cause differences downstream of the LFSR operation.)
The same logic shown in time:
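The same differential run can also be sketched as a testbench: two copies of the lfsr module from earlier, with seeds that differ only in the LSB (the seed values are illustrative):

```verilog
// Differential analysis: diff the states of a baseline run and a run
// with a one-bit change; set bits in the XOR are the influenced bits.
module tb_diff;
    reg clk = 0, rst = 1;
    wire [15:0] q_base, q_mod;

    lfsr #(.SEED(16'hACE1)) base (.clk(clk), .rst(rst), .q(q_base)); // baseline
    lfsr #(.SEED(16'hACE0)) mod  (.clk(clk), .rst(rst), .q(q_mod));  // LSB flipped

    always #5 clk = ~clk;

    initial begin
        repeat (2) @(posedge clk);
        rst <= 0;
        repeat (8) @(negedge clk)
            $display("t=%0t diff=%b", $time, q_base ^ q_mod);
        $finish;
    end
endmodule
```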
Comparing that to the dataflow with X injection, we see that the X values over-estimate the dataflow: sometimes, even when there is no influence (the outcome is the same), we still see X's.
The LFSR example shows the concept, but how much more precise is IODyne in practice on real CPU designs? Let's quantify the improvement.
IODyne: Precision Results
To quantify IODyne's precision advantage, we conducted experiments on a complex CPU design. The scatter plot below compares two measurements at each simulation timestep:
- X-axis: number of signals affected according to IODyne (differential analysis)
- Y-axis: number of signals affected according to X injection
Each dot represents one moment in simulation time: at each timestep, we plot the number of signals affected according to IODyne (X-axis) against the number affected according to X injection (Y-axis). Every dot above the red diagonal line marks a moment where X injection reports more affected signals than IODyne, i.e., where the X is spreading too far; this is evidence of false positives caused by overly conservative propagation.
IODyne dramatically reduces false positives. In our CPU design experiments, X injection consistently flagged more signals as affected than IODyne's precise tracking. The scatterplot shows most data points above the diagonal. This is evidence that X injection over-reports dataflow spread.
We call the cases where a dataflow tracking system reports more affected signals than the true number over-tainting. The reverse (under-reporting affected signals) is called under-tainting.
IODyne has an important limitation: it can suffer from under-tainting in data-dependent scenarios. For example, if influence only occurs when specific data values create particular conditions (like a carry in addition), but those conditions don't occur in our specific simulation run, IODyne won't detect the potential dataflow path. X injection, with its conservative semantics, will detect these possibilities. For security verification focused on stale-data leakage (our primary use case), where data is mostly copied, this trade-off is acceptable.
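A minimal sketch of such a data-dependent case (the operand values are chosen so that no carry occurs in this particular run):

```verilog
// With these operands, flipping a[0] never ripples a carry, so the
// differential run reports influence on s[0] only; X injection would
// conservatively mark the whole sum as unknown.
module diff_carry;
    reg [3:0] a, b, s_base, s_mod;
    initial begin
        a = 4'b0000; b = 4'b0010;
        s_base = a + b;               // baseline: 0010
        s_mod  = (a ^ 4'b0001) + b;   // flip a[0]: 0011
        $display("diff=%b", s_base ^ s_mod);  // 0001: higher bits look clean
    end
endmodule
```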
(Figure: scatter plot of affected-signal counts per timestep, IODyne on the X-axis vs. X injection on the Y-axis, with the diagonal in red.)
The scatterplot clearly demonstrates this precision advantage: most data points lie above the diagonal, showing that X injection consistently over-reports affected signals.
IODyne: Case Minimization Benefits
In addition to improved precision, the IODyne technique can save humans time when diagnosing the vulnerabilities found this way. To explain why and how, let's first remember that in any simulation, we might be tracking many pieces of data at a time. Usually this comes from the tracking condition for the same signal being met at many different times in a simulation. Each tracking is accomplished through a data value injection.
When dataflow has been detected from a source to a sink, a vulnerability has been detected. Generally speaking, a human proceeds with diagnosing such a case as follows:
- Trace back: using tools and knowledge of the RTL design, trace the flow of data back from the sink (where it was detected) to the source (the injection that started the dataflow).
- Trace forward: starting from the injection that started the dataflow, follow the flow forwards and find the place in the design that would most naturally stop the flow from reaching the sink.
Here, the full visibility of IODyne can help the human a great deal with step 1. In a particular execution, IODyne has detected that a sink has become affected by one of many injections. IODyne can now automatically reduce this test case by reducing the number of injections it performs. Say we take the original set of injections that reproduced the case and run 4 new simulations, each with a subset of the original set of injections. We expect at least one of those simulations to still reproduce the vulnerability, which we can test by checking whether the sink is still affected. We then continue the process with the reduced set of injections, to try and shrink the set further. When we reach 1 injection, we try to reduce the length of that injection. We stop when none of the simulations reproduce the vulnerability (meaning multiple injections were required), or when we arrive at a single injection that is so short we can't reduce it further.
The benefit to the human is that step 1 has been automated away: we have found the single injection needed to reproduce the vulnerability, and the forward trace can start. Thanks to full visibility, a light is shone on the possible paths to take in the forward direction.
As an example, here is a test case that was diagnosed internally, where a vulnerability was shown to exist with 554 injections. Each row in the table represents a set of simulations trying to reduce the set of injections while still reproducing the vulnerability. The columns mean the following:
- Bisect segment hits: which of the 4 simulations we try is a 'hit'? A simulation that reproduces the vulnerability is called a 'hit,' and as we expect, each row shows exactly 1 hit: the one simulation that reproduces the vulnerability.
- Unique tainted: the number of distinct signals (after removing duplicates) affected by the set of injections performed by the 'hit' simulation.
- Injections in hit: the number of injections performed in the 'hit' simulation.
- Injections time range: within simulation time, the time range in which the 'hit' simulation performs its injections.
- Error indicators in hit: the error indicator is the affected signal that indicates the presence of a vulnerability (the PRF, in the earlier X injection example). This column counts how many times that signal is activated.
As we see in the table above, the number of injections is ultimately reduced from 554 to 1, and the number of affected signals is reduced from 2196 to 649. This greatly reduces the burden on the human diagnosing the reason for this flow: the forward trace can start right away, and the number of signals to choose from when following the trace is greatly reduced. To illustrate the difference, the following is a visualization of the number of affected signals in the un-minimized simulation with 554 injections (red), and in the minimized simulation with 1 injection (blue).
The visualization clearly shows the dramatic reduction in complexity. The red trace (all 554 injections) shows tens of thousands of affected signals with large spikes—a complex debugging landscape. The blue trace (single minimized injection) shows a baseline of approximately 600 affected signals with minimal variation. This is a much more manageable debugging task.
Other Use Cases
The cases sketched above are cases where dataflow tracking is used to find the consumption and possible exposure of stale data. There are, however, other use cases where IODyne can be used to find vulnerabilities in designs. Some examples:
- Secret tracking. A particular secret that should never be exposed to executing code, e.g. an encryption key, is used by the design. In the course of this use, copies may have to be made; when the secret is no longer used, the copies should be erased from the design. IODyne could be used to track the flow of the secret and detect cases where it might still remain in the system after use. Data that is influenced by this key (e.g. encrypted data) would also show up as affected signals, so some higher-level analysis would be required to distinguish safe from unsafe cases.
- Reset fidelity. After a reset happens, are there any signals in the design that have non-reset values?
- Validate Spectre mitigations. Execution in one security domain should have limited effect on branch prediction in another security domain. IODyne could detect tainting from a previous security domain in the current security domain.
These use cases have not yet been tried with IODyne inside Intel, but likely will be in the future.
Conclusion
Dataflow tracking is essential for finding hardware security vulnerabilities before they reach products. IODyne addresses the limitations of the wildly successful X injection technique through differential analysis, providing precise dataflow information while maintaining scalability. The automatic case minimization feature further reduces human effort by isolating the minimal conditions needed to reproduce a vulnerability. With ongoing proofs of concept (POCs) across Intel product teams and promising results on real CPU designs, IODyne represents a significant step forward in hardware security verification. Future work will explore additional applications such as secret tracking and Spectre mitigation validation.
Acknowledgements
This work was developed in collaboration with and supported by many other experts, inside and outside Intel. Much input and collaboration was received from seasoned experts in current techniques, specifically X injection, including Ashwini Gopinath, Ohad Neubach, Oumar Diallo, and Salessawi Yitbarek. The basis for IODyne was laid by a different approach, RTL instrumentation, named CellIFT, whose motivation was developed and refined in collaboration with the ETH Zurich research group COMSEC, specifically Dr. Flavien Solt and Prof. Kaveh Razavi. This work was jointly published in [1]. Daniël Trujillo worked on the initial versions of IODyne while at an internship at Intel. He and the author of this post are listed as inventors on the patent in [2].
References
- [1] CellIFT: Leveraging Cells for Scalable and Precise Dynamic Information Flow Tracking in RTL. Flavien Solt (ETH Zurich), Ben Gras (Intel Corporation), Kaveh Razavi (ETH Zurich). USENIX Security Symposium, 2022.
- [2] Intel patent: Detecting potential security issues in a hardware design using register transfer level (RTL) information flow tracking.
Share Your Feedback
We want to hear from you. Send comments, questions, and feedback to the INT31 team.
About the Author
Ben Gras is an Offensive Security Researcher in Intel's INT31 team. His primary research topic is helping humans find and diagnose hardware vulnerabilities painlessly, by spending computer time. The author received a PhD in systems security from the VU Amsterdam.