Brian Delgado, INT31 Offensive Security Researcher
Introduction
In recent years, Intel® Xeon® processors have incorporated new capabilities that offload time-consuming or costly portions of applications to accelerator logic to achieve performance gains or reduce overhead. 4th Generation Intel® Xeon® Scalable Processors brought four accelerators into the CPU package: the Data Streaming Accelerator (DSA), focused on reducing CPU overhead for common data movement operations; the related In-Memory Analytics Accelerator (IAA), focused on accelerating analytics operations, including those performed on databases; QuickAssist Technology (QAT), enabling the offload of supported cryptography and compression operations; and the Dynamic Load Balancer (DLB), providing hardware-based load balancing.
Performing security and functional testing of accelerators requires a multi-faceted approach, including the use of fuzzers to send potentially malformed inputs to the accelerator to detect unexpected faults or security issues. Fuzzers automatically generate inputs that can expose a variety of issues in a given implementation. Guided fuzzing has become widely used across user-space and kernel software and represents an evolved approach in which the target's responses guide future fuzz inputs. The guided fuzzer observes which inputs produced novel responses and trims out uninteresting inputs to make better use of testing time. Without guidance, the fuzzer operates randomly and often wastes time testing uninteresting inputs.
Bringing guided fuzzing to hardware has challenges. Traditional software fuzzing can leverage a compiler pass over the target to instrument conditional branches and function entry points, and the fuzzer gains insight into which inputs are novel based on their coverage of this instrumentation. This type of instrumentation is not feasible on hardware logic, however, so the fuzzer cannot achieve the same level of target observability. Without that observability, the fuzzer cannot apply guided fuzzing, which reduces its effectiveness. Hardware fuzzing therefore needs a different method.
This article introduces one approach for applying guided fuzzing concepts to live hardware targets, using Intel’s DSA accelerator as an example. It will introduce basic DSA concepts, describe the method based on an open-source fuzzing tool, and compare coverage results to a traditional fuzzing approach. The concepts introduced here can be readily leveraged in a variety of other hardware fuzzing targets.
Data Streaming Accelerator (DSA) Background
The DSA accelerator allows the user to offload a variety of data movement operations, including memory copies, memory compares, memory fills, and CRC generation. An operation is specified in a DSA descriptor sent from user or kernel mode. The descriptor contains the opcode type, the relevant source and destination memory addresses, and flags to guide the operation and response.
Using a memory move (memcpy) offload as an example, Figure 1 shows the set of fields available for the user to customize the operation. The reserved fields must be set to 0 for the operation to succeed. Flags can be set to request that the accelerator generate a completion record, with the address of the completion record structure in memory provided in "Completion Record Address". The user then fills in the source and destination addresses and the "Transfer Size" (copy size) as desired.
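As a concrete illustration, the following sketch populates a memory move descriptor using the Linux idxd UAPI header (linux/idxd.h), which defines struct dsa_hw_desc, struct dsa_completion_record, and the opcode and flag constants; the helper name and parameters are illustrative, not part of any official API:

#include <stdint.h>
#include <string.h>
#include <linux/idxd.h>   /* struct dsa_hw_desc, DSA_OPCODE_*, IDXD_OP_FLAG_* */

/* Illustrative helper: build a memory move descriptor. comp must point to
   an aligned completion record in accessible memory. */
static void build_memmove_desc(struct dsa_hw_desc *desc,
                               struct dsa_completion_record *comp,
                               void *src, void *dst, uint32_t len)
{
        memset(desc, 0, sizeof(*desc));          /* reserved fields must be 0 */
        desc->opcode = DSA_OPCODE_MEMMOVE;       /* memory move (memcpy) offload */
        desc->flags = IDXD_OP_FLAG_RCR |         /* request a completion record */
                      IDXD_OP_FLAG_CRAV;         /* completion record address is valid */
        desc->completion_addr = (uintptr_t)comp; /* "Completion Record Address" */
        desc->src_addr = (uintptr_t)src;
        desc->dst_addr = (uintptr_t)dst;
        desc->xfer_size = len;                   /* "Transfer Size" (copy size) */
}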
The user can send the descriptor to a shared DSA work queue via a CPU instruction, e.g., ENQCMD. DSA hardware receives the descriptor, performs the specified offload operation, and returns details about the result in a "completion record" that includes an operation status. The user can poll the completion record's status field to determine when the operation has completed.
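A minimal submission-and-poll sketch, assuming a shared work queue portal obtained by mmap()ing an idxd character device (e.g., /dev/dsa/wq0.0) and a compiler with ENQCMD intrinsic support (compile with -menqcmd); the helper name is illustrative:

#include <immintrin.h>   /* _enqcmd, _mm_sfence, _mm_pause */

/* Illustrative helper: submit a descriptor to a shared work queue portal
   and spin until the completion record is written. */
static uint8_t submit_and_wait(void *wq_portal,
                               struct dsa_hw_desc *desc,
                               volatile struct dsa_completion_record *comp)
{
        comp->status = 0;                        /* 0 means "not yet completed" */
        _mm_sfence();                            /* make descriptor writes globally visible */
        while (_enqcmd(wq_portal, desc))         /* nonzero: not accepted, retry */
                _mm_pause();
        while (comp->status == 0)                /* hardware writes the status when done */
                _mm_pause();
        return comp->status;
}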
An attempted offload can have a variety of outcomes: the operation may succeed or fail in various ways depending on the descriptor settings. Common completion statuses include success, read/write page fault, unsupported opcode, invalid transfer size, non-zero reserved field, and invalid flags, among a number of other possibilities. Figure 2 shows an example completion record that returns status information on the operation. The user code can check the completion Status field to determine whether the offload succeeded or had an error.
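As a sketch of status handling, the Linux idxd UAPI header enumerates the completion statuses (only a subset is shown here); the helper name is illustrative:

#include <stdio.h>
#include <stdint.h>
#include <linux/idxd.h>   /* enum dsa_completion_status (subset shown) */

/* Illustrative helper: decode the completion status byte. */
static void print_completion(uint8_t status)
{
        switch (status) {
        case DSA_COMP_SUCCESS:          printf("success\n");               break;
        case DSA_COMP_PAGE_FAULT_NOBOF: printf("read/write page fault\n"); break;
        case DSA_COMP_BAD_OPCODE:       printf("unsupported opcode\n");    break;
        case DSA_COMP_INVALID_FLAGS:    printf("invalid flags\n");         break;
        default:                        printf("status 0x%x\n", status);   break;
        }
}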
Guided Fuzzing Tool: IJON
The AFL fuzzer and its AFL++ successor have become popular in the security industry as an effective way to explore software targets. The approach leverages insights from compiler-driven target instrumentation to accomplish guided fuzzing. In 2020, Aschermann, Schumilo et al. from Ruhr University Bochum released an innovative extension to AFL called "IJON" that enables users to insert "annotations" into their fuzz harnesses. These annotations convey additional dynamic information to guide fuzzing progress beyond what target instrumentation alone can observe. When IJON receives new values via these annotations, they are considered novel and worthy of further exploration.
IJON features a variety of annotation macros. One such annotation is IJON_SET(x), which exposes values of x to the fuzzer; new values are treated like traditional AFL path discoveries, and the inputs that produced them are retained for further mutation. This annotation enables guided fuzzing through examination of target feedback and sidesteps the practical concern of how to perform a traditional compiler pass over targets where compiler instrumentation is not feasible.
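For intuition, the IJON paper's maze-game example annotates the player's position so that each newly reached coordinate pair counts as a discovery. In a minimal sketch (the game-loop helpers here are hypothetical; IJON_SET and ijon_hashint ship with the IJON patch):

/* Hypothetical game loop: each new (x, y) position counts as new coverage,
   steering the fuzzer toward inputs that reach unexplored positions. */
while (game_running()) {
        step_player(next_fuzz_input());
        IJON_SET(ijon_hashint(player_x(), player_y()));
}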
IJON-based Method
Using IJON for DSA requires the creation of a harness application that receives data from the fuzzer and sends it to DSA hardware. The harness also contains several control knobs that the fuzzer can manipulate to evaluate different testing scenarios. Each control knob can be configured to use either a valid value for a parameter or a fuzzed value. For example, a source memory buffer control knob chooses between a valid and a fuzzed source memory buffer; a destination memory buffer control knob likewise allows the fuzzer to use a valid destination memory buffer or apply a fuzzed value. By allowing the fuzzer to compose these knobs in different ways and apply fuzzed data where directed, a broad set of tests can be achieved. Once the control knobs are set and the appropriate descriptor fields populated with input data, the harness sends the descriptor to hardware and waits for a completion response. Once the response is obtained, the harness invokes IJON_SET(completion_status) to convey the feedback to IJON.
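A minimal sketch of how such knobs might be decoded, with the leading fuzz bytes selecting the scenario and later bytes supplying fuzzed values; the function and layout are hypothetical, not the exact harness implementation:

#include <stddef.h>

/* Hypothetical knob decoding: fuzz[0] and fuzz[1] steer the scenario,
   fuzz[2..] supplies fuzzed addresses when a knob selects them. */
static void apply_knobs(struct dsa_hw_desc *desc, const uint8_t *fuzz,
                        size_t len, void *valid_src, void *valid_dst)
{
        if (len < 18)
                return;                          /* not enough bytes to decode */
        uint64_t fuzzed_src, fuzzed_dst;
        memcpy(&fuzzed_src, &fuzz[2], 8);
        memcpy(&fuzzed_dst, &fuzz[10], 8);
        desc->src_addr = (fuzz[0] & 1) ? (uintptr_t)valid_src : fuzzed_src;
        desc->dst_addr = (fuzz[1] & 1) ? (uintptr_t)valid_dst : fuzzed_dst;
}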
To show the flow, Figure 3 begins with Step 1, where IJON produces fuzzed input that the DSA fuzz harness receives. The harness applies the fuzzed input to the various control knobs and descriptor content. In Step 2, the harness issues the appropriate CPU instruction to send the descriptor to a shared or dedicated work queue. The DSA hardware receives and processes the descriptor and sends the completion record back in Step 3. In Step 4, the fuzz harness conveys the received status back to the fuzzer by invoking IJON_SET. If the status field was novel, the fuzzer will likely perform additional permutations and begin the flow again at Step 1.
Harness Creation
Leveraging IJON in a harness can be done via the following:
int main(int argc, char **argv)
{
        // Get fuzzed data from file descriptor
        …
        // Copy fuzzed data to descriptor
        …
        // Send DSA descriptor and receive completion status
        int completion_status = send_dsa_descriptor(desc);
        IJON_SET(completion_status);
        …
}
This example passes completion_status values to IJON, and the fuzzer becomes aware of inputs that trigger new completion_status results. This provides a basic notion of coverage by noting which of the possible DSA hardware responses have been hit. For simplicity, assume a short fuzzing session resulted in two paths in the IJON GUI. Upon replaying the saved test cases, the following completion statuses were hit.
An inspection of this coarse-grained coverage indicates that the fuzzer didn't retain inputs that set invalid flags or transfer sizes. The missing coverage suggests portions of the descriptor that should be included in fuzzing, for example the flags and transfer size fields. It is also possible that letting the fuzzer run longer could uncover these scenarios, if the fuzzer were enabled to operate on them. The AFL fuzzer GUI metrics around path discovery can help guide the decision of whether to proceed with fuzzing.
While this gives a few basic hardware coverage insights, there is an opportunity for improvement: measuring completion status coverage on a per-opcode basis. IJON features a simple hash function that can be used to hash the DSA opcode together with the completion status. By exposing the resulting {opcode, completion status} hash to IJON, the fuzzer becomes aware when it hits a new return status per opcode. This usefully differentiates {Mem Move opcode, Success} from {CRC opcode, Success}, providing a finer-grained understanding of the state space to the fuzzer.
Updating our fuzz harness example to incorporate this method:
{
        // Copy fuzzed data to descriptor
        …
        // Send DSA descriptor and receive completion status
        int completion_status = send_dsa_descriptor(desc);
        // Get hash of opcode and completion_status
        int completion_status_per_opcode =
                ijon_hashint(desc.opcode, completion_status);
        // Convey hash to IJON; new hashes are interesting
        IJON_SET(completion_status_per_opcode);
}
Now, a subset of the coverage matrix could look like the following for two DSA opcodes:
Visualizations can also be auto-generated for evaluating coverage and reporting purposes. Figure 4 shows the completion status breakdown for the DSA Batch opcode that processes multiple descriptors together.
Some DSA opcodes provide feedback opportunities beyond the completion status, offering additional insight to the fuzzer. The harness can therefore include additional IJON_SET(x) invocations to capture these. Research into identifying further useful feedback mechanisms has the potential to improve the results.
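For example, assuming the Linux idxd UAPI layout of struct dsa_completion_record, its result and bytes_completed fields could be folded into further annotations for opcodes where they are meaningful; a sketch of such harness additions:

// Primary feedback: per-opcode completion status (as above)
IJON_SET(ijon_hashint(desc.opcode, completion_status));
// Additional feedback: for Compare, the result byte reports whether the
// buffers matched; bytes_completed distinguishes partial completions
IJON_SET(ijon_hashint(comp.result, comp.bytes_completed));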
Results
To evaluate IJON's fuzzing effectiveness, its coverage can be compared to that of the standard AFL++ tool. For a closer comparison, the IJON tool used was upgraded to support AFL++. The coverage metric compared is the total number of {opcode, completion status} combinations found by each tool; hitting more combinations indicates broader exercising of the accelerator logic and conditions. The metric counts completion statuses returned directly to the application, not those reported in the Linux system log via SWERROR.
Both AFL++ and IJON are able to observe coverage of the fuzz harness code itself; however, this is of limited value, as code branches hit in the harness do not provide meaningful insights into hardware behavior. IJON goes beyond this limitation by consuming the DSA completion records as fuzzer feedback, providing a new data-driven feedback vector.
The test measures this coverage in 1-, 2-, and 4-hour fuzzing sessions and expresses the cumulative coverage achieved at each duration. Figure 5 provides this comparison; the IJON results clearly demonstrate increased coverage over the standard AFL++ approach by using the DSA completion record as a feedback source. Determining the total set of possibilities is not trivial and depends on harness configuration, system configuration, and opcode support on the system under test. We estimate that the AFL++ and IJON approaches achieve 64% and 88% of the reachable coverage possibilities, respectively.
Conclusion
New hardware capabilities present exciting new ways to accomplish work more efficiently; security test methods need to keep pace. The IJON-based method increases fuzzer effectiveness over traditional approaches. The coverage results show that by leveraging insights from the DSA completion record, the fuzzer was able to hit additional coverage points more effectively. The approach avoids the need for a compiler instrumentation pass over the target, which is not possible for a hardware target like DSA. IJON makes it straightforward to apply a data-guided fuzzing approach to this target.
With IJON, other useful feedback sources that indicate new target states can easily be pulled in via IJON_SET(x), potentially reaching higher coverage. In selecting new feedback sources, it is important to choose those that meaningfully tie the target response to the particular input sent. Noisy feedback sources complicate IJON's work and give the false appearance of hitting a variety of new states, consuming fuzz time that could be spent more productively. When adding new feedback sources, it is helpful to monitor the AFL stability metric for drops to gauge whether a source is too noisy to be productive.
In applying this method to new targets, a key consideration is the richness of target responses. Targets that expose a larger set of responses to inputs give the fuzzer more opportunities to drive testing to unique scenarios. Another consideration is the speed of the target in processing inputs, as fuzzer exploration of extremely slow targets can be challenging.
What’s Next?
The INT31 OmniFuzz Project is researching novel forms of target feedback to allow fuzzing to increase coverage across a variety of live hardware targets. We plan to share additional technical write-ups on the methods and case studies of how the OmniFuzz project brings new methods to fuzz challenging hardware targets.
Acknowledgements
Thank you to the IJON creators (Cornelius Aschermann, Sergej Schumilo, Ali Abbasi, and Prof. Thorsten Holz) for the creation of a very useful fuzzing capability, with special acknowledgement to Sergej Schumilo for also porting IJON to AFL++. Appreciation to Intel Labs (Steffen Schulz, Matthias Schunter) for helping motivate research in this space, with particular appreciation to Steffen Schulz for encouraging focus on IJON. Thanks also to Jason Fung, Gayatri Behara, Neelima Krishnan, Philip Lantz, Lucas Van, Waruna Diyadawa Gamage, Rana Elnaggar, and Rushi Patel for project collaborations and insights. Thank you also to Dave Riss for supporting my early fuzzing research through Intel’s Excite project.
Share Your Feedback
We want to hear from you. Send comments, questions, and feedback to the INT31 team.
About the Author
Brian Delgado is an offensive security researcher in Intel’s INT31 team. He has worked extensively in fuzzing, UEFI firmware, System Management Mode security, Intel’s SMI Transfer Monitor, and virtualization before focusing more recently on Intel’s CPU-based accelerators. Before working in security, Brian worked in a number of areas in performance measurement and analysis. Brian holds a PhD in Computer Science from Portland State University where he focused on firmware-based runtime detection of OS/VMM malware.