Generate a Dynamically Discovered Control-Flow Graph with the Intel® Software Development Emulator

Pin Tool Kits

  • Download Pin kits here.

  • Before using Pin, please read the license.

author-image

By

Overview

A control-flow graph (CFG) is a fundamental structure used in computer science and engineering to describe and analyze the structure of an algorithm or program. A dynamically discovered control-flow graph (DCFG) is a specialized CFG that adds data from a specific program run. Intel provides a tool that generates a DCFG based on the Pin binary instrumentation package. We also provide an API to access the DCFG data from within another Pin tool or a stand-alone program.

DCFG Definition

Let's start with a classical control-flow graph (CFG) definition as described by Frances Allen in a 1970 ACM SIGPLAN article:
 

  • A control-flow graph is defined as a directed graph in which the nodes represent basic blocks and the edges represent control-flow paths.
  • A directed graph is connected if any node in the graph can be obtained (reached) from any other node.
  • A basic block is a linear sequence of program instructions having one entry point (the first instruction to run) and one exit point (the last instruction to run).

Typically, a CFG is defined statically. In other words, there is exactly one CFG for a given binary, it contains no information about the running path of any particular workload, and the nodes and edges are determined by all the possible reachable code paths in the binary. It does not typically include edges created by unexpected exceptions and other noncontrol-flow instructions.

We define a DCFG as a CFG with the following differences:
 

  • A DCFG contains a start node that has no predecessor nodes and whose successor node contains the first instruction run per thread. It also contains an end node that has no successor nodes and whose predecessor node contains the last instruction run per thread.
  • Each edge of a DCFG is augmented with a dynamic count to indicate how many times it was traversed per thread by a given running program. Except for the start and end nodes, the dynamic count of any node is equal to the sum of the counts of all of its incoming edges, which is also equal to the sum of all of its outgoing edges.
  • A DCFG need not contain nodes or edges that are not running. A DCFG is allowed to contain nodes and edges that are not running; they have counts of zero.
  • A DCFG contains edges representing all code paths that are actually running, even if they are due to noncontrol-flow instructions. For example, a floating-point instruction that causes an unmasked exception during running may create an edge to the exception-handling code.

The blocks in a DCFG can be combined into higher-level constructs such as loops, routines, and binary images. Dynamic data, such as loop iteration counts, can be deduced from the underlying edge counts.

Even though a DCFG contains dynamic information such as edge counts, in general, it does not allow you to recreate the order in which edges are run. This additional information is needed for many types of dynamic analysis. So, in addition to the DCFG, we define a DCFG-Trace as the sequence of edges taken while running the workload. The exact sequence of basic blocks, routines, and loops taken by a workload can then be recreated from the combination of its DCFG and DCFG-Trace.

Create and Use DCFG Data

A DCFG and a DCFG trace can be created using Intel® Software Development Emulator (Intel® SDE). A DCFG and a DCFG trace can be used in a variety of ways. For example:
 

  • A program can be written in any language that is capable of parsing JSON files to input and process the raw DCFG data.
  • A stand-alone C++ program can be written using the C++ DCFG APIs to input and conveniently access the DCFG data.
  • A PinPlay tool can be written using the C++ DCFG APIs to access the recorded data and instrument code during a replay based on the data.

Related Content

  • DCFG format: A description of the DCFG and DCFG trace JSON format.
  • DCFG API: API classes and their methods to activate DCFG creation and access the resulting data through C++ code.
  • PinPlay
  • DrDebug

Related Publications

  1. "Graph-Matching-Based Simulation-Region Selection for Multiple Binaries," Yount, C., Patil, H. Islam, M.S., Srikanth, A.; "Performance Analysis of Systems and Software (ISPASS)," 2015 IEEE International Symposium, March 2015: Slides | Paper
  2. PinPlay Tutorial at Programming Language Design and Implementation (PLDI) 2016
  3. Information about DCFG on GitHub*