The Three Pillars of Machine Programming Provide Core Concepts for Research Advances

Justin Gottschlich and his team of machine programming researchers aim to democratize and accelerate the creation of quality software. (Credit: Walden Kirsch/Intel Corporation)

Intel Labs scientists believe that, in the future, computers will become programmers, inventing algorithms and data structures through an emerging technology known as machine programming (MP). Based on a research framework known as the three pillars of machine programming, jointly envisioned by Intel Labs and Massachusetts Institute of Technology (MIT) scientists, this novel field is making notable research advances in the automation of software and hardware development.

“For me, the core pillars of MP started to form as we began to see the explosion and realization of heterogeneous hardware and software,” said Justin Gottschlich, principal scientist and the director and founder of Machine Programming Research at Intel Labs.

As the variety of software applications grows, so does the need for diverse hardware. Hardware architectures have been evolving to solve a wider range of specialized software problems than ever before.

“At Intel alone, we build CPUs, GPUs, ASICs, FPGAs, neuromorphic, and quantum computing machines, to name a few. But how can programmers effectively master programming each of these types of hardware? Perhaps more importantly, how can it be done without a loss of programmer productivity and result in software that has quality characteristics, such as correctness, efficiency, security and maintainability that are on par with the best software?” said Gottschlich.

Gottschlich believes that one of the biggest challenges in this new era of heterogeneous computing is finding novel techniques to automate the development of software, so programmers can keep pace and effectively utilize the hardware emerging in this heterogeneity revolution.

“With Intel’s pioneering research effort on machine programming, we aim to democratize and accelerate the creation of quality software through our research,” said Gottschlich.

However, programming is a cognitively demanding task that requires extensive knowledge, experience, and a large degree of creativity — making it notoriously difficult to automate, according to a position paper jointly published by researchers at Intel Labs and MIT in 2018. Machine programming has the capacity to reshape the way software is developed. At some level, this has already begun, as machine learning (ML) components are progressively replacing complex hand-crafted algorithms in domains such as natural language understanding and vision.

"Yet, we believe that it is possible to move much further. We envision a fusion of both stochastic (machine learning) and deterministic (formal) methods that when coupled with the right core ingredients will deliver a significant degree of automation to reduce the cost of producing secure, correct, and efficient software," according to Gottschlich and his research team.

When fully realized, these systems promise to enable non-programmers to harness the full power of modern computing platforms to solve complex problems correctly and efficiently. MP could democratize the creation of software beyond engineers.

“There are approximately 27 million programmers worldwide out of a global population of 7.8 billion people. This means that less than 1% of the world’s population can code,” said Gottschlich, citing software developer population research by Evans Data Corporation.

“I believe that machine programming has the potential to change almost everything. All the rules that we think we know about software and hardware development are about to change – it’s an amazing time to be working in this vibrant field.”

Three Pillars of Machine Programming

While fully automated machine programming systems may be more than two decades away, Gottschlich and his team are advancing research today under the three pillars of MP:

· Intention: Discover the intent of a programmer or user using a variety of expression techniques.

· Invention: Create new algorithms and data structures, and lift semantics from existing code.

· Adaptation: Evolve software in a dynamically changing hardware/software world.

Intention focuses on simplifying the interface between the human and the MP system, finding new ways for humans to express ideas to the machine. The MP system would meet human programmers on their terms, instead of forcing them to express code in computer/hardware notations.

Invention emphasizes machine systems that create and refine algorithms, or the core hardware and software building blocks from which systems are built. This pillar focuses on implementation at a higher order around the data structures and algorithms required to create a particular program.

“It's important to note that much of the invention isn’t usually about novel invention. It's more about putting together a number of known things in a perhaps novel way to construct the program,” said Gottschlich. “However, there are some cases where MP invention is actually inventing something that hadn’t been previously discovered, which is truly astonishing from my view.”

Adaptation is about making fine-tuned adjustments to a given program so that it executes under a specific set of constraints, such as specialized hardware or a particular software platform. Adaptation focuses on automated tools that help software adapt to changing conditions, such as bugs or vulnerabilities found in an application or a new hardware system.

“What we generally find is that once the intention is known, and the user doesn't cross the boundaries of invention and adaptation, it frees the machine to explore more possibilities in terms of invention and adaptation,” said Gottschlich. “This can allow a massive range of potential solutions, some of which may have been out of the scope of what programmers have historically considered.”

Machine programming could use many approaches, but Gottschlich believes that it is branching in two directions: stochastic and deterministic methods. Stochastic methods include ML, such as deep neural networks, reinforcement learning, genetic algorithms, Bayesian networks, and others. Deterministic methods include formal methods, such as formal verifiers, spatial and temporal logics, formal program synthesizers, and more.

While deterministic methods are usually formulated using specific parameters, stochastic methods can help address problems associated with some degree of uncertainty. Deterministic solutions are generally precisely reproducible: given the same input, they produce the same result every time. This tends not to be the case with stochastic systems. As MP research moves forward, solutions are therefore emerging that fuse both stochastic and deterministic methods, according to Gottschlich.
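The contrast between the two families can be sketched with a toy program search. This is an illustration only, not a method from the Intel Labs research: the input/output examples, the candidate space of programs of the form a*x + b, and both function names are assumptions made up for this sketch.

```python
import random

# Hypothetical specification: input/output examples for the target x -> 3*x + 2.
EXAMPLES = [(0, 2), (1, 5), (4, 14)]


def satisfies(a, b):
    """Return True if the candidate program x -> a*x + b matches every example."""
    return all(a * x + b == y for x, y in EXAMPLES)


def deterministic_search(limit=10):
    """Enumerate candidates in a fixed order.

    The same specification always yields the same answer after the same
    number of steps.
    """
    for a in range(-limit, limit + 1):
        for b in range(-limit, limit + 1):
            if satisfies(a, b):
                return a, b
    return None


def stochastic_search(seed, limit=10, max_trials=100_000):
    """Sample candidates at random.

    The path taken (and the number of trials) depends on the random seed,
    even though the specification is unchanged.
    """
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        a = rng.randint(-limit, limit)
        b = rng.randint(-limit, limit)
        if satisfies(a, b):
            return (a, b), trial
    return None, max_trials


if __name__ == "__main__":
    print("deterministic:", deterministic_search())
    for seed in (0, 1, 2):
        print("stochastic, seed", seed, "->", stochastic_search(seed))
```

In this toy setting both searches arrive at the same program, but only the enumerative one follows an identical path on every run. The random search trades that reproducibility for the ability to sample spaces too large to enumerate.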

Automated Regression Testing

Intel Labs is currently working on a variety of MP proof points, including a novel approach to automating software performance regression testing. Software performance regressions are defects that are inadvertently introduced into software as it evolves from one version to the next. While they tend not to affect the functional correctness of the software, they can significantly degrade the execution speed and resource efficiency of many classes of large-scale software systems, such as database systems, search engines, and compilers.

According to a research paper published in collaboration with MIT and Texas A&M University, a novel system called AutoPerf automates performance regression testing using three core techniques: zero-positive learning, autoencoders, and hardware telemetry. Researchers demonstrated AutoPerf’s generality and efficacy against three types of performance regressions across 10 real performance bugs in seven benchmark and open-source programs.

On average, AutoPerf exhibited 4% profiling overhead and accurately diagnosed more performance bugs than prior state-of-the-art approaches. AutoPerf emitted no false negatives, meaning it did not miss any of the performance bugs tested, which can be a critical property for regression testing systems used in production-quality software. It may be impossible to entirely avoid performance regressions during software development, but with proper testing and diagnostic tools, the likelihood of such defects silently leaking into production code can be reduced.
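The zero-positive idea (learn only from runs known to be free of regressions, then flag anything the model cannot reproduce well) can be sketched in a few lines. This is a minimal illustration and not AutoPerf's implementation: a per-counter mean stands in for the autoencoder, and the counter names, telemetry values, and the mean-plus-three-standard-deviations threshold are all assumptions made up for the example.

```python
import math
import statistics

# Hypothetical telemetry from runs of a known-good program version.
# Each row is (cache_misses, branch_misses) per 1,000 instructions; values are invented.
BASELINE_RUNS = [
    (10.0, 2.0), (11.0, 2.1), (9.5, 1.9),
    (10.5, 2.0), (10.2, 2.2), (9.8, 1.8),
]


def fit_profile(runs):
    """'Train' on non-anomalous data only: here, simply the per-counter mean.

    A real system would train an autoencoder; the mean is a stand-in that
    'reconstructs' every sample as the typical baseline profile.
    """
    return tuple(statistics.fmean(column) for column in zip(*runs))


def reconstruction_error(profile, sample):
    """Euclidean distance between a sample and its reconstruction."""
    return math.dist(profile, sample)


def fit_threshold(profile, runs, k=3.0):
    """Set the anomaly threshold from the errors seen on the baseline runs."""
    errors = [reconstruction_error(profile, run) for run in runs]
    return statistics.fmean(errors) + k * statistics.pstdev(errors)


def is_regression(profile, threshold, sample):
    """Flag a sample whose reconstruction error exceeds the learned threshold."""
    return reconstruction_error(profile, sample) > threshold


profile = fit_profile(BASELINE_RUNS)
threshold = fit_threshold(profile, BASELINE_RUNS)

if __name__ == "__main__":
    print("similar run flagged:", is_regression(profile, threshold, (10.3, 2.0)))
    print("cache-heavy run flagged:", is_regression(profile, threshold, (25.0, 2.1)))
```

No example of a regression is needed at training time, which is the point of the zero-positive setup: the detector only has to know what normal behavior looks like.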

“This form of MP is a concrete example of neural programming because the neural net is replacing the code or test,” said Gottschlich. “The MP system invents the regression tests and adapts them to the specialized hardware to analyze performance. This works well in the problem domain of performance regression because intention is known. Simply put: Don’t degrade the software’s performance when it’s updated.”

Automated Code Similarity System

Intel Labs is also researching code similarity systems, which are integral to a range of applications from code recommendation to automated software defect correction. A key technology for several emerging MP tools, code similarity has the potential to accurately and efficiently automate some of the software development process.

In collaboration with MIT and the Georgia Institute of Technology, the team is working on machine inferred code similarity (MISIM), an automated engine designed to learn what a piece of software intends to do by studying the structure of the code and analyzing how it differs syntactically from other code with similar behavior.

A core difference between MISIM and existing code-similarity systems lies in its novel context-aware semantic structure (CASS), which aims to lift out what the code actually does. Unlike existing approaches, CASS can be configured to a specific context, allowing it to capture information that describes the code at a higher level and in a specialized scenario. CASS can provide more specific insight into what the code does, rather than how it does it.

Once the code’s structure is integrated into CASS, neural network systems give similarity scores to pieces of code based on the jobs they are designed to carry out. If two pieces of code look different in their structure but perform the same function, the neural networks would rate them as largely similar. Researchers found that MISIM was able to identify similar pieces of code up to 40X more accurately than prior state-of-the-art systems.
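To see why structure alone is not enough, consider a purely syntactic baseline. The sketch below is not CASS or MISIM: it is a hypothetical comparison that counts syntax-tree node types with Python's standard `ast` module and scores two snippets by cosine similarity. The two snippets and all function names are assumptions chosen for illustration.

```python
import ast
import math
from collections import Counter

# Two hypothetical snippets that compute the same result in different ways.
LOOP_SUM = """
def total(xs):
    acc = 0
    for x in xs:
        acc += x
    return acc
"""

BUILTIN_SUM = """
def total(xs):
    return sum(xs)
"""


def node_histogram(source):
    """Count AST node types: a purely structural fingerprint of the code."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def run(source, xs):
    """Execute a snippet and call the `total` function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](xs)


if __name__ == "__main__":
    score = cosine(node_histogram(LOOP_SUM), node_histogram(BUILTIN_SUM))
    print("structural similarity:", round(score, 3))
    print("same behavior:", run(LOOP_SUM, [1, 2, 3]) == run(BUILTIN_SUM, [1, 2, 3]))
```

The two snippets behave identically on the sample input, yet the structural score is well below that of an exact match. Closing this gap between how code looks and what it does is the problem that semantic approaches such as MISIM are described as targeting.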

Research projects like these are just the beginning for the growing field of MP. As he looks to the future, Gottschlich believes that “through machine programming, we can start to democratize the creation of software, but this may be largely predicated on large strides in building MP systems that are intention-based, which historically hasn’t been the focus of many software systems. Although I think we’ve got a long way to go, I see a tremendous opportunity for an emergent MP community focusing on many problems.”