Intractable Problems

A view from Intel’s Bob Rogers on how to get value from almost any analytics project.

There are ways to get meaningful results even when a problem seems like it can’t be solved. Bob Rogers, Intel’s chief data scientist, explains how.

I spent more than a decade forecasting futures as the manager of a hedge fund. We had tick-by-tick data going back decades, but there was a huge random component to this data that made automated prediction beyond a certain accuracy impossible. All the motives people have for buying and selling at a particular moment, combined with the sheer number of people trading, meant that no matter what we did, we’d never perfectly pluck signals from the noise.

In data science, we call these intractable problems: past a certain point, analytics and big data may simply never make progress.

The good news is that many problems that at first seem intractable can be addressed by tweaking your approach or your inputs.

Knowing when problems that seem intractable can be solved with some affordable changes will position a business—and a project sponsor—for ongoing success. Conversely, being able to recognize problems that are defined at an unrealistic scale will prevent squandering time and money that you could profitably apply to a more focused question.

Here are four troubleshooting methods that can improve your results. By applying one or more of them iteratively, you can stop banging your head against a wall and improve your chances of finding value in your analytics work.

1. Ask a More Focused Question

Often, the best way forward is to try to solve for a subpart of your original question and extrapolate lessons. Trying to determine the likelihood that any given social-media user will be interested in a car model you’re designing is likely intractable. Even with lots of good data, you might have too many variables to arrive at a model with real predictive value.

But you might be able to predict an increase or decrease in sales to a specific demographic. From there, you could determine whether a change such as a boxier design would boost sales to soccer moms more than it would hurt sales to single twentysomethings. That’s a more manageable problem scope that still delivers real value to your business.

The same approach can help you isolate variables that are throwing off your algorithm. Instead of trying to predict hospital readmission rates for all patients, for example, you might divide a patient set into two groups—perhaps one of patients with multiple significant conditions and the other of patients with only a single condition, such as heart failure.

If the quality of prediction diverges meaningfully between the two groups, that indicates your algorithm works for a data set that is not just smaller, but specifically clear of a particular confounding variable present in the larger pool.
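
To make that concrete, here is a minimal sketch of the cohort-splitting idea in Python with scikit-learn. The patient table is synthetic, and the column names and comorbidity rule are illustrative assumptions, not a prescription.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-in for a real patient table: two risk features,
# a comorbidity count, and a readmission label.
patients = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "prior_visits": rng.poisson(2, n),
    "num_conditions": rng.integers(1, 5, n),
})
# Make the label noisier for multi-condition patients, mimicking a confounder.
risk = 0.03 * patients["age"] + 0.4 * patients["prior_visits"]
risk += rng.normal(0, 2, n) * (patients["num_conditions"] > 1)
patients["readmitted"] = (risk > risk.median()).astype(int)

def cohort_auc(df, features, label="readmitted"):
    """Mean cross-validated AUC for one patient cohort."""
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, df[features], df[label],
                           scoring="roc_auc", cv=5).mean()

features = ["age", "prior_visits"]
multi = patients[patients["num_conditions"] > 1]
single = patients[patients["num_conditions"] == 1]
print("multi-condition AUC: ", round(cohort_auc(multi, features), 3))
print("single-condition AUC:", round(cohort_auc(single, features), 3))
# A large gap between the two scores suggests comorbidity is confounding
# a model trained on the full population.
```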

2. Improve Your Algorithm

In data science, algorithms not only define the sequence of operations that your analytics system will perform against the data set; they also reflect how you think about, or “model,” potential relationships within the data.

Sometimes creating the right algorithm, or modifying an available algorithm for your specific new purpose, requires many iterations. (Machine learning offers promise for automating the improvement of algorithms; that’s a discipline to watch.)
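
As one illustration of that iterative loop, scikit-learn’s GridSearchCV can score many variants of the same model automatically. This is only a sketch: the synthetic data and the parameter grid are assumptions chosen to show the mechanics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each parameter combination is one "iteration" of the modeling idea;
# cross-validation scores them so the best variant surfaces automatically.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 200], "max_depth": [3, 10, None]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV AUC:", round(search.best_score_, 3))
```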

One sign that your algorithm isn’t working is that scaling your compute power by, say, a factor of five yields a much smaller improvement in processing time.

Another test is to slightly tweak your algorithm parameters. Slightly different algorithms should produce only slightly different answers. If they produce drastically different answers, chances are that something is off, and you need a different algorithm.
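
Here is a small sketch of that stability test on synthetic data: refit the same model with slightly perturbed regularization strengths and measure how much the predictions drift. The specific model and values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

# Nearby regularization strengths; a sound model should barely notice.
alphas = [0.9, 1.0, 1.1]
preds = [Ridge(alpha=a).fit(X, y).predict(X) for a in alphas]

for a, p in zip(alphas[1:], preds[1:]):
    drift = np.abs(p - preds[0]).mean() / np.abs(preds[0]).mean()
    print(f"alpha={a}: mean prediction drift {drift:.2%}")
# Drastic drift from a tiny parameter change is a red flag that the
# model is unstable and a different algorithm may be needed.
```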

And perhaps you’ve chosen the wrong type of algorithm altogether. Model selection often rests on assumptions about the data, such as assuming a linear relationship between two variables when a decision tree would represent their relationship more accurately.

There are many libraries of publicly available, open-source algorithms. You rarely have to start from scratch.
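
To illustrate both points, the sketch below uses scikit-learn, one such open-source library, to compare a linear model against a decision tree on data built around a threshold rather than a line. The data generation is an assumption made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.where(X[:, 0] > 0, 10, -10) + rng.normal(0, 1, 500)  # a step, not a line

for name, model in [("linear", LinearRegression()),
                    ("tree", DecisionTreeRegressor(max_depth=3))]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: cross-validated R^2 = {r2:.2f}")
# The tree captures the threshold the linear model cannot, showing how a
# wrong assumption about the data sinks an otherwise reasonable model.
```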

3. Clean Up Your Data

This is an age-old challenge for IT: garbage in, garbage out. Ideally, you will have tackled data quality before starting any analytics project, but problems with data sets often aren’t clear until you begin your analysis.
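
As a minimal illustration, a few common cleaning steps in pandas might look like the sketch below; the table, columns, and rules are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "East ", None, "west"],
    "units": [10, -3, 25, 25],
})

df["region"] = df["region"].str.strip().str.lower()  # normalize labels
df = df.dropna(subset=["region"])                    # drop unusable rows
df = df[df["units"] >= 0]                            # negative counts are entry errors here
df = df.drop_duplicates()                            # remove exact duplicates
print(df)
```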

4. Use Different Data

This is a slightly trickier variation of the previous step. To get more data, you might just need to update your metadata, or you might need to change some processes to capture the data you need.

Most businesses have already squeezed as much value as possible out of the data they store in traditional data warehouses. Sometimes when you add a new set of data—especially unstructured data, such as text progress notes written by doctors or documented interactions between call center employees and customers—skies open up, and you find new predictive power.

As a general rule, more data should help produce better answers. As you test an analytics project, add data sets in sequence to see how each addition changes the answers. So long as your answers keep getting better, you most likely haven’t hit the point of intractability.
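
One way to run that test, sketched here on synthetic data with scikit-learn, is to train on growing slices of the data and score each model against the same held-out set; when the curve flattens, more of the same data no longer helps.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Score the same model on growing slices of the training set.
for n in (250, 500, 1000, 2000, len(X_train)):
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{n:>5} training rows -> test AUC {auc:.3f}")
# The plateau in this curve is the point to weigh cost against payoff.
```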

When your progress slows, take stock of the cost of possible approaches versus the potential payoff. And it doesn’t hurt to keep this in mind: Trying to predict human behavior too accurately might be the root of all intractability.
