The Chicken and Egg Problem of Data Science

by | Sep 24, 2018 | Problem Solving

What comes first, the problem or the data?

When it comes to data science, the discipline may have a chicken-and-egg problem. Just take a look at this Reddit post:

Now take a look at one of the replies:

The problem that poster obiwancannotbe23 has is that they are looking at a data set to try to pinpoint problems rather than first identifying a problem and then determining what data they need to solve it.

It’s the chicken-and-egg problem in its purest form, only in this case, there’s no question what should come first.

A (Very) Brief History of Data-Driven Problem Solving

Generally speaking, data scientists use data to solve business problems. They draw on stored data and on discussions with colleagues to better understand what data is needed.

What I find interesting — if not downright humorous — about data science is that many folks seem to think that using data to solve business problems is a new development.

I beg to differ.

The history of management consulting arguably goes back to 1886, when Arthur D. Little founded the firm that bears his name. Little’s employees performed analytic tasks that ranged from surveying Canada’s natural resources to organizing General Motors’ first R&D lab. So it should come as no surprise that Little himself was a bona fide scientist — an MIT chemist, in fact. With that sort of background, he couldn’t help but be analytical in his approach to business.

What separates the management consulting Little offered then from the data science being performed today is merely volume. Today, thanks to technology, we can process oceans more data than Little could ever have dreamed of. (Yes, processing data quite often figures in scientists’ dreams.)

So, although using data to solve business problems is not new, using the types and amounts of data that we do today is.

And before you even start to crunch a single number, you first need to know the approach for solving a problem — because all the data in the world isn’t going to help if you’re not solving the right problems.

A Fresh Set of Eyes on the Data

Little may have been the world’s first management consultant, but the big names these days are firms like McKinsey & Company, Bain & Company, and Boston Consulting Group (collectively known as the MBB firms).

In the book The McKinsey Mind: Understanding and Implementing the Problem-Solving Tools and Management Techniques of the World’s Top Strategic Consulting Firm, Ethan M. Rasiel and Paul N. Friga, two former consultants of the firm, discuss the problem-solving process practiced by McKinsey & Company, arguably the most famous and sought-after of the MBB firms.

“At the most abstract level,” Rasiel and Friga write, “McKinsey develops solutions to clients’ strategic problems and, possibly, assists in the implementation of those solutions.” At the core, the process involves three parts that move in sequence: problem/business need → initiation/data → solution.

From this breakdown, it becomes clear that the data comes after the problem. Or that the problem precedes the data.

The Three Key Points to Solving Your Problem

As a data scientist (or maybe a type A data scientist), I argue that a data scientist’s primary goal is to support the problem-solving processes already occurring within the larger business.

It’s fun to write PEP 8-compliant Python code, craft SQL queries that use a triple join, or talk about confidence intervals and statistics, at least for us data science-minded folks. But without a structured approach to solving problems with data, it’s easy to spin your wheels doing low-value work. The most pernicious thing about this kind of work is that it feels like you’re getting something done.

But you’re not.

When approaching a business problem, Rasiel and Friga write that McKinsey plots its solution by following the guidelines below. The three points come from the book, but I am supplementing each with my own input.

The firm’s approach starts with a hypothesis. They then conduct an analysis (using data) to either prove or disprove the hypothesis. That’s the scientific method in a nutshell. Of course, I have used this approach throughout my career because, well, I’m a scientist. Everything that is done to solve the problem is guided by the hypothesis. This approach avoids rough analysis and low-value busy work.
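To make the hypothesis-first idea concrete, here is a minimal sketch in Python. It is my own illustration, not anything from the book: the hypothesis, the function name permutation_p_value, and the numbers are all made up, and a permutation test is just one of many ways to check whether data are consistent with a hypothesis. Strictly speaking, a test like this gives evidence for or against a hypothesis rather than outright proof.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often a gap in means at least as large as the observed
    one shows up when the group labels are shuffled at random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical hypothesis: clients who received an onboarding call place
# more orders per month than clients who did not. Figures are invented.
with_call = [20, 21, 22, 23, 24]
without_call = [10, 11, 12, 13, 14]

p_value = permutation_p_value(with_call, without_call)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value says the gap would rarely appear by chance, which supports the hypothesis. A large one says the data give you no reason to believe it. Either way, the only code written was code the hypothesis called for.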

Here’s how to approach problem-solving in the most straightforward and efficient way.

1. Find the key drivers.

Businesses, just like living organisms, can respond to various stimuli. When approaching a business problem, you should know that it’s not the best use of your time to study each part of a business and pull each lever that you can get your hands on to see what happens. Time is not on your side. That’s why it’s best to apply the Pareto principle, aka the 80/20 rule (which Richard Koch elaborates on in his book The 80/20 Principle: The Secret to Achieving More With Less), to identify the few key drivers behind 80 percent of the problems that the business is experiencing.

For example, if you’re trying to ensure that sales are stable over time, you should focus on identifying the 20 percent of the client base that generates 80 percent of all sales in the organization.
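As a rough sketch of what that looks like in practice, the Python below ranks clients by total sales and returns the smallest group of top clients that covers 80 percent of revenue. The function name key_clients, the (client, amount) input shape, and the sales figures are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def key_clients(sales, revenue_share=0.8):
    """Return the smallest group of top clients whose combined sales reach
    `revenue_share` of the total. `sales` is an iterable of
    (client, amount) pairs (an assumed input shape)."""
    totals = defaultdict(float)
    for client, amount in sales:
        totals[client] += amount

    grand_total = sum(totals.values())
    if grand_total <= 0:
        return []

    # Rank clients from largest to smallest contributor.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    selected, running = [], 0.0
    for client, amount in ranked:
        selected.append(client)
        running += amount
        if running >= revenue_share * grand_total:
            break
    return selected


# Made-up figures for illustration only.
sales = [
    ("Acme", 500.0), ("Globex", 300.0), ("Initech", 80.0),
    ("Umbrella", 50.0), ("Hooli", 40.0), ("Acme", 30.0),
]
top = key_clients(sales)
print(top)  # ['Acme', 'Globex']
```

In this toy data set, two of the five clients account for 83 percent of sales, so those two are where the analysis should start. Real data will rarely split at exactly 80/20, which is why the threshold is a parameter rather than a fixed value.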

2. Look at the big picture.

We love to get bogged down in the details, but a lot of the time, those details may steer us in a different direction, thus detracting from our end goal. Don’t lose sight of the main reason you started this task in the first place. When I feel like I’m losing my focus, I step back and ask myself questions like “Why is this particular task important?” and “How can I use this to solve my main problem?” If I can’t relate my answers to the big picture, then I know I’m deviating from the end goal. So if a client presents you with mountains of data, focus only on those datasets that apply to your problem. As a data scientist, you need the discipline to write code or conduct an analysis if, and only if, you need it to prove or disprove your hypothesis.

3. Sometimes you have to let the solution come to you.

As much as I like to think that problem solving is an analytical process, I think it’s more of a creative process. If you’re stuck, you may need to step away from the problem and come back later, letting a solution come to you rather than going out searching for one.

When it comes to solving your business’s problems, the answer is always to start by identifying the problem. Swimming through oceans of data first will just leave you lost at sea.

Like What You Just Read?

If this issue of Missing Data has brought you any value, it would mean the world to me if you shared it with one friend. Thanks!

Grant