Is There Room for Excel in Data Science?

by | Sep 2, 2018 | Data Science

This question has been nagging at me ever since a recent series of events.

The first was when, while attending a data science webinar, one of the presenters commented, “If you can solve it in Excel, then it is not a data science problem.”

Not long after, I saw this post on Stack Exchange:

There’s much to debate about the differences between data science/analytics/engineering/[fill in the blank]. Just like with the differences between operations research and data science, much of the debate seems to focus on tools rather than techniques.

But is all of this debate even relevant?

Those arguing this point are missing the larger picture of what data science can actually do. Instead, we should be discussing what it means to be a data professional. That way, we can get away from the discussion about tools and focus on the techniques that can be used to solve our many data problems.

In the same way that you wouldn’t tell a carpenter what tools to use to build your house, we shouldn’t limit the tools that a data scientist can use to solve a specific problem. As an experienced data professional, it’s up to you to decide the best tool for the job. And if you want to play MacGyver and use some unconventional method to sort and analyze your data, who am I — or anyone else — to argue?

Just because Excel is ubiquitous and easy to use doesn’t mean it’s not useful. If Excel can get the job done, then so be it. Heck, you can use an abacus, as long as it does the trick.

Take a look at this comment:

I observed the same opinion a few months back. While talking to a manager of a group that does energy analytics for a utility, I asked about the tools they used on a day-to-day basis. She replied that her team used Python, SAS, R, SQL, and Excel. Then she added, “Sometimes it’s just easier to do it in Excel.”

Internet personality and entrepreneur Gary Vaynerchuk talks about clouds and dirt. By that, he means that you need to spend your time doing one of two things. First, thinking at the highest level — the clouds — about why you do your craft. The second part focuses on the dirt: actually doing your craft. “Everything that is between those two things,” Vaynerchuk adds, “is not important.”

In the spirit of clouds and dirt, the debate about Excel sits squarely in the middle: it is neither the why of your craft nor the doing of it.

The Week’s Top Five:

1. Machine Learning MNIST Using a Neural Network in Excel 

For many, the MNIST dataset is the Hello World project of neural networks. However, if you think that you can only solve this problem with Keras or TensorFlow and not with something as simple as Excel, think again.

2. Using Excel With Pandas

In many of the conversations I hear about Excel and Python, folks assume that all of the work needs to be done in Pandas. If you’ve ever watched a carpenter build a house, you’ll know that they use both a hammer and a pneumatic nail gun to drive nails into the wood; these tools coexist and complement each other. The same is true for Excel and Python. It’s up to you to know the strong and weak points of each and how to use both in order to get the job done.

3. XLWings 

There have been several times that I wished that I could have written Python code to complete a task inside of Excel as opposed to VBA. If you’re in the same boat, then XLWings is a cool library to check out since it allows you to do just that.

4. Common Excel Tasks Demonstrated in Pandas 

When transitioning to Pandas from Excel, the most common complaint that I hear is that it takes so long to do the simplest thing, like a pivot chart. While that might be the case, once you learn to write the code needed to complete the common functions, things get easy again. Check out this article that, as the title implies, will show you how to complete common Excel tasks using Python/Pandas.

You can also check out this article about doing a VLOOKUP in Pandas.
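
To give a flavor of what those two articles cover, here is a minimal sketch of a VLOOKUP-style lookup and a pivot-style summary in Pandas. The data, column names, and table names are all made up for illustration, and this is just one way to do it, not necessarily the approach the linked articles take.

```python
import pandas as pd

# Hypothetical sales records; in practice these would come from a workbook or CSV.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product_id": [1, 2, 1, 2, 2],
    "units": [10, 5, 7, 3, 4],
})

# Hypothetical lookup table, the kind of range a VLOOKUP would point at.
products = pd.DataFrame({
    "product_id": [1, 2],
    "product_name": ["Widget", "Gadget"],
})

# VLOOKUP-style step: a left merge keeps every sales row and pulls in the matching name.
labeled = sales.merge(products, on="product_id", how="left")

# Pivot-style step: regions as rows, products as columns, summed units as values.
pivot = labeled.pivot_table(
    index="region",
    columns="product_name",
    values="units",
    aggfunc="sum",
    fill_value=0,
)

print(pivot)
```

Once the code exists, rerunning it on next month's data is a matter of swapping the input, which is where the up-front effort starts to pay off.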

5. What Are the Benefits of Python’s Pandas Over Microsoft Excel for Data Analysis?

Aside from the fact that you can analyze more rows in Pandas than Excel, there are other benefits to using Pandas over Excel. Here are a few of them:

James Hritz:

Pandas has a lot of power, but at a high level, the module is really good at two things:
1) Munging data sets: helping you clean up and put data together into a format that is easy to use, Excel friendly, and easy to analyze.

2) Automating the clean up of data sets (missing data, incongruent dates in series, etc).

Daniel Vianna:

Doing things that Excel just can’t do? XD

I do most things in IPython (now Jupyter) notebooks. Read data into Python from the database or Excel/CSV, munge, groupby/count/sum, fill up missing dates and add zero counts, manipulate strings with regular expressions, and even do some natural language processing. Missed a step? Just change a line of code and rerun. It takes seconds.

How do you do that in Excel? Start from the beginning?
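
A few of the steps in that workflow can be sketched in a handful of lines. This is only an illustration under my own assumptions: the event log, its column names, and the order-number pattern are all hypothetical, and a real notebook would read the data from a database or file instead.

```python
import pandas as pd

# Hypothetical event log; a real one might come from pd.read_csv or a database query.
events = pd.DataFrame({
    "date": pd.to_datetime(["2018-09-01", "2018-09-01", "2018-09-03", "2018-09-04"]),
    "note": [
        "Order #101 shipped",
        "order #102 shipped",
        "Order #103 delayed",
        "Order #104 shipped",
    ],
})

# Regular expression step: pull the order number out of the free-text note.
events["order_id"] = events["note"].str.extract(r"#(\d+)", expand=False).astype(int)

# groupby/count step: number of events per day.
daily = events.groupby("date").size()

# Fill in the missing dates and give them a zero count.
all_days = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
daily = daily.reindex(all_days, fill_value=0)

print(daily)
```

If a step turns out to be wrong, you change that one line and rerun the cell, which is the point being made above.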

Michael Herman:

Excel is great for viewing data, performing basic analysis, and drawing simple graphs, but it really isn’t suitable for cleaning up data (unless you are willing to dive into VBA).

Like What You See?

If this issue of Missing Data has brought you any value, it would mean the world to me if you shared it with one friend.


Signature of Grant Aguinaldo