What We Need to Move to Data Science 2.0

by | Oct 8, 2018 | Data Science

In 2012, Harvard Business Review called data scientist “the sexiest job of the 21st century,” and readers ate it up. The tech industry saw a surge in job postings for, and interest in, a position that hadn’t officially existed even a few years earlier.

But the practice we call “data science” today isn’t new. In fact, people were doing data science for years before it was officially deemed sexy.

All-Access Pass?

While there are many reasons why data science blew up, I’d argue that the primary one was access: access to computing power, access to data, and access to open-source tools that allow for (relatively) simple analysis and modeling of that data. Much of that access is thanks to Wes McKinney’s late-’00s creation of the open-source library pandas, which made data analysis possible for anyone who knew the Python programming language.

Just as access helped data science 1.0 bloom into the field it is today, I’d bet that access will again drive the move to 2.0: a stage in which, I predict, data science relies less on coding and more on the basics of scientific inquiry (i.e., applying the scientific method). Parts of this trend are already emerging in the marketplace:

In a 2017 Fast Company article, Sean Captain notes that “Artificial intelligence is tiptoeing up the accessibility curve that web programming and other technologies have taken.” I find it interesting that what once required a degree to master is now available to any layperson willing to slog through a couple dozen hours of YouTube videos.

Consider another example: “[Google’s] BigQuery allows interactive analysis of large data sets, making it easy for businesses to share meaningful insights and develop solutions based on customer analytics. While this shows progress, it’s worth noting that many of the businesses using BigQuery aren’t using machine learning to better understand the data they are generating. This is because data analysts, proficient in SQL, may not have the traditional data-science background needed to apply machine-learning techniques.”

Democratizing Data Science

As data science 2.0 takes off, businesses will need to answer a long-standing question familiar from the manufacturing sector: Do we make or buy? In other words, do we build a data-science model from scratch to fit our use case, or buy a non-customized, out-of-the-box solution? Not every problem can be solved with the latter, but I’d argue the vast majority can be.

As access grows, then, it becomes more important to focus on the science of the field rather than the technology. Data scientists will need to focus on the basics of working with data, such as experimental design, building context, and determining whether a model fits the business need.
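To make “determining whether a model fits the business need” a little more concrete, here is a minimal Python sketch that compares a model’s accuracy against a naive baseline (always predicting the most common outcome) and against a lift threshold agreed on with the business. The churn labels, the predictions, the `required_lift` rule, and all function names are hypothetical, chosen only for illustration.

```python
from collections import Counter


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def baseline_accuracy(labels):
    """Accuracy of a naive baseline that always predicts the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def fits_business_need(predictions, labels, required_lift):
    """Hypothetical rule: the model must beat the naive baseline by at least
    `required_lift`, an absolute accuracy gap chosen together with the business."""
    lift = accuracy(predictions, labels) - baseline_accuracy(labels)
    return lift >= required_lift


# Made-up data: 1 = customer churned, 0 = customer stayed.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
predictions = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

print(accuracy(predictions, labels))  # → 0.8
print(baseline_accuracy(labels))  # → 0.7
print(fits_business_need(predictions, labels, required_lift=0.05))  # → True
```

Here the model is right 80% of the time, which sounds strong until you notice that always guessing “stayed” is right 70% of the time. Whether a 10-point lift is worth paying for is a business question, not a coding one.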

For many, the tallest hurdle to clear is the programming itself. But once that hurdle is cleared, the need for customization (in model building and implementation) goes down, and accessibility goes way up. And once access is no longer a barrier, companies of just about any size and in any field will be able to extract valuable insights from their data sets.

Are You Ready for Data Science 2.0?

From all of this, I see four abilities that data science 2.0 will require:

  1. Being a careful observer. The scientific process begins with observation. That means it’s increasingly important for data scientists to be careful observers. In our lightning-quick world, it’s easy to miss small details in the data that may matter when building context for a specific problem. In data science 2.0, you’ll need to be a careful observer to “position yourself to make connections and linkages” to ideas both inside and outside the data.
  2. Understanding the scientific method. We will see more of a focus on science and the scientific method. While the idea of the scientific method has been around for centuries, the term itself didn’t come into mainstream use within the technical community until the 19th century. And while we can all agree that the scientific method has given us a profound understanding of the natural world, its absence in the business realm is notable. Take former J.C. Penney CEO Ron Johnson, whose strategy of trusting his gut more than the data nearly brought down the long-time brand. In data science 2.0, using context and the scientific method will separate the winners from the losers, because a lack of coding talent or access will no longer be an excuse.
  3. Allowing the mind to be creative. Although problem solving can be analytical, it can also require creativity. For data science 2.0 to work, we need to embrace problem solving as a creative process. Because of how our brains are wired, we may solve problems better by not thinking about them so hard. That differs greatly from what often happens in organizations today: despite the benefits of using creativity to solve business problems, research points to a gap between the amount of creativity executives claim to use and the amount employees believe is being used.
  4. Applying the basics. This ability will be needed above all. While data science 1.0 focused on figuring out how to apply long-standing tools and techniques to enormous amounts of data using modern computing infrastructure (e.g., cloud and distributed computing), 2.0 will require business basics such as giving effective data-centric presentations, having a solid background in experimental design and statistics, communicating well, and being a team player.
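As a small illustration of the experimental-design and statistics basics in the last item, here is a minimal Python sketch of a two-proportion z-test for a hypothetical A/B test. The traffic and conversion numbers are made up, and the pooled normal approximation used here is one common choice among several; an exact or Bayesian test would also be defensible.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z statistic, p-value) using the pooled standard error, the usual
    normal approximation when both groups are reasonably large.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: variant B is a redesigned checkout page.
z, p = two_proportion_z_test(conversions_a=200, visitors_a=4000,
                             conversions_b=250, visitors_b=4000)
alpha = 0.05  # significance level fixed before looking at the data
print(f"z = {z:.2f}, p = {p:.4f}, significant: {p < alpha}")
```

The formula matters less than the discipline around it: stating the hypothesis, fixing the significance level, and choosing the sample size before the experiment runs, rather than after peeking at the results.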

We’re currently in data science 1.5. As a scientist (and admitted data junkie), I can’t wait to see what data science 2.0 brings.