A Workable Definition of Data Science

In Data Smart: Using Data Science to Transform Information into Insight, John Foreman works towards a workable definition of data science. The book is definitely an excellent read.

To an extent, data science is synonymous with or related to terms like business analytics, operations research, business intelligence, competitive intelligence, data analysis and modeling, and knowledge extraction (also called knowledge discovery in databases or KDD). It’s just a new spin on something that people have been doing for a long time.

There’s been a shift in technology since the heyday of those other terms. Advancements in hardware and software have made it easy and inexpensive to collect, store, and analyze large amounts of data, whether that be sales and marketing data, HTTP requests from your website, customer support data, and so on. Small businesses and nonprofits can now engage in the kind of analytics that were previously the purview of large enterprises. Of course, while data science is used as a catch-all buzzword for analytics today, data science is most often associated with data mining techniques such as artificial intelligence, clustering, and outlier detection. Thanks to the cheap technology-enabled proliferation of transactional business data, these computational techniques have gained a foothold in business in recent years where previously they were too cumbersome to use in production settings.

In this book, I’m going to take a broad view of data science. Here’s the definition I’ll work from:

Data science is the transformation of data using mathematics and statistics into valuable insights, decisions, and products.

This is a business-centric definition. It’s about a usable and valuable end product derived from data. Why? Because I’m not in this for research purposes or because I think data has aesthetic merit. I do data science to help my organization function better and create value; if you’re reading this, I suspect you’re after something similar.

With that definition in mind, this book will cover mainstay analytics techniques such as optimization, forecasting, and simulation, as well as more “hot” topics such as artificial intelligence, network graphs, clustering, and outlier detection.

Some of these techniques are as old as World War II. Others were introduced in the last 5 years. And you’ll see that age has no bearing on difficulty or usefulness. All these techniques—whether or not they’re currently the rage—are equally useful in the right business context.

And that’s why you need to understand how they work, how to choose the right technique for the right problem, and how to prototype with them. There are a lot of folks out there who understand one or two of these techniques, but the rest aren’t on their radar. If all I had in my toolbox was a hammer, I’d probably try to solve every problem by smacking it real hard. Not unlike my two-year-old.

Better to have a few other tools at your disposal.