What is the Science part in Data Science anyway?
Data Science, while an increasingly common and popular term, certainly lacks a shared and clear definition. It’s a disputed label that may or may not include a variety of tools and techniques, in broad or narrow fields of application. It doesn’t help that long and heated discussions surround even the first part of the term, Data. Is it big data? Is it all data? Is it Data Science only if it’s unstructured data? How much data is needed for Data Science? And so on…
But in this article, I want to focus on the second part of the term, Science. It attracts less controversy, but this is likely because we like to assume we simply understand what science means here. Do we, though? For me, the science in Data Science describes the approach taken to working with data. And this approach rests on three pillars.
A well-founded process of discovery
Being scientific in your approach to data means that you have a well-founded, clearly formulated process of discovery that, if repeated by someone else on the same data, would yield largely the same results. You understand where the data comes from, what you do to it, and how you arrive at the results, and you can explain and justify each of these steps. If you’re missing these key elements, you’re not doing Data Science; you’re just… playing around with some data.
The scientific method
The second pillar is all about following the scientific method. As a reminder, the scientific method starts with making an observation (or many observations, if it’s Data Science!) and then asking a question or identifying a problem related to that observation. Next comes researching existing answers or solutions, which helps in formulating hypotheses. Then you run experiments to test your hypotheses and analyze the resulting data, so that you can accept or reject your hypotheses; finally, you draw conclusions and report your findings. If any of these steps is missing, again, you’re not doing science but rather data guesswork.
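As a small illustration of the "test your hypotheses and analyze the resulting data" step, here is a minimal sketch in Python using a permutation test, one of many possible techniques and chosen here only because it needs nothing beyond the standard library. The function name `permutation_test`, the fixed seed, the 0.05 significance level and the `control`/`variant` numbers are all hypothetical, made up for illustration rather than taken from any real study. Note that in statistical practice a test like this lets you reject or fail to reject a null hypothesis, which is the formal counterpart of "accepting or rejecting" a hypothesis.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=42):
    """Two-sided permutation test for a difference in means.

    Null hypothesis (assumed for illustration): both groups come from the
    same distribution, so the group labels are interchangeable.
    """
    rng = random.Random(seed)  # fixed seed: a rerun by someone else gives the same p-value
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical observations, e.g. minutes spent on a page under two designs.
control = [12, 15, 11, 14, 13, 16, 12, 15]
variant = [18, 17, 19, 16, 20, 18, 17, 19]

ALPHA = 0.05  # significance level chosen before looking at the results
p = permutation_test(control, variant)
decision = "reject" if p < ALPHA else "fail to reject"
print(f"p = {p:.4f}: {decision} the null hypothesis of no difference")
```

Fixing the significance level before running the test, and seeding the random generator, are what make the analysis repeatable and keep it from sliding into the "data guesswork" the method is meant to prevent.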
A scientist’s mindset
Finally, the science part in Data Science is about the right mindset. You’re a scientist if you adopt the scientific method and work through a well-founded process of discovery in pursuit of answers to questions. But doing science also means being naturally curious, surfacing and examining assumptions, never jumping to conclusions too quickly, and thinking critically at every stage of the process. The science part in Data Science is also about creativity, ingenuity and inventiveness. If you don’t adopt the scientist’s mindset, you may be able to analyze the data, but you won’t be able to create solutions to business problems that bring value.
Organizations that want to benefit from Data Science all too often remain focused solely on the Data part, making sure they collect as much of it as possible and render it usable. Regrettably, they forget about the Science part, even though this is where the key to value lies.