Data Science: trend, applicability and new data


Recently, I have been involved in data science projects as never before. Data science is a trend. Saying that someone is a data scientist sounds nice, but few people know what (s)he actually does. My experience tells me that applying techniques is quite easy, even in the data science field. The challenge is to find a problem that a technique solves.

Back to the beginning: why am I asked to solve issues through data science? My background might suggest that my principal tool is always a data science method, but the problem should come first, not a data science solution. Only after a deeper investigation of the problem can I say whether it is related to data science. Such confusion might come from the inexact definition of terms such as data science or data scientist. For instance, a usual definition of a data scientist is a professional who has expertise in computer science, statistics, and artificial intelligence. A definition this broad does little to clear up the confusion. Another issue is the marketing involved, which may explain why someone wants data science solutions. Data science is at the peak of expectations, and announcing that a company provides data science solutions is a marketing strategy.

The fact is that some problems can be solved by a simpler solution, without data science. Others cannot, especially those that require exploring data to answer questions. For instance, suppose your problem is that you need to increase sales on your website. Then questions arise, such as “what are the characteristics of those who buy on our website?” or “what is the difference between buyers and non-buyers who browse the website?”. Now you have a concrete problem, and if simple data such as access logs is enough to solve it, you still do not need data science. But if you realize that the problem demands a deeper examination, data science methods are welcome.

According to Seth Stephens-Davidowitz, the author of Everybody Lies, data science is about identifying patterns and predicting how one variable affects another. What information (variables) distinguishes buyers from non-buyers (another variable)? Our intuition identifies patterns all the time, although it fails constantly. It is natural to imagine how good buyers behave and to come up with many candidate behaviors. But in data science, you need data to corroborate (or refute) your intuition.
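To make the buyers versus non-buyers question concrete, here is a minimal sketch of how an intuition could be checked against data. Everything in it is an assumption for illustration: the visitor records, the field names (pages_viewed, minutes_on_site, bought) and the values are invented, and comparing group means is only one simple way to look for a pattern.

```python
from statistics import mean

# Hypothetical visitor records; field names and values are invented for illustration.
visitors = [
    {"pages_viewed": 12, "minutes_on_site": 9.5, "bought": True},
    {"pages_viewed": 3, "minutes_on_site": 1.0, "bought": False},
    {"pages_viewed": 8, "minutes_on_site": 6.0, "bought": True},
    {"pages_viewed": 4, "minutes_on_site": 7.0, "bought": False},
    {"pages_viewed": 2, "minutes_on_site": 2.0, "bought": False},
]


def group_means(records, variable):
    """Return (mean for buyers, mean for non-buyers) of one variable."""
    buyers = [r[variable] for r in records if r["bought"]]
    non_buyers = [r[variable] for r in records if not r["bought"]]
    return mean(buyers), mean(non_buyers)


# Compare each candidate variable across the two groups.
for variable in ("pages_viewed", "minutes_on_site"):
    buyer_mean, non_buyer_mean = group_means(visitors, variable)
    print(f"{variable}: buyers={buyer_mean:.2f}, non-buyers={non_buyer_mean:.2f}")
```

A large gap between the two means suggests a variable worth a closer look. It does not prove anything by itself, especially with so few records.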

Discovering new data is a fundamental principle of data science, since the data that you have might not be enough. Perhaps one way to detect good customers is through their language traits, by analyzing the comments section of the online store. Maybe someone who writes more succinct comments tends to buy more frequently, or somebody who asks for detailed information about a product is a potential customer. Note that new data based on linguistic traits (like succinct or detailed comments) is brought in to address the initial question. Besides linguistic traits, other investigations can unveil interesting factors, like correlations between the users’ focus on specific website areas and their interest in specific products. Discovering new data is an essential task for a data scientist.
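As a rough illustration of turning comments into new data, the sketch below derives a word count from each comment and compares the average number of purchases of succinct and detailed commenters. The records, the field names and the five-word threshold are all hypothetical choices of mine, not something taken from a real store.

```python
from statistics import mean

# Hypothetical comments joined with purchase counts; all values are invented.
customers = [
    {"comment": "Great, thanks.", "purchases": 5},
    {"comment": "Does it ship abroad?", "purchases": 3},
    {"comment": "Could you tell me the exact dimensions, weight and material of this item?",
     "purchases": 1},
    {"comment": "I would like to know whether the warranty covers water damage and for how long.",
     "purchases": 0},
]

SUCCINCT_MAX_WORDS = 5  # arbitrary threshold, chosen only for this example


def word_count(text):
    """New variable derived from raw text: number of whitespace-separated words."""
    return len(text.split())


def mean_purchases_by_style(records, max_words=SUCCINCT_MAX_WORDS):
    """Return (mean purchases of succinct commenters, mean purchases of detailed ones).

    A group with no records yields None instead of a mean.
    """
    succinct = [r["purchases"] for r in records if word_count(r["comment"]) <= max_words]
    detailed = [r["purchases"] for r in records if word_count(r["comment"]) > max_words]
    return (
        mean(succinct) if succinct else None,
        mean(detailed) if detailed else None,
    )


succinct_mean, detailed_mean = mean_purchases_by_style(customers)
print(f"succinct: {succinct_mean}, detailed: {detailed_mean}")
```

Because the toy data is made up, the printed numbers say nothing about real customers. The point is only that a new variable (comment length) did not exist in the original data and had to be created before the question could be asked.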

In general, data science deals with complex issues through an exploratory process. Personally, I feel happy when I realize that something can be solved using data science. It often means that I need to get out of my comfort zone and start a deep dive into data.