Recently, I have been involved in data science projects more than ever before. Data science is trendy. Saying that someone is a data scientist sounds nice, but few people know what data scientists actually do. In my experience, applying techniques is fairly easy, even in data science. The real challenge is finding a problem that a technique actually solves.
Going back to the beginning: why am I asked to solve problems with data science? My background might suggest that data science methods are always my main tool, but the problem should come first, not a data science solution. Only after investigating the problem more deeply can I say whether it calls for data science. This confusion may stem from the imprecise definitions of terms such as data science and data scientist. For instance, a data scientist is commonly defined as a professional with expertise in computer science, statistics, and artificial intelligence, a definition so broad that it says little about what the job involves. Marketing is another factor that may explain why people ask for data science solutions. Data science is at the peak of expectations, so announcing that a company provides data science solutions is itself a marketing strategy.
The fact is that some problems can be solved with simpler approaches, without data science. Others cannot, especially those that require exploring data to find answers. For instance, suppose your problem is that you need to increase sales on your website. Questions then arise, such as “What are the characteristics of those who buy on our website?” or “What is the difference between buyers and non-buyers who browse the website?”. Now you have a concrete problem, and if simple data such as access logs is enough to solve it, you still don’t need data science. But if you find that it requires a deeper examination, data science methods are welcome.
According to Seth Stephens-Davidowitz, the author of Everybody Lies, data science is about identifying patterns and predicting how one variable affects another. What information (variables) distinguishes buyers from non-buyers (another variable)? We already identify patterns intuitively, but intuition fails constantly. It is natural to imagine how good buyers behave and to come up with many such behaviors. In data science, however, you need data to corroborate (or refute) your intuition.
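To make the idea of checking an intuition against data concrete, here is a minimal sketch. It assumes hypothetical session records with a `pages_viewed` field and a `bought` flag. Neither field comes from the text above, and the numbers are invented toy values. The sketch tests one intuition, “buyers browse more pages”, by comparing the average of a candidate variable across buyers and non-buyers.

```python
from statistics import mean

# Hypothetical toy sessions, invented for illustration only (not real data).
sessions = [
    {"pages_viewed": 12, "bought": True},
    {"pages_viewed": 3, "bought": False},
    {"pages_viewed": 9, "bought": True},
    {"pages_viewed": 4, "bought": False},
    {"pages_viewed": 2, "bought": False},
]

def compare_groups(records, feature, label="bought"):
    """Return (mean of `feature` for buyers, mean for non-buyers).

    Raises statistics.StatisticsError if either group is empty.
    """
    buyers = [r[feature] for r in records if r[label]]
    non_buyers = [r[feature] for r in records if not r[label]]
    return mean(buyers), mean(non_buyers)

buyers_avg, non_buyers_avg = compare_groups(sessions, "pages_viewed")
print(f"buyers: {buyers_avg:.1f} pages, non-buyers: {non_buyers_avg:.1f} pages")
```

A gap between the two averages only suggests a pattern worth examining. It does not show that browsing more pages causes purchases, and a real analysis would need far more data and a proper statistical test.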
Discovering new data is a fundamental principle of data science, since the data you already have might not be enough. Perhaps one way to detect good customers is through their linguistic traits, by analyzing the comments section of the online store. Maybe someone who writes more succinct comments tends to buy more frequently, or someone who asks for detailed information about a product is a potential customer. Note that new data based on linguistic traits (such as succinct or detailed comments) is being considered to address the initial question. Beyond linguistic traits, other investigations can reveal interesting factors, such as correlations between the areas of the website that users focus on and their interest in specific products. In short, discovering new data is an essential task for a data scientist.
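The following minimal sketch shows how new variables could be derived from raw comment text. It assumes hypothetical comment records with `text` and `purchases` fields, and all values are invented for illustration. The two features are simple proxies chosen for this example: word count stands in for succinctness, and the presence of a question mark stands in for asking for details. The sketch then computes a Pearson correlation between word count and number of purchases, implemented by hand so it runs on any Python 3 version.

```python
import math
import re

# Hypothetical comments and purchase counts, invented for illustration only.
comments = [
    {"text": "Great, fast delivery.", "purchases": 7},
    {"text": "Does this model support USB-C charging? What is the warranty period?", "purchases": 4},
    {"text": "I was not sure at first whether the size would fit, but after reading "
             "all the reviews I decided to try it anyway.", "purchases": 1},
    {"text": "Works as described.", "purchases": 5},
]

def linguistic_features(text):
    """Derive simple, hypothetical linguistic features from one comment."""
    words = re.findall(r"[A-Za-z0-9'-]+", text)
    return {
        "word_count": len(words),      # proxy for succinctness
        "asks_question": "?" in text,  # proxy for asking for details
    }

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (undefined if one is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

features = [linguistic_features(c["text"]) for c in comments]
word_counts = [f["word_count"] for f in features]
purchases = [c["purchases"] for c in comments]
print("word counts:", word_counts)
print("correlation(word_count, purchases):", round(pearson(word_counts, purchases), 2))
```

With only four invented comments, the resulting coefficient means nothing on its own. The point is the step it illustrates: turning unstructured text into new variables that can then be tested against the question of who buys.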
In general, data science deals with complex problems through an exploratory process. Personally, I am happy when I realize that a problem can be solved with data science. It often means that I have to step out of my comfort zone and dive deep into the data.