Yes, data mining can have an effect. And I’m not thinking about the effect you feel when you do the most exciting job in the world. In this post, I want to discuss what happens when data mining is applied iteratively, so that each model influences the data the next one is built on (call it the data mining bias if you prefer). Let me explain with a concrete example.
Think about market basket analysis (association rule mining). Imagine a case study that could happen at Amazon (to cite the best-known example). We collect the transactions made by customers. We build a model to suggest other books they may want to purchase. One month later, we run the same process again. This time, however, the data we collect has been biased by the previous model, because some of those purchases were made only because the model suggested them. After several iterations, we may miss important associations if customers mainly buy what they are recommended.
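To make the loop concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the names `TRUE_PREFS`, `FOLLOW` and `MIN_SUPPORT`, and all the numbers. It tracks expected purchase shares rather than real transactions, and it assumes the model is retrained each month on all data collected so far and recommends only the single most frequent companion of book A.

```python
# Hypothetical true preferences: among customers who buy book A, the share
# who would pick each companion book if nothing were recommended to them.
TRUE_PREFS = {"B": 0.40, "C": 0.35, "D": 0.25}
FOLLOW = 0.6        # assumed share of customers who simply buy the recommended book
MIN_SUPPORT = 0.20  # assumed threshold below which a rule "A -> X" is dropped


def one_month(recommended):
    """Expected purchase shares for one month, given the recommended book (or None)."""
    if recommended is None:
        return dict(TRUE_PREFS)
    return {
        book: (1 - FOLLOW) * share + (FOLLOW if book == recommended else 0.0)
        for book, share in TRUE_PREFS.items()
    }


def simulate(months):
    """Retrain each month on all data so far; return the observed shares per month."""
    totals = {book: 0.0 for book in TRUE_PREFS}
    history = []
    recommended = None  # no model exists during the first month
    for month in range(1, months + 1):
        for book, share in one_month(recommended).items():
            totals[book] += share
        observed = {book: total / month for book, total in totals.items()}
        history.append(observed)
        # The next model recommends the most frequent companion seen so far.
        recommended = max(observed, key=observed.get)
    return history


history = simulate(6)
for month, observed in enumerate(history, start=1):
    kept = sorted(book for book, share in observed.items() if share >= MIN_SUPPORT)
    print(month, {b: round(s, 3) for b, s in observed.items()}, "rules kept:", kept)
```

With these made-up numbers, book C falls below the support threshold in the fourth month, even though 35% of customers would pick it if left alone.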
From another perspective, one may think that data mining (particularly in this case) limits the choices of the user. I have already heard this argument from detractors of data mining. On the other hand, recommendations also increase your chances of finding the right book for you.
What do you think of this issue? First, is it an issue at all? Is the data mining effect real and problematic? Can we add some randomness to the process to avoid it? This post aims to open a discussion on the topic (or to collect pointers to relevant literature). So, feel free to comment and give your opinion!
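As a starting point for the randomness question, here is a small sketch of one common idea, usually called epsilon-greedy exploration: most of the time show the model's top suggestion, but with a small probability show a random book instead. The function `recommend`, the ranking and the value of `epsilon` are hypothetical, and I am not claiming this is enough to remove the bias. It only records which recommendations were random, so that they could be treated differently when the next model is built.

```python
import random


def recommend(ranked_books, epsilon, rng):
    """Return (book, explored): the top-ranked book, or a random one with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(ranked_books), True
    return ranked_books[0], False


rng = random.Random(0)    # fixed seed so the sketch is repeatable
ranked = ["B", "C", "D"]  # hypothetical ranking produced by the current model
shown = [recommend(ranked, 0.2, rng) for _ in range(1000)]
explored = sum(1 for _, was_random in shown if was_random)
print(f"{explored} of {len(shown)} recommendations were random")
```

A larger `epsilon` gives less biased data but worse recommendations in the short term, so the value is a trade-off rather than something with a single right answer.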