In a recent post on Data Mining Research, Will mentioned a paper entitled “Statistical Modeling: The Two Cultures”. This paper, written by Leo Breiman (a pioneer of decision trees and co-author of CART) and published in 2001 in Statistical Science, is aimed at both statisticians and data miners. As the title indicates, Breiman compares two different cultures: the statistical culture, which assumes a data model, and the data mining culture, which uses algorithmic models.
The whole paper compares these two ways of thinking about and solving problems. The author suggests that algorithmic models should be used instead of data models. One of his main arguments is that data models are not applicable to a wide range of current problems. The strength of this article is that it explains complex ideas in a readable manner. Breiman is very good at showing the difference between the two approaches.
According to Breiman, the problem with statisticians can be explained this way:
“This enterprise has at its heart the belief that a statistician, by imagination and by looking at the data, can invent a reasonably good parametric class of models for a complex mechanism devised by nature.”
For very complex problems, inventing such a class of models is often not realistic. This is one of the limitations of the statistician's approach. By contrast, in data mining we treat the “mechanism devised by nature” as complex and unknown, and judge a model by how well it predicts. The article then deals with topics such as the multiplicity of good models and the curse of dimensionality.
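To make the contrast concrete, here is a small sketch of my own (it is not taken from Breiman's paper). It assumes a toy "mechanism" y = sin(x) plus noise, which the modeler is not supposed to know. The "data model" is a straight line fitted by least squares; the "algorithmic model" is a simple k-nearest-neighbor average. The function names, the choice of k = 5 and the noise level are all illustrative assumptions, and both models are compared only on held-out prediction error.

```python
import math
import random


def fit_linear(xs, ys):
    """Data-model view: assume y = a + b*x and estimate a, b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=5):
    """Algorithmic view: no assumed form, predict the mean of the k nearest points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared prediction error of a model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(n, rng):
    """Toy mechanism (an assumption for this sketch): y = sin(x) + Gaussian noise."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)

linear_mse = mse(fit_linear(train_x, train_y), test_x, test_y)
knn_mse = mse(fit_knn(train_x, train_y), test_x, test_y)

print(f"linear model test MSE: {linear_mse:.3f}")
print(f"k-NN model test MSE:   {knn_mse:.3f}")
```

On this toy problem the straight line is simply the wrong parametric class, so its held-out error should be noticeably larger than that of the nearest-neighbor average. This only illustrates the argument on one made-up example; it says nothing about which approach is better in general.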
Breiman's aim is not to say that data miners are more efficient than statisticians, but rather that statisticians should be open to a wider variety of tools. In conclusion, I think this paper is worth reading, whether you are a statistician or a data miner. I have read many papers during my PhD and this is certainly one of the most interesting ones.
Thanks to Will Dwinnell for mentioning this article.