Data mining and statistics

I recently found an interesting paper about the connection between data mining and statistics. It was written by Diego Kuonen, who now works at Statoo Consulting in Switzerland. The basic question driving his paper is whether data mining is statistical déjà vu.

After explaining what statistics is and why it is needed, he describes data mining through several definitions. He makes an interesting point: defining and understanding the business process are the most important parts of any data mining task. He argues that:

Even the most advanced algorithms cannot figure out what is most important.

He also refers to the "garbage in, garbage out" issue, which has been discussed previously on Data Mining Research. He then concludes that companies cannot ignore data mining, since the advantages of knowledge extraction for businesses are enormous. I would like to quote a sentence I liked, in which he emphasizes the differences between data miners, statisticians and clients:

[…] computer scientists focus upon database manipulations and processing algorithms; statisticians focus upon identifying and handling uncertainties; and clients focus upon integrating knowledge into the knowledge domain.

If you’re interested, feel free to read the article.

