Petabyte Age, Data Mining and Science
Natalie Glatzel has written an interesting post on the blog Tasty Data Goodies about an article by Chris Anderson, editor in chief of Wired. Chris's opinion is that scientific theory is becoming obsolete in the petabyte age. He writes that “[…] science can advance even without coherent models […]”. In short, according to Chris, mining huge amounts of data to gain knowledge kills scientific theory.
As Natalie Glatzel points out, data mining is not meant to replace science and discovery in general. She writes:
“Data mining can really only point us in the right direction of new discovery by showing us relationships between data points; it can’t generate new discoveries alone.”
My opinion is that the issue raised by Chris Anderson is not due to the “petabyte age” but rather to the concepts behind data mining itself. Statisticians build a model and then test it. Data miners test the data and then try to understand it. This is the basic difference between statistics and data mining, and it is distinct from the petabyte issue. Of course, data mining is one possible answer to the petabyte age. But in the late 1980s, data mining was already being used on data sets that are “small” compared to today's. Finally, we should remember that there is a big difference between gaining knowledge and using it! As Natalie Glatzel writes:
“Although data mining may change the rules of the science game, it’s definitely not the end of theory.”
For more information, see Natalie Glatzel’s post.