Worst Practices in Data Mining
I recently read the article Worst practices in business forecasting, written by Michael Gilliland and Udo Sglavo. It was published in the July/August issue of Analytics Magazine, which is, by the way, an excellent journal about analytics. In their article, the authors look for the reasons why forecasts are sometimes completely wrong. According to them, there are four main reasons:
- Unsound software
- Untrained, unskilled, inexperienced or unmotivated forecasters
- Political contamination
- Unforecastable behavior
I particularly like a few sentences from the article, which really point out important issues in data mining:
“No software, no matter how powerful, and no analyst, no matter how talented, can guarantee perfect (or even highly accurate) forecasts.”
“Forecast accuracy is ultimately limited by the nature of the behavior being forecast.”
Another interesting point is the inappropriate performance objectives mentioned by the authors. It is inappropriate to set a single overall objective (for example, a target classification accuracy) that would fit any data mining problem. This is closely related to the post What is a good classification accuracy in data mining?, published a few weeks ago on Data Mining Research.
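To illustrate why one accuracy target cannot fit every problem, here is a minimal Python sketch. It is not taken from the article: the function name `majority_baseline_accuracy` and both data sets are made up for illustration. It computes the accuracy obtained by always predicting the most frequent class, which is the score any real model has to beat.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical data sets, invented for this example only.
balanced = ["churn"] * 50 + ["stay"] * 50
imbalanced = ["fraud"] * 2 + ["ok"] * 98

for name, labels in [("balanced", balanced), ("imbalanced", imbalanced)]:
    baseline = majority_baseline_accuracy(labels)
    print(f"{name}: baseline accuracy = {baseline:.2f}")
```

On the imbalanced set, a model that never detects a single fraud case still reaches 98% accuracy, while 90% accuracy on the balanced set would be a real achievement. A fixed objective such as "at least 90%" would therefore be too easy for one problem and possibly out of reach for the other.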
To read the full article: Worst practices in business forecasting (it may take some time to load the page, be patient)
World Programming System: An Alternative to SAS
In an earlier post, I mentioned two ways to reduce SAS licence costs. The first one, Carolina, consists of translating the SAS code into Java code. However, it does not seem very easy to do, and the solution is not well known (and thus there is no real support for it). Another solution is to interpret your SAS code using the World Programming System (WPS). WPS is a SAS code…
New Data Mining Blog: Data Mining World
I would like to welcome a new blog related to data mining: Data Mining World. It is written by Burcu Kalender, a data analysis professional, who writes about various data mining topics such as software, competitions, learning resources and many others. Here is an excerpt from a recent post: How do you decide which statistical software to use? Sure you think about which one you handle best or which is most suitable for the…
Guest post: Why Google TV Could Destroy Nielsen’s Data
It's my pleasure to welcome Daniel Cawrey for this guest post on Data Mining Research. He has written an interesting post about Google TV and the data mining possibilities. I hope you will enjoy it. There has been a lot of hype surrounding Google TV since it was announced at a developer symposium in May. At the keynote speech of the Google I/O conference, there were…
The amount of digital data created in 2010 will equal…
If you want to impress your colleagues/friends with some huge numbers, simply use the funny comparisons made by Information Management in their article "Are You Prepared to Store All This Data?". Here is an excerpt:
"The amount of digital information created in 2010 (1.2 zettabytes) will equal:
- The digital information created by every man, woman and child on Earth “Tweeting” continuously for














