Worst Practices in Data Mining

July 27, 2010 by Sandro Saitta
Filed under: Uncategorized 

I recently read the article Worst practices in business forecasting, written by Michael Gilliland and Udo Sglavo. It is published in the July/August issue of Analytics Magazine, which is, by the way, an excellent journal about analytics. In their article, the authors look for the reasons why forecasts are sometimes completely wrong. According to them, there are four main reasons:

  • Unsound software
  • Untrained, unskilled, inexperienced or unmotivated forecasters
  • Political contamination
  • Unforecastable behavior

I particularly like a few sentences from the article, which really point out important issues in data mining:

No software, no matter how powerful, and no analyst, no matter how talented, can guarantee perfect (or even highly accurate) forecasts.

Forecast accuracy is ultimately limited by the nature of the behavior being forecast.

Another interesting point the authors raise is inappropriate performance objectives. It is inappropriate to set a single overall objective (for example, a target classification accuracy) that would fit every data mining problem. This is closely related to the post What is a good classification accuracy in data mining?, published a few weeks ago on Data Mining Research.
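To illustrate why one accuracy objective cannot fit every problem, here is a minimal Python sketch, assuming a majority-class baseline (a trivial model that always predicts the most frequent class) as the reference point. The label sets, the 90% target and the function name `majority_baseline_accuracy` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a trivial classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical label sets, made up for illustration only.
balanced = ["churn"] * 50 + ["stay"] * 50
imbalanced = ["fraud"] * 5 + ["legit"] * 95

# A single "one size fits all" accuracy objective (hypothetical value).
target = 0.90

for name, labels in [("balanced", balanced), ("imbalanced", imbalanced)]:
    baseline = majority_baseline_accuracy(labels)
    print(f"{name}: baseline accuracy = {baseline:.2f}, "
          f"trivial model meets {target:.0%} target: {baseline >= target}")
```

On the imbalanced set, a model that always answers "legit" already exceeds the 90% target without learning anything, while on the balanced set the same trivial model reaches only 50%. Comparing a model against such a baseline is one common way to judge whether a given accuracy is actually good for the problem at hand.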

To read the full article: Worst practices in business forecasting (the page may take some time to load, so be patient).


World Programming System: An Alternative to SAS

July 19, 2010 by Sandro Saitta
In an earlier post, I mentioned two ways to reduce SAS licence costs. The first one, Carolina, consists of translating SAS code into Java code. However, this does not seem very easy to do, and the solution is not well known (so there is no real support for it). Another solution is to interpret your SAS code using the World Programming System (WPS). WPS is a SAS code…

New Data Mining Blog: Data Mining World

July 12, 2010 by Sandro Saitta
I would like to welcome a new blog related to data mining: Data Mining World. It is written by Burcu Kalender, a data analysis professional, who covers various data mining topics such as software, competitions, learning resources and many others. Here is an excerpt from a recent post: How do you decide which statistical software to use? Sure you think about which one you handle best or which is most suitable for the…

Guest post: Why Google TV Could Destroy Nielsen’s Data

July 8, 2010 by Sandro Saitta
It's my pleasure to welcome Daniel Cawrey for this guest post on Data Mining Research. He has written an interesting post about Google TV and its data mining possibilities. I hope you will enjoy it. There has been a lot of hype surrounding Google TV since it was announced at a developer symposium in May. At the keynote speech of the Google I/O conference, there were…

The amount of digital data created in 2010 will equal…

July 5, 2010 by Sandro Saitta
If you want to impress your colleagues or friends with some huge numbers, simply use the funny comparisons made by Information Management in their article "Are You Prepared to Store All This Data?". Here is an excerpt: "The amount of digital information created in 2010 (1.2 zettabytes) will equal:
  • The digital information created by every man, woman and child on Earth “Tweeting” continuously for

Online and offline become 1: a new era has begun (part 2)

June 30, 2010 by Sandro Saitta
This is the second part of the post Online and offline become 1: a new era has begun. In this post, I discuss the second article, by David M. Raab. In "Bridging the Gap Between Online and Database Marketing", Raab starts with the following subtitle: "Centralizing information is valuable even when it cannot be tied to a specific individual". This is true since data mining models can be built based…

Online and offline become 1: a new era has begun (part 1)

June 23, 2010 by Sandro Saitta
I recently came across two interesting articles that are closely related to our Customer Online Targeting (COT) tool. Both are from Information Management. The first one, "Online Analytics in Action" by Roman Lenzen, deals with web data and how to manage this huge amount of information. The second one, "Bridging the Gap Between Online and Database Marketing" by David M. Raab, focuses on linking online with offline data at…

  • Disclaimer

    The opinions discussed on Data Mining Research are my own and do not reflect the position of my current employer, FinScore. The views and opinions expressed by visitors to this blog are theirs and do not necessarily reflect mine.