Outlier detection in two review articles (Part 1)

May 12, 2012 by Sandro Saitta


If you need to read two review articles about outlier detection, here is the first one.

Outlier Detection: A Survey

The first one, Outlier Detection: A Survey, is written by Chandola, Banerjee and Kumar. They define outlier detection as the problem of “[...] finding patterns in data that do not conform to expected normal behavior”. After an introduction to what outliers are, the authors present current challenges in the field. In my experience, the lack of labeled data is a major one.

The authors propose three types of supervision. Supervised outlier detection assumes that labeled data are available for both normal and outlying instances. Semi-supervised outlier detection assumes that labeled data are available for only one class. Techniques which model normal instances as the only class are more popular (since normal instances are easier to obtain). The third approach, unsupervised outlier detection, is the most widely used one. The paper continues by describing three types of outliers. The authors then describe several applications of outlier detection in areas such as intrusion detection, fraud detection, industrial damage detection, image processing, etc.
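To make the semi-supervised setting more concrete, here is a minimal sketch in Python. It is not taken from the survey: I assume a simple z-score model fitted on normal instances only, and the function names, the sample readings and the threshold of 3 standard deviations are all hypothetical choices for illustration.

```python
import statistics


def fit_normal_model(normal_values):
    """Estimate mean and sample standard deviation from normal-only data."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_outlier(value, model, threshold=3.0):
    """Flag a value whose z-score under the normal model exceeds the threshold.

    The threshold of 3.0 is an assumption, not a recommendation from the survey.
    """
    mean, std = model
    return abs(value - mean) / std > threshold


# Hypothetical training set: only normal readings are labeled and available.
normal_readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
model = fit_normal_model(normal_readings)

# New, unlabeled readings are compared against the model of "normal".
new_readings = [10.1, 9.6, 14.5]
flags = [is_outlier(x, model) for x in new_readings]
print(flags)
```

The point of the sketch is that no outlier example is needed at training time: anything that does not fit the model of normal behavior is flagged.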

Techniques used for outlier detection are then described. It is surprising to read that most data mining techniques can be applied to the task of outlier detection, for example neural networks, SVM, rule-based methods, clustering, nearest neighbors, regression, etc. The article continues with several other techniques. The authors also describe ways to evaluate the results of outlier detection using false positives, false negatives and ROC curves. Note the 19 pages (!) of references to other articles in the field. One of their main conclusions is that “[...] outlier detection is not a well-formulated problem”. It is your job, as a data miner, to formulate it correctly.
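As a rough illustration of one of these techniques and of the evaluation measures, here is a small Python sketch of a nearest-neighbor approach. It is my own example, not the authors' method: I assume one-dimensional data, score each point by the distance to its k-th nearest neighbor, and the data, labels, threshold and function names are all hypothetical.

```python
def knn_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def confusion_counts(scores, labels, threshold):
    """Count false positives and false negatives for a given score threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    return fp, fn


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (outlier, normal) pairs where the outlier has the higher score.
    Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: label 1 marks a known outlier, 0 a normal point.
points = [1.0, 1.1, 0.9, 1.2, 5.0, 1.05, -3.0]
labels = [0, 0, 0, 0, 1, 0, 1]

scores = knn_scores(points, k=2)
fp, fn = confusion_counts(scores, labels, threshold=1.0)
auc = roc_auc(scores, labels)
print(fp, fn, auc)
```

Note that labels are only used for evaluation here; the scoring itself is unsupervised. Moving the threshold trades false positives against false negatives, which is exactly what the ROC curve summarizes.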

Link to Outlier Detection: A Survey


Comments

5 Comments on Outlier detection in two review articles (Part 1)

  1. cristian mesiano on Wed, 23rd May 2012 9:29 pm
    Hi Sandro,
    very interesting topic!
    …I tried to do my contribute on my blog at:
    http://textanddatamining.blogspot.com/2012/05/outlier-analysis-chebyschev-criteria-vs.html
    cheers
    c.

  2. Rick Wicklin on Fri, 25th May 2012 3:59 pm
    Another useful survey article is “Robust statistics for outlier detection,” by Peter Rousseeuw and Mia Hubert. I wrote a summary of that article, and how to compute the various statistical quantities in SAS, in three blog posts. The first one is http://blogs.sas.com/content/iml/2012/01/20/detecting-outliers-in-sas-part-1-estimating-location/ (scroll to the bottom of the article and click on the “trackbacks” to see Part 2 and Part 3)

  3. Sandro Saitta on Fri, 1st Jun 2012 6:19 pm
    Yes, I saw it. It’s on the top of my to-read list! Thanks for the link

  4. Sandro Saitta on Fri, 1st Jun 2012 6:20 pm
    Thanks for mentioning this information Rick.

  5. Net on Thu, 21st Jun 2012 9:50 am
    thank you!
