Outlier detection in two review articles (Part 1)

May 12, 2012 by Sandro Saitta

If you need to read two review articles about outlier detection, the first one is…

Outlier Detection: A Survey

The first one, Outlier Detection: A Survey, was written by Chandola, Banerjee and Kumar. They define outlier detection as the problem of “[...] finding patterns in data that do not conform to expected normal behavior”. After an introduction to what outliers are, the authors present the current challenges in the field. In my experience, the lack of labeled data is a major one.

The authors distinguish three levels of supervision. Supervised outlier detection assumes that labeled data are available for both normal and outlying instances. Semi-supervised outlier detection assumes that labeled data are available for only one class. Techniques that model normal instances as that single class are the more popular ones, since normal instances are easier to obtain. The third approach, unsupervised outlier detection, is the most widely used. The paper continues by describing three types of outliers. The authors then describe several applications of outlier detection in areas such as intrusion detection, fraud detection, industrial damage detection, image processing, etc.
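To make the semi-supervised setting more concrete, here is a minimal sketch of my own (it is not taken from the survey). It assumes one-dimensional data, a simple mean/standard deviation model of the normal class, and an arbitrary threshold of three standard deviations. The function names and the sample readings are hypothetical.

```python
from statistics import mean, stdev

def fit_normal_profile(normal_values):
    """Estimate mean and standard deviation from normal instances only."""
    return mean(normal_values), stdev(normal_values)

def is_outlier(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    return abs(value - mu) / sigma > threshold

# Hypothetical training data: only instances labeled as normal are available.
normal_readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
profile = fit_normal_profile(normal_readings)

# New, unlabeled readings are compared against the normal profile.
new_readings = [10.1, 9.9, 14.5]
flags = [is_outlier(v, profile) for v in new_readings]
print(flags)  # [False, False, True]
```

No outlier labels are needed here: anything far from the learned normal profile is flagged. Real data would most likely call for a richer model of normal behavior than a single mean and standard deviation.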

Techniques used for outlier detection are then described. It is surprising to read that most data mining techniques can be applied to the task of outlier detection, for example neural networks, SVM, rule-based methods, clustering, nearest neighbors, regression, etc. The article continues with several other techniques. The authors also describe ways to evaluate the results of outlier detection using false positives, false negatives and ROC curves. Note the 19 pages (!) of references to other articles in the field. One of their main conclusions is that “[...] outlier detection is not a well-formulated problem”. It is your job, as a data miner, to formulate it correctly.
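As an illustration of one technique and of the evaluation measures mentioned above, here is a small sketch of my own, not code from the survey. It assumes one-dimensional points, scores each point by the distance to its k-th nearest neighbor, and then computes the false positive rate, the false negative rate and the area under the ROC curve. The data, labels, threshold and function names are all hypothetical, and the labels are used only for evaluation.

```python
def knn_scores(points, k=2):
    """Score each 1-D point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def error_rates(scores, labels, threshold):
    """Return (false positive rate, false negative rate) at a score threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    return fp / labels.count(0), fn / labels.count(1)

def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a random
    outlier gets a higher score than a random normal point (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((1.0 if p > n else 0.5 if p == n else 0.0)
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: label 1 marks a known outlier, label 0 a normal point.
points = [1.0, 1.1, 0.9, 1.2, 5.0, 1.05, 8.0]
labels = [0, 0, 0, 0, 1, 0, 1]

scores = knn_scores(points, k=2)
fpr, fnr = error_rates(scores, labels, threshold=1.0)
auc = roc_auc(scores, labels)
print(fpr, fnr, auc)
```

On this toy data the two isolated points receive much larger scores than the rest, so a threshold of 1.0 separates them cleanly. Raising the threshold trades false positives for false negatives, which is exactly the trade-off a ROC curve summarizes.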

Link to Outlier Detection: A Survey

Data Mining Interview: Meta Brown

April 29, 2012 by Sandro Saitta
Meta Brown, General Manager of Analytics at LinguaSys, has kindly agreed to answer a few questions for Data Mining Research. I would like to thank Meta for her time and her advice on analytics. Data Mining Research: Who are you and what is your story? Meta Brown: I'm a practical, plain-talking data analyst and engineer. I use data to tell true stories…

Data Mining Book Review: The Value of Business Analytics

April 13, 2012 by Sandro Saitta
Today's book review is about The Value of Business Analytics - Identifying The Path to Profitability, by Evan Stubbs. I won't keep you in suspense: the book is excellent! It's a must-have for anyone trying to apply analytics within a company. The book is published by Wiley/SAS, but don't worry, there is no promotion of SAS tools…

Data Mining Guest Post: Gaurav Vohra

April 1, 2012 by Sandro Saitta
It's my pleasure to welcome Gaurav Vohra to Data Mining Research for a guest post. He writes about analytics in India. Thanks, Gaurav, for your contribution! How did India become the global hub for analytics? The economic reforms at the start of the 90s led to liberalization of government policies and encouraged multi-nationals such as GE, American Express and British…

Statistical Analysis: Common Mistakes

March 21, 2012 by Sandro Saitta
Whether you are a beginner in the field or an expert in statistics, the article by Dubey and Rajaram, 5 Common Mistakes People Make in the Name of Statistical Analysis, is a must-read. The paper starts with this excellent example: "Imagine you are a regional sales head for a major retailer in U.S. and you want to know what drives…

Data Mining Book Review: Handbook of Statistical Analysis and Data Mining Applications

March 10, 2012 by Sandro Saitta
The book by Robert Nisbet, John Elder and Gary Miner is a fresh addition to any data miner's library. First surprise: the book is in full color, with a lot of pictures, which is a good point. With a focus on data mining applications, the book also covers introductory and more advanced data mining concepts. The book isn't very technical…

Selling Data Mining to Management

February 19, 2012 by Sandro Saitta
Preparing data and building data mining models are two very well documented steps of analytics projects. However interesting your results are, they are useless if no action is taken. Thus, the step from analytics to action is a crucial one in any analytics project. Imagine you have the best data and found the best model of all time. You…
