Here we go with the second review article about outlier detection (this post is the continuation of Part I).
A Survey of Outlier Detection Methodologies
This paper, by Hodge and Austin, is also an excellent review of the field. The authors give a list of keywords used in the field: outlier detection, novelty detection, anomaly detection, noise detection, deviation detection and exception mining. For the authors, “An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs (Grubbs, 1969)”. Before listing several applications in the field, the authors mention that an outlier can be “surprising veridical data”: the observation is genuine, and it may simply be situated in the wrong class.
An interesting list of possible reasons for outliers is given: human error, instrument error, natural deviations in population, fraudulent behavior, changes in behavior of the system and faults in the system. As in the first article, Hodge and Austin define three types of approaches to outlier detection (unsupervised, supervised and semi-supervised). For the last one, they mention that some algorithms can attach a confidence to the claim that an observation is an outlier. The main drawback of the supervised approach is its inability to discover new types of outliers.
The paper continues with a description of several techniques for outlier detection. Consensus or majority voting between several classifiers is also mentioned. I would even go a step further and recommend combining, if possible, a supervised approach with an unsupervised one. This makes it possible to balance the advantages and drawbacks of both approaches. One of the main conclusions from the authors is that “there is no single universally applicable or generic outlier detection approach”. Business understanding and problem definition are two major steps that will help you find the right technique to solve your challenge.
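To make the majority voting idea mentioned above more concrete, here is a minimal sketch in Python. It is not taken from the paper: the three detectors (z-score, interquartile range, median absolute deviation), their thresholds, the function names and the sample data are all my own assumptions, chosen only for illustration. All three are unsupervised, so this shows the voting mechanism rather than the supervised/unsupervised combination.

```python
import statistics

# Hypothetical detectors: each returns one boolean per value (True = outlier).
# Thresholds are illustrative assumptions, not recommendations from the paper.

def zscore_flags(values, threshold=2.0):
    """Flag values far from the mean, measured in standard deviations."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return [False] * len(values)
    return [abs(v - mean) / stdev > threshold for v in values]

def iqr_flags(values, k=1.5):
    """Flag values outside the usual box plot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]

def mad_flags(values, threshold=3.5):
    """Flag values using the modified z-score based on the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

def majority_vote(values, detectors):
    """Keep a flag only when more than half of the detectors agree."""
    votes = [detector(values) for detector in detectors]
    needed = len(detectors) // 2 + 1
    return [sum(column) >= needed for column in zip(*votes)]

# Made-up measurements with one obviously suspicious value.
data = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
flags = majority_vote(data, [zscore_flags, iqr_flags, mad_flags])
outliers = [v for v, flagged in zip(data, flags) if flagged]
print(outliers)  # [25.0]
```

In a real project one of the voters could be a supervised classifier trained on labeled outliers, which is the combination I suggest above; the voting function itself would not need to change.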