Feature selection is a technique for reducing the number of features before a data mining algorithm is applied. Irrelevant features may degrade performance on a prediction task. Moreover, the computational complexity of a classification algorithm may suffer from the curse of dimensionality when the number of features is large. When a data set has many irrelevant variables and only a few examples, overfitting is likely to occur. In addition, data are usually better characterized using fewer variables. Feature selection has been applied in fields such as multimedia database search, image classification and biometric recognition. A comprehensive introduction to feature selection can be found in the paper by Guyon et al.
Feature selection techniques can be divided into three main categories: embedded approaches (feature selection is part of the classification algorithm, e.g. decision trees), filter approaches (features are selected before the classification algorithm is used) and wrapper approaches (the classification algorithm is used as a black box to find the best subset of attributes). By their very definition, embedded approaches are limited, since they only suit a particular classification algorithm. Moreover, a feature that is relevant in general is not necessarily relevant for a given classification algorithm. Filter methods nevertheless assume that the feature selection process is independent of the classification step. The work of Kohavi et al. (1995) recommends replacing filter approaches with wrappers. The latter usually provide better results, at the price of higher computational complexity. Although already known in statistics and pattern recognition, wrappers are new in the data mining community.
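The difference between a filter and a wrapper can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the cited works: the filter scores each feature by its absolute Pearson correlation with the label, the wrapper performs a greedy forward search, the black-box classifier is a nearest-centroid rule evaluated by leave-one-out accuracy, and the data are synthetic (feature 0 is informative, features 1 and 2 are noise). The function names are hypothetical.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def filter_select(X, y, k):
    """Filter: keep the k features most correlated with the label.
    No classifier is consulted at any point."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:k])


def loo_accuracy(X, y, features):
    """Black box for the wrapper: leave-one-out accuracy of a
    nearest-centroid classifier restricted to the given features."""
    n = len(X)
    correct = 0
    for i in range(n):
        centroids = {}
        for label in set(y):
            rows = [X[r] for r in range(n) if r != i and y[r] == label]
            centroids[label] = [mean(row[j] for row in rows) for j in features]
        point = [X[i][j] for j in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / n


def wrapper_select(X, y, evaluate):
    """Wrapper: greedy forward selection. A feature is added only if it
    strictly improves the score returned by the black-box evaluator."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        scored = [(evaluate(X, y, selected + [j]), j) for j in sorted(remaining)]
        score, j = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return sorted(selected), best


# Synthetic data (assumption): feature 0 depends on the label, 1 and 2 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[3.0 * label + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)]
     for label in y]

filter_choice = filter_select(X, y, k=1)
wrapper_choice, wrapper_score = wrapper_select(X, y, loo_accuracy)
print("filter keeps:", filter_choice)
print("wrapper keeps:", wrapper_choice, "with LOO accuracy", wrapper_score)
```

The filter computes one score per feature, whereas the wrapper retrains and re-evaluates the classifier for every candidate subset it visits. This repeated evaluation is the source of the higher computational cost mentioned above.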
(1) Kudo, M. and Sklansky, J., Comparison of algorithms that select features for pattern classifiers, Pattern Recognition, 2000, 33, 25-41.
(2) Blum, A.L. and Langley, P., Selection of relevant features and examples in machine learning, Artificial Intelligence, 1997, 97, 245-271.
(3) Kohavi, R. and John, G., Feature Selection for Knowledge Discovery and Data Mining, The Wrapper Approach, Kluwer Academic Publishers, 1998, 33-50.