One of the most interesting and well-written papers I have read on data mining is certainly “An Introduction to Variable and Feature Selection” (Guyon and Elisseeff, 2003). It is freely available on the Journal of Machine Learning Research website. After reading it, you should have a good picture of what feature selection is really about. Although it is a research paper rather than a popularized account, it is written in a very readable way.
Feature selection can help with data visualization, reduce storage requirements and improve the performance of learning algorithms. The paper starts with a checklist of crucial points to consider before applying any learning algorithm to your data. Then, topics such as variable ranking and variable subset selection are covered. A clear distinction is made between three families of variable selection techniques: wrappers, filters and embedded methods.
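To make the idea of variable ranking a bit more concrete, here is a minimal sketch of a filter: each variable is scored on its own, independently of any learning algorithm, and the variables are then sorted by score. The scoring criterion used here (absolute Pearson correlation with the target) is just one common choice, and the function names and toy data are my own assumptions for illustration, not something taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant variable carries no linear information about the target.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_variables(columns, target):
    """Return variable names sorted by decreasing |correlation| with target."""
    scores = {name: abs(pearson(values, target))
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, made up purely for illustration.
columns = {
    "informative": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noisy": [2.0, 1.0, 4.0, 3.0, 5.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

ranking = rank_variables(columns, target)
print(ranking)
```

A wrapper would instead train the actual learning algorithm on candidate subsets of variables and keep the subset with the best validation score, while an embedded method performs the selection as part of training itself. The filter above is the cheapest of the three, but it scores variables one at a time and can therefore miss variables that are only useful in combination.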
The article then moves on to dimensionality reduction and to validation techniques for variable selection. Finally, examples of open problems are outlined. I have read several papers on data mining and related topics, and this is certainly the most comprehensive and readable one. In addition to the paper, and for more details about a Matlab implementation, you can have a look at this post on Will’s blog.