Feature selection


One of the most interesting and well-written papers I have read on data mining is certainly “An Introduction to Variable and Feature Selection” (Guyon and Elisseeff, 2003). It is freely available on the Journal of Machine Learning Research website. After reading this paper, you should have a good view of what feature selection is really about. Although it is not a popularization, the paper is written in a very readable way.

Feature selection can be useful for facilitating data visualization, reducing storage requirements and improving the performance of learning algorithms. The paper starts with a checklist of crucial points to consider before applying any learning algorithm to your data. Then, topics such as variable ranking and variable subset selection are covered. A clear distinction is made between three families of variable selection techniques: wrappers, filters and embedded methods.
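To make the distinction concrete, here is a minimal sketch showing one representative of each family. It assumes Python with scikit-learn and a synthetic dataset, neither of which appears in the paper, and the choice of methods (univariate F-test, recursive feature elimination, L1-penalized logistic regression) is mine.

```python
# A minimal sketch of the three families, assuming Python with
# scikit-learn (my illustration; not from the paper itself).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, of which only 5 are informative.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Filter: rank features by a univariate score (ANOVA F-test),
# independently of any learning algorithm.
filt = SelectKBest(score_func=f_classif, k=5).fit(X, y)
print("filter:  ", filt.get_support(indices=True))

# Wrapper: recursive feature elimination repeatedly retrains the
# learner itself to score and discard features.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
print("wrapper: ", wrap.get_support(indices=True))

# Embedded: an L1-penalized model selects features while training;
# features with zero coefficients are effectively discarded.
emb = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("embedded:", (emb.coef_ != 0).nonzero()[1])
```

Each print statement lists the indices of the features retained by that method.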

The article continues with dimensionality reduction and validation techniques in the context of variable selection. Finally, examples of open problems are outlined. Of the many papers I have read in data mining and related topics, this is certainly the most comprehensive and readable one. In addition to the paper, and for more details about a Matlab implementation, you can have a look at this post on Will’s blog.


Comments

4 Comments on Feature selection

  1. Will Dwinnell on Thu, 8th Feb 2007 1:46 pm

     I’ve noticed that, among filter methods described in the literature, there seem to be three common approaches:

    1. Seek correlation between individual predictors and the target and, simultaneously, a lack of correlation among the predictors (CFS, if I’m not mistaken). A sketch of this approach is given after this comment.

    2. Seek a group of predictors which provide high separation of the target classes (Fisher discriminant method, Weiss and Indurkhya’s independent features).

    3. Reduce predictors without regard to target (PCA, clustering of predictor variables).

    Lately, I’ve been leaning heavily on my GA-driven implementation of Weiss and Indurkhya’s approach (which seems to work very well for linear models), but am collecting a number of these techniques.
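To illustrate the first approach in Will’s list, here is a minimal sketch of a CFS-style merit score combined with a greedy forward search. This is my own simplified illustration in Python/NumPy, with hypothetical function names; it is not Will’s GA-driven implementation.

```python
# A CFS-style filter: reward correlation with the target, penalize
# correlation among the selected predictors (Hall's merit heuristic).
# Greedy forward search; function names are hypothetical.
import numpy as np

def cfs_merit(X, y, subset):
    """Merit of a feature subset: k * r_cf / sqrt(k + k*(k-1) * r_ff)."""
    k = len(subset)
    # Mean absolute correlation between selected features and the target.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute pairwise correlation among the selected features.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y, n_select):
    """Forward selection: add, at each step, the feature that maximizes merit."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda j: cfs_merit(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, greedy_cfs(X, y, 5) returns the indices of five predictors that individually correlate with the target while being mutually weakly correlated. A fuller implementation would stop when the merit no longer improves rather than at a fixed subset size, and a GA, as in Will’s variant, would search the subset space globally instead of greedily.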

  2. damien on Sat, 10th Feb 2007 9:56 pm

     This paper is indeed really good (it was the subject of one of the earliest posts on my blog).

     The authors have just edited a book that is also very well written. The first part is an introduction to feature selection and the second part presents the results of the feature selection contest that was held in 2003. See the website of the book here.

  3. Sandro Saitta on Mon, 19th Feb 2007 8:02 pm

     Will and Damien, thanks for your complementary comments. I will have a look at this new book you mentioned as well as your blog.

  4. Pingback: [...] and the curse of dimensionality. When you use only the higher information features, you can increase performance while also decreasing the size of the model, which results in less memory usage along with faster [...]
