Top 5 Reasons R is Good for you

January 21, 2009 by
Filed under: R in data mining, R language 

After reading Ajay's interesting post, I decided to write a post about the good aspects of R. First, I would like to state that I’m neither a SAS nor a Clementine user. So the following arguments are my opinions as an R programmer:

  • R is easy and free to improve: R contains hundreds of useful packages (data mining, finance, etc.). If this is not enough, you can program your own packages and share them with others. You do not depend on someone else's programmers to add what you need.

  • R is a white box: Since R is a programming language, it is easy to understand the overall process of the system being developed. There is no GUI that lets you drop in black-box components whose behavior may be unclear.
  • When you know R, you know everything: OK, this is a bit too much. But the message is that it is much easier to start with R and then move to SAS or Clementine than the opposite, especially for users who only use the GUI.

  • R is free: This is very good since small companies don’t have the money to buy SAS or Clementine. Also, if several users need such tools, the price increases. Of course, in a large company, SAS and SPSS tools may be an alternative.
  • R is a good choice: R is as convenient as Matlab (or even more?) and as cheap as Java (which means free). This makes R an excellent choice among existing tools and programming languages.

Here is an article about R from the New York Times. Since the above list is completely subjective, you are invited to give your own opinion by posting a comment.

Comments

11 Comments on Top 5 Reasons R is Good for you

  1. Steffen on Wed, 21st Jan 2009 6:53 pm

    I totally agree…

    I wondered … Sandro, can you recommend a good R programming book? Or (more important) one on software development with R (S4 …)? One of the drawbacks of a scripting language like R is the invitation to hack code together…

    kind regards,

    Steffen

  2. Matthias on Wed, 21st Jan 2009 11:19 pm

    I agree with you.

    R is living! A lot of functions, methods, docs and tutorials, totally free!

    Unfortunately, R cannot work with matrices larger than the physical memory of the PC. But if you work on “small” datasets (or aggregated data), it’s the one.

    Nevertheless, this is an excellent companion for a data miner (look deeply into the data, build amazing graphics or develop your own algorithms).

    Thank you, Sandro.

  3. Erik on Thu, 22nd Jan 2009 10:22 am

    I would like to give what I think is the number one reason why R is not used in operational data mining: one of R’s main weaknesses is the way data is managed. There is a workspace in memory into which data has to be imported and from which results are then exported. This means that memory issues are frequent with big datasets.

    Remember that the vast majority of operational data mining (by that I mean data mining projects whose results are used operationally on a day-to-day basis) is done in CRM. In this field, we regularly have training data sets with hundreds or thousands of columns and hundreds of thousands of rows, so R is cornered into domains with fewer data volume constraints.

  4. Sandro Saitta on Thu, 22nd Jan 2009 5:43 pm

    @Steffen: I don’t know about R books, but I’m sure they exist. I prefer to use tutorials such as Data Mining with R, for example.

    @Matthias: I agree that R has some limitations, and maybe in some situations (very big data sets) it is not possible to use R.

    @Erik: That’s a very good point. In fact I have the same issue when using R in finance, since I have to load all prices for a given time period and a set of stocks… in my case, this is not feasible under Windows (due to RAM limitations).

  5. Will Dwinnell on Fri, 23rd Jan 2009 2:34 am

    “R is as convenient as Matlab”

    Whoa! Let’s not say anything that we can’t take back! Heh heh…

    Actually, I am curious as to scalability. I see that someone else has mentioned a limitation in data size to physical RAM, but I wonder more about speed of computation. In my limited experience several years ago with S-Plus (R’s commercial cousin), performance on data sets I would consider small was abysmally slow. Can you characterize R’s performance on data tables whose sizes are typical of data mining projects?

    -Will Dwinnell
    Data Mining in MATLAB

  6. Sandro Saitta on Mon, 26th Jan 2009 9:32 am

    @Will: Thanks for your comment! What I meant by “R is as convenient as Matlab” was from a programming point of view (I realize the sentence was not clear enough). It is easy to program in R and Matlab (compared to other languages). Of course, this is a very personal point of view.

    Regarding R’s performance, I have not run any tests so far.

  7. Paolo on Tue, 27th Jan 2009 11:21 am

    @Sandro
    Regarding the performance issue (and more), the R-help mailing list can be very useful: see, for example, the thread starting here:
    http://tolstoy.newcastle.edu.au/R/e6/help/09/01/0138.html

  8. Sandro Saitta on Tue, 27th Jan 2009 5:04 pm

    @Paolo: Thanks for the link. This is an interesting discussion!

  9. Kirk Mettler on Fri, 17th Apr 2009 5:32 pm

    We have worked with a number of people using large data sets in R. However, a more universal solution to the “big data” problem in R is in the works.

  10. Sandro Saitta on Sat, 18th Apr 2009 12:04 am

    @Kirk: Thanks for the information.

  11. QuantMinds » Recommendation on R on Sun, 7th Feb 2010 7:10 am
