Data Mining with R
There are many languages in which you can implement your data mining algorithms: C/C++, Java, Matlab, R, etc. Each of these languages has its advantages and drawbacks. In this post, I will consider R, since it is the one I use at my job. Here is a list of reasons why R is a good choice for data mining:
- It is fast (at least faster than Matlab)
- It is completely free
- Several data mining algorithms are available as packages (such as rpart for decision trees)
- It contains several libraries for graphics
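To give an idea of what using such a package looks like, here is a minimal sketch that fits a classification tree with rpart. The choice of the built-in iris data set and the variable names (fit, pred, accuracy) are my own assumptions for illustration; they do not come from any particular tutorial.

```r
# rpart is a "recommended" package that ships with standard R distributions.
library(rpart)

# Fit a classification tree that predicts the species from the four
# flower measurements in the built-in iris data set (illustrative choice).
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict the class of each training example and compute the
# training accuracy (an optimistic estimate, since no data is held out).
pred <- predict(fit, iris, type = "class")
accuracy <- mean(pred == iris$Species)

# Show the fitted tree as text.
print(fit)
```

For a real study you would of course evaluate on held-out data rather than on the training set.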
A comprehensive document about using R for data mining has been written by Luis Torgo at the University of Porto, Portugal. The 125-page document contains two case studies, one of which is in finance. It is thus a very good introduction for people who want to use R for data mining.
R also has several drawbacks. Apart from its uninformative error messages, the most important drawbacks are certainly the ones that concern the design of its basic constructs, such as vectors/matrices, loops over incrementing sequences, etc. You can find details about R's design flaws on Radford Neal’s blog.
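To make these drawbacks concrete, here is a small sketch of two commonly cited pitfalls with sequences and matrices. They are my own picks for illustration and are not necessarily the exact examples discussed on that blog; the variable names are hypothetical.

```r
# Pitfall 1: 1:n counts DOWN when n is 0, so the loop body still runs.
n <- 0
count_colon <- 0
for (i in 1:n) {
  count_colon <- count_colon + 1   # 1:0 is c(1, 0): two iterations
}

# seq_len(n) is empty when n is 0, so the loop body is skipped.
count_safe <- 0
for (i in seq_len(n)) {
  count_safe <- count_safe + 1     # never executed
}

# Pitfall 2: selecting a single row of a matrix silently drops
# the result to a plain vector unless drop = FALSE is given.
m <- matrix(1:6, nrow = 2)
row_dropped <- m[1, ]                # plain vector, no dim attribute
row_kept <- m[1, , drop = FALSE]     # still a 1 x 3 matrix

print(c(count_colon, count_safe))
print(dim(row_dropped))
print(dim(row_kept))
```

Writing seq_len(n) in loops and drop = FALSE when indexing matrices avoids both surprises in code that must handle empty or single-row inputs.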