I recently had a short discussion on my blog about data mining languages with Ralf Klinkenberg, a data mining researcher. We briefly discussed using Matlab, WEKA or YALE for data mining. As a researcher, I use Matlab for data mining. The problem with software such as WEKA, to my mind, is the difficulty of implementing your own functions. WEKA, I think, is very good for applying data mining techniques to your data set. However, and I may be wrong, I think that with Matlab it is easier to directly reuse existing functions, modify them, or even create your own functions and combine them with existing ones.

Ralf, as one of the developers of YALE, seems to prefer it to Matlab. Since I’m curious about preferences concerning data mining languages, feel free to reply to this post and say which kind of language you use for data mining, as well as your domain (industry/research/teaching). Examples of categories are:

- Matlab, Python, etc.
- WEKA, YALE, etc.
- C, C++, etc.
- Another category

I’m in general a Matlab fan, but I think WEKA is highly appropriate for data mining tasks, and I use it without a second thought on all my relevant projects (I’m not yet familiar enough with YALE).

I’m happy to see I’m not the only Matlab fan 🙂

By the way, when I write that Matlab is more research oriented, I mean that people doing research in data mining will perhaps prefer Matlab. However, all the people doing research with, or using, data mining will certainly choose WEKA or YALE.

I am not using R myself, but I am surprised that it hasn’t been mentioned.

http://www.r-project.org/

I am a scientific researcher in data mining (at a university) and a practitioner applying data mining as a consultant and software developer (as a freelancer). In both roles, I use YALE, WEKA, and Java to implement my solutions and new methods. YALE is easily extendable. You can write your own operators or plugins in Java. The YALE tutorial, which is available online, describes how to do this.

Reusability of existing methods, ease of combining existing and new methods, and rapid prototyping are really strong reasons for using YALE.

The post about Java data mining has some comments about the R language if needed.

I’m a big supporter of open source software, and so I’m personally anti-Matlab.

I prefer to roll my own in C and C++.

Data mining on large databases is runtime intensive, and personally I can’t tolerate anything slower.

I definitely agree with you about runtime efficiency. I’m working with datasets in the range of 1000 entries. For this, Matlab is usually alright. When it takes too long with Matlab, I write functions in C (and call them through the interface with Matlab). Now if you work with huge databases, an alternative to Matlab should perhaps be used.