Java data mining
Are you interested in data mining? Yes… you are reading this blog. Do you program in Java? If so, the book Java Data Mining may interest you. It is briefly described by KDnuggets. In my view, Java is perhaps not the best language for data mining. If you work in industry and need a fast-running application, you will most likely use C++ or .NET. If you do research and need more interactivity and simplicity while coding, you will probably use something like MATLAB. Java is neither as fast as C++ nor as easy to use as MATLAB. Perhaps this book will explain why you should use Java for data mining. By the way, a good book on data mining (with Java examples) is Data Mining: Practical Machine Learning Tools and Techniques.
Comments
11 Comments on Java data mining

Ralf on
Tue, 31st Oct 2006 1:42 pm

Sandro Saitta on
Tue, 31st Oct 2006 2:42 pm

Ralf on
Tue, 31st Oct 2006 4:37 pm

Sandro Saitta on
Wed, 1st Nov 2006 5:17 pm

Ralf on
Fri, 3rd Nov 2006 11:49 pm

Ralf on
Fri, 3rd Nov 2006 11:58 pm

Sandro Saitta on
Sun, 5th Nov 2006 5:48 pm

Will Dwinnell on
Thu, 9th Nov 2006 12:13 am

There are two freely available open-source data mining suites implemented in Java: WEKA and YALE. YALE comes with an easy-to-use graphical user interface (GUI), but it can also be used from the command line or as a library in your own programs. YALE provides more than 400 data mining operators and fully integrates WEKA. For further details, see http://yale.sf.net/
For more open-source data mining software, you may want to check the corresponding lists at Wikipedia or KDnuggets.
Best regards,
Ralf
Hi Ralf,
Thanks for the comment. I knew WEKA but not YALE. I think WEKA is well known in the data mining community. As a researcher, I prefer to work with MATLAB. However, I’m quite sure that WEKA and YALE are more useful to data mining practitioners.
Hi Sandro,
WEKA is better known because it is the older project and, so far, also the more widespread one. As far as I know, WEKA started around 1998, while YALE started in 2001. However, YALE is by now far more comprehensive than WEKA as far as the flexibility of the experimental setup and the number of available operators are concerned.
As for how widely WEKA and YALE are used, YALE is catching up quickly. This month, YALE has already had more than 16,000 downloads, as counted by the SourceForge.net download statistics:
YALE download statistics.
WEKA and YALE are both used in academia for research and teaching as well as in industry for research, development, and applications. I know that many researchers have a personal preference for R or MATLAB, especially those with a background in mathematics, statistics, or physics. Nonetheless, there are also many researchers who prefer WEKA and YALE for their work. Anyway, we live in a free world, and everybody should be free to choose their favorite tool(s).
WEKA and YALE try to address both researchers and practitioners. By the way, as you can tell from my home page, I am also a researcher. ;^)
Best regards,
Ralf
As you have perhaps seen, I have added a post about our discussion.
Hi Sandro,
Yes, I saw your post about our discussion. Thanks for your blog entry about YALE.
The WEKA project started even earlier than I had thought: in 1993, i.e. 13 years ago.
The YALE project started in 2001, i.e. only about 5 years ago.
The R project, with its R programming language, is another widely used open-source data mining tool with a large and active user community. There is also an alternative graphical user interface (GUI) for R called Rattle.
However, unlike WEKA and YALE, R is not implemented in Java, and hence it may be a little off topic here (for a blog entry on Java Data Mining).
Most of the user-visible functions in R are written in R itself, an interpreted language. For efficiency, users can interface to procedures written in C, C++, or FORTRAN.
Best regards,
Ralf
Besides Rattle, there are many other GUIs for R that integrate the R programming language.
Ralf,
Thanks for all this information. I’m sure it can help people choose (or change?) the programming language they use for data mining.
I’d just like to say, as a practitioner, that MATLAB is my tool of choice for statistical and data mining work. Although I do research on my own, I am not an academic. For the past 4 years, I’ve been working for a bank.