Java data mining

October 23, 2006 by
Filed under: Java, Matlab 

Are you interested in data mining? Yes… you are reading this blog. Do you program in Java? If so, the book Java Data Mining may interest you. It is briefly described by KDnuggets. In my view, Java is perhaps not the best language for data mining. If you work in industry and need a fast-running application, you will most likely use C++ or .NET. If you are doing research and need more interactivity and simplicity while coding, you will probably use MATLAB, for example. Java is neither as fast as C++ nor as easy to use as MATLAB. Perhaps this book explains why one should use Java for data mining. By the way, a good book on data mining (with Java examples) is Data Mining: Practical Machine Learning Tools and Techniques.

11 Comments on Java data mining

  1. Ralf on Tue, 31st Oct 2006 1:42 pm
    There are two freely available open-source data mining suites implemented in Java: WEKA and YALE. YALE comes with an easy-to-use graphical user interface (GUI), but it can also be used from the command line or as a library in your own programs. YALE provides more than 400 data mining operators and fully integrates WEKA. For further details, see http://yale.sf.net/

    For more open-source software for data mining, you may want to check the corresponding lists at Wikipedia or KDnuggets.

    Best regards,
    Ralf

  2. Sandro Saitta on Tue, 31st Oct 2006 2:42 pm
    Hi Ralf,

    Thanks for the comment. I knew WEKA but not YALE. I think WEKA is well known in the data mining community. As a researcher, I prefer to work with Matlab. However, I'm quite sure that WEKA and YALE are more useful to data mining practitioners.

  3. Ralf on Tue, 31st Oct 2006 4:37 pm
    Hi Sandro,

    WEKA is better known because it is the older project and, so far, also the more widespread one. As far as I know, WEKA started around 1998, while YALE started in 2001. However, YALE is now far more comprehensive than WEKA in terms of the flexibility of the experimental setup and the number of available operators.

    As for the adoption of WEKA and YALE, YALE is catching up quickly. This month, YALE has already had more than 16,000 downloads, as counted by the SourceForge.net download statistics:
    YALE download statistics.

    WEKA and YALE are both used in academia for research and teaching, as well as in industry for research, development, and applications. I know that many researchers have a personal preference for R or MATLAB, especially those with a background in mathematics, statistics, or physics. Nonetheless, there are also many researchers who prefer WEKA and YALE for their work. Anyway, we live in a free world, and everybody should be free to choose their favorite tool(s).

    WEKA and YALE try to address both researchers and practitioners. By the way, as you can tell from my home page, I am also a researcher. ;^)

    Best regards,
    Ralf

  4. Sandro Saitta on Wed, 1st Nov 2006 5:17 pm
    As you may have seen, I have added a post about our discussion.

  5. Ralf on Fri, 3rd Nov 2006 11:49 pm
    Hi Sandro,

    Yes, I saw your post about our discussion. Thanks for your blog entry about YALE.

    The WEKA project started even earlier than I had thought: in 1993, i.e. already 13 years ago.

    The YALE project started in 2001, i.e. only about 5 years ago.

    The R project, with its R programming language, is another widely used open-source data mining tool with a large and active user community. There is also an alternative graphical user interface (GUI) for R called Rattle.
    However, unlike WEKA and YALE, R is not implemented in Java, and hence may be a little off topic here (for a blog entry on Java Data Mining).
    Most of the user-visible functions in R are written in R, an interpreted language. For efficiency, users can interface to procedures written in C, C++, or FORTRAN.

    Best regards,
    Ralf

  6. Ralf on Fri, 3rd Nov 2006 11:58 pm
    Besides Rattle, there are many other GUIs for R that integrate the R programming language.

  7. Sandro Saitta on Sun, 5th Nov 2006 5:48 pm
    Ralf,

    Thanks for all this information. I'm sure it can help people choose (or change?) the programming language they use for data mining.

  8. Will Dwinnell on Thu, 9th Nov 2006 12:13 am
    I'd just like to say, as a practitioner, that MATLAB is my tool of choice for statistical and data mining work. Although I do research on my own, I am not an academic. For the past 4 years, I've been working for a bank.

