While starting a new project a few days ago, I had to answer the recurring question: which language do I choose? In research, we are free to choose any language, free or commercial. This is usually not the case in industry, where the language can be fixed for many reasons (price, customer choice, boss's choice, consistency with an existing system, etc.).
I basically had to choose between Java and Matlab (C++ was quickly dropped from my list since I don't like spending time on pointers and freeing memory manually, but this is very personal). Of course many other languages are available, but I feel more confident with these two. As most of my work so far was done with Matlab, I decided to start with Java. Contradictory? Not at all: I just wanted to know how easy it is to use Java for raw data mining tasks (i.e. without using the JDM framework or the like).
When doing data mining, a large part of the work is manipulating data. Indeed, coding the algorithm itself can be quite short, since Matlab has many toolboxes for data mining. And for manipulating data, Matlab is definitely better. This is to be expected, since it was designed to work with matrices (MATrix LABoratory). Deleting a row or a column, transposing a matrix, calculating the determinant… each of these can be done in one line of code. To my knowledge, this is not the case with Java, but if you know a way, feel free to comment.
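To make the comparison concrete, here is a minimal sketch of what these four operations can look like in plain Java, using `double[][]` arrays and no external library. The class and method names (`MatrixOps`, `deleteRow`, `deleteColumn`, `transpose`, `determinant`) are illustrative choices and not part of any standard API. The determinant uses Gaussian elimination with partial pivoting, which is only one of several possible approaches. The Matlab one-liners appear in the comments for reference.

```java
public class MatrixOps {

    // Matlab: A(r,:) = [];   (Matlab indices start at 1, Java at 0)
    static double[][] deleteRow(double[][] m, int r) {
        double[][] out = new double[m.length - 1][];
        for (int i = 0, k = 0; i < m.length; i++) {
            if (i != r) {
                out[k++] = m[i].clone();
            }
        }
        return out;
    }

    // Matlab: A(:,c) = [];
    static double[][] deleteColumn(double[][] m, int c) {
        double[][] out = new double[m.length][m[0].length - 1];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0, k = 0; j < m[i].length; j++) {
                if (j != c) {
                    out[i][k++] = m[i][j];
                }
            }
        }
        return out;
    }

    // Matlab: A'
    static double[][] transpose(double[][] m) {
        double[][] out = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[0].length; j++) {
                out[j][i] = m[i][j];
            }
        }
        return out;
    }

    // Matlab: det(A)
    // Assumes a square matrix; works on a copy so the input is not modified.
    static double determinant(double[][] m) {
        int n = m.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = m[i].clone();
        }
        double det = 1.0;
        for (int col = 0; col < n; col++) {
            // Pick the row with the largest absolute value in this column.
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (a[pivot][col] == 0.0) {
                return 0.0; // singular matrix
            }
            if (pivot != col) {
                double[] tmp = a[pivot];
                a[pivot] = a[col];
                a[col] = tmp;
                det = -det; // a row swap flips the sign
            }
            det *= a[col][col];
            // Eliminate the entries below the pivot.
            for (int r = col + 1; r < n; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c < n; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        return det;
    }

    public static void main(String[] args) {
        double[][] a = {{4, 3}, {6, 3}};
        System.out.println(java.util.Arrays.deepToString(deleteRow(a, 0)));
        System.out.println(java.util.Arrays.deepToString(deleteColumn(a, 1)));
        System.out.println(java.util.Arrays.deepToString(transpose(a)));
        System.out.println(determinant(a));
    }
}
```

Each helper is short, but it has to be written (or pulled in from a matrix library) before the actual data mining work can begin, whereas Matlab offers all of them out of the box.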
For more information about using Matlab for data mining, the best place is Will’s blog. In the next post, I will write about the other side of the coin and explain some of Matlab’s drawbacks.