Why is Matlab the best language for data mining? (cont’d)
In the previous post, I argued that Matlab is an excellent programming language (and environment) for data mining. However, as you know, no programming language is perfect, and Matlab has its drawbacks too. Here are a few of them.
First, Matlab is an interpreted language, which means there is no compilation step. The good side is that you can write and execute code on the fly. The other side is that there is no variable declaration and no type checking. This is expected, since Matlab is dynamically typed: variables are never declared and, by default, every element is a matrix. For example, the number 3 is stored as a 1×1 matrix containing the number 3. Because there is no type checking, if you mistakenly put '3' as a string in your numeric matrix, you get no error and no warning.
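A minimal sketch of both points (it should run in Matlab or GNU Octave): even a scalar is stored as a 1×1 matrix, and mixing a character into numeric data is silently accepted. The variable names x, v and w are illustrative only.

```matlab
% Even a scalar is an array: its size is 1x1
x = 3;
disp(size(x))           % displays 1 1

% Assigning the character '3' into a numeric vector raises no error or warning:
% the char is silently converted to its character code, 51
v = [1 2 3];
v(3) = '3';
disp(v)                 % displays 1 2 51
disp(class(v))          % displays double

% Concatenating numbers with a character does not fail either:
% the whole result silently becomes a char array
w = [1 2 '3'];
disp(class(w))          % displays char
```

In both cases the program keeps running, so the mistake only shows up later, if at all.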
Again, as there is no declaration, this situation can happen in Matlab:
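myvariable = 0;
% … (other code)
myVariable = 10*i;    % typo: the capital V creates a second variable
% … (other code)
disp(myvariable)      % displays 0, not the intended 0 + 10i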
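Because Matlab is case-sensitive and needs no declarations, the mistyped myVariable silently creates a new variable, and disp(myvariable) still displays 0. Mistyping errors are therefore dangerous in Matlab.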
Another issue is Matlab's execution time, which is quite high compared to C++ or even Java. One solution is the MEX interface, which lets you call C/C++ code directly from Matlab. However, passing data between Matlab and the C code takes time, so the result is generally slower than a pure C/C++ program.
Even with these limitations, I'm personally convinced that Matlab is a very powerful tool for data miners. The main reason is that you spend less time on the programming part and more on the problem you are trying to solve. If you think Matlab has other important drawbacks or, on the contrary, that the ones I mentioned are not really drawbacks, feel free to comment.
Note: Data Mining Research is on holiday until August 2nd.
2 Comments on Why is Matlab the best language for data mining? (cont’d)

Will Dwinnell on
Mon, 23rd Jul 2007 8:46 pm

Anonymous on
Wed, 25th Jul 2007 9:36 am
For some time, MATLAB has been saddled with an undeserved reputation for being slow. I’ll make the following points:
1. Depends on application: If the code does things which MATLAB is good at (numerical array manipulation, for example), then MATLAB tends to be fast, sometimes even faster than hand-built code in other “faster” languages. See, for instance, the answer given by Big Toe (Mtl) at:
2. Depends on code: Code which takes advantage of MATLAB features such as vectorization will execute faster than code which does not. This can make an enormous difference.
3. Programmer time, readability and maintainability count, too. In a field like data mining, in which operations are performed across entire arrays of data, MATLAB can be easier to write, understand and modify. The lines-of-code ratio from MATLAB to many procedural and OO languages has to be 10-to-1. Elimination of loops alone will make an enormous difference. Leveraging built-in MATLAB functions will improve programming on all of these counts.
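To illustrate points 2 and 3, here is a minimal sketch comparing a loop-based column standardization (z-scores, a common data mining preprocessing step) with a vectorized version built on the built-in mean and std functions. The data matrix X is random and purely hypothetical. The vectorized line assumes implicit expansion (Matlab R2016b or later, or GNU Octave); on older releases, bsxfun plays the same role.

```matlab
% Hypothetical data set: 1000 observations (rows) x 5 features (columns)
X = rand(1000, 5);
[n, p] = size(X);

% Loop version: standardize each column element by element
Zloop = zeros(n, p);
for j = 1:p
    mu = sum(X(:, j)) / n;                          % column mean
    s  = sqrt(sum((X(:, j) - mu) .^ 2) / (n - 1));  % sample std (n-1), as std() uses
    for k = 1:n
        Zloop(k, j) = (X(k, j) - mu) / s;
    end
end

% Vectorized version: one line using built-in column-wise mean and std
Zvec = (X - mean(X)) ./ std(X);

% Both versions agree up to floating-point round-off
disp(max(abs(Zloop(:) - Zvec(:))))
```

The vectorized line is both shorter and closer to the mathematical definition; how much faster it runs depends on the Matlab release and the data size, so timing both versions (for example with tic/toc) on your own data is the safest way to judge.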
In conclusion, I will say that, for some applications, MATLAB will not be the fastest-executing choice, but this will not (nearly) always be the case.