SAS for Data Mining
After having used Matlab and R for data mining, I am now using SAS (Statistical Analysis System). The software was chosen to match our client's existing setup: SAS was already in use at the company (a telecommunications company in Switzerland) and there was no reason to change. The first surprise with SAS comes when you install it. Or should I say “them”. Indeed, SAS comprises dozens of software products, and several of them can be used for the same purpose. For example, when it comes to data preprocessing and data aggregation, you could use Data Integration Studio (DI Studio), Enterprise Guide (EG) or SAS Base. One of the first challenges with SAS is finding the right tool for the tasks you have to perform.
DI Studio is a drag-and-drop interface that allows you to preprocess and query your data. The user interface is quite old, and programming structures such as loops are quite complicated to build in it. It is also not straightforward to perform actions that the drag-and-drop interface does not offer. For more flexibility, you can use SAS Base. No advanced GUI. Just a programming language based mainly on DATA and PROC steps. You can also write programs that generate their own code through the macro language. SAS Base is thus a powerful programming language that allows you to perform any task you need. EG is a GUI to SAS. Most preprocessing, data visualization and basic statistics can be done in drag-and-drop mode with EG. The interesting aspect of EG is that you can add SAS code from SAS Base as well as from DI Studio. For more advanced data mining functionalities (neural networks, SVM, etc.) you need Enterprise Miner (EM). EM is also a drag-and-drop software where you can build your data mining projects. Usually, the input data sets in EM will be the output data sets from DI Studio, EG or SAS Base.
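To make the DATA step, PROC step and macro ideas mentioned above a bit more concrete, here is a minimal sketch in SAS Base. Everything in it is made up for illustration: the data sets (work.calls, work.calls_min, work.calls_agg), the columns and the macro print_above are hypothetical names, not something taken from a real project. It only uses standard Base SAS features (DATALINES input, PROC MEANS, PROC PRINT and a %DO loop).

```sas
/* Hypothetical input: one row per call, with a customer id and a duration. */
data work.calls;
    input customer_id duration_sec;
    datalines;
1 120
1 45
2 300
2 30
3 15
;
run;

/* DATA step: derive a new column, one row at a time. */
data work.calls_min;
    set work.calls;
    duration_min = duration_sec / 60;
run;

/* PROC step: aggregate per customer into a new data set. */
proc means data=work.calls_min noprint nway;
    class customer_id;
    var duration_min;
    output out=work.calls_agg (drop=_type_ _freq_)
           sum=total_min mean=avg_min;
run;

/* Macro: the %DO loop generates one PROC PRINT per threshold, */
/* which is what "programs that write their own code" means.   */
%macro print_above(thresholds);
    %local i t;
    %do i = 1 %to %sysfunc(countw(&thresholds));
        %let t = %scan(&thresholds, &i);
        title "Customers with more than &t total minutes";
        proc print data=work.calls_agg;
            where total_min > &t;
        run;
    %end;
    title;  /* reset the title once the loop is done */
%mend print_above;

/* Expands into two PROC PRINT steps, one per threshold. */
%print_above(1 3)
```

The loop is the part that is awkward to express in a drag-and-drop tool: in SAS Base it is a few lines of macro code, and a data set like work.calls_agg is the kind of output you would then feed into EM.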
The second challenge with SAS is the installation and configuration. If you work in a company that has its own IT department, then everything is fine. If you have to install the SAS tools by yourself on your own PC or laptop, then it’s a bit more complicated. First, SAS is based on a client/server approach. For example, even if you only need DI Studio locally, you will have to install servers, create new users, and install the management console and several fixes (available on the SAS support website). By the way, the SAS support team is of excellent quality, really. I interact with them once or twice a week. Their answers are fast and they are very professional. It’s really a pleasure to have professionals answering your questions so fast (which is usually not the case with Matlab or R). So, of course, free solutions such as R and Java have their own advantages, but once installed and configured correctly, SAS is an easy-to-learn and powerful tool for data mining.
For more information about SAS:
The official SAS website
The SAS support page where you can submit a problem
The discussion group comp.soft-sys.sas
The SUGI list of technical papers about SAS
The Little SAS Book (I will review it soon)
The SASCOM magazine (free)
If you have other interesting resources or if you would like to give your opinion about SAS, feel free to post a comment.