SAS for Data Mining

After having used Matlab and R for data mining, I am now using the SAS (Statistical Analysis System) solution. The software was chosen to match our client's internal practice: SAS was already used in the company (a telecommunications company in Switzerland) and there was no reason to change. The first surprise with SAS comes when you install it. Or should I say “them”. Indeed, SAS comprises dozens of software products, and several of them can be used for the same purpose. For example, when it comes to data preprocessing and data aggregation, you could use Data Integration Studio (DI Studio), Enterprise Guide (EG) or SAS Base. One of the first challenges with SAS is to find the right tool for the tasks you have to perform.

DI Studio is a drag-and-drop interface that allows you to preprocess and query your data. The user interface is quite dated, and programming structures such as loops are quite complicated to build with it. It is also not straightforward to perform actions that the drag-and-drop interface does not offer. For more flexibility, you can use SAS Base. No advanced GUI. Just a programming language based mainly on DATA and PROC steps. You can also write programs that generate their own code through the MACRO facility. SAS Base is thus a powerful programming language that allows you to perform any task you need. EG is a GUI to SAS. Most preprocessing, data visualization and basic statistics can be done in drag-and-drop mode with EG. The interesting aspect of EG is that you can add SAS code from SAS Base as well as from DI Studio. For more advanced data mining functionalities (neural networks, SVM, etc.) you need Enterprise Miner (EM). EM is also a drag-and-drop software where you can build your data mining projects. Usually, input data sets in EM will be output data sets from DI Studio, EG or SAS Base.
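To give an idea of what the DATA steps, PROC steps and macros mentioned above look like in SAS Base, here is a minimal sketch. The data set names (work.calls, work.calls_agg), the variables (customer_id, duration_min) and the macro name (print_customers) are hypothetical and only serve the illustration; they do not come from the project described in this post.

```sas
/* DATA step: create a small, made-up data set of call durations */
data work.calls;
    input customer_id duration_min;
    datalines;
1 12
1 3
2 45
2 5
3 7
;
run;

/* PROC step: aggregate the durations per customer */
proc means data=work.calls noprint nway;
    class customer_id;
    var duration_min;
    output out=work.calls_agg sum=total_min mean=avg_min;
run;

/* Macro: a loop that generates one PROC PRINT step per customer id */
%macro print_customers(first=, last=);
    %local i;
    %do i = &first %to &last;
        title "Customer &i";
        proc print data=work.calls_agg(where=(customer_id = &i)) noobs;
        run;
    %end;
    title;
%mend print_customers;

%print_customers(first=1, last=3)
```

The macro is where the "programs that write their own code" idea shows up: the %DO loop runs before the generated steps do, and simply emits three PROC PRINT steps that SAS then executes. This is the kind of loop that is awkward to build in a drag-and-drop tool.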

The second challenge with SAS is the installation and configuration. If you work in a company that has its own IT department, then everything is fine. If you have to install SAS tools by yourself on your own PC or laptop, then it’s a bit more complicated. First, SAS is based on a client/server approach. For example, even if you only need DI Studio locally, you will have to install servers, new users, the management console and several fixes (available on the SAS support website). By the way, the SAS support team is of excellent quality, really. I interact with them once or twice a week. Their answers are fast and they are very professional. It’s really a pleasure to have professionals answering your questions so fast (which is usually not the case with Matlab or R). So, of course, free solutions such as R and Java have their own advantages, but once installed and configured correctly, SAS is an easy-to-learn and powerful tool for data mining.

For more information about SAS:

The official SAS website
The SAS support page where you can submit a problem
The discussion group
The SUGI list of technical papers about SAS
The Little SAS Book (I will review it soon)
The SASCOM magazine (free)

If you have other interesting resources or if you would like to give your opinion about SAS, feel free to post a comment.


10 comments found on “SAS for Data Mining”

  1. You said: “By the way, the SAS support team is of excellent quality, really. I interact with them one or two times a week.”

    Wow! One or two times a week? Is there something wrong with the product?

    I agree about the difficulty of installing it. It is indeed difficult. You need to spend hours and hours getting it onto your machine.

  2. Thanks for your comment Ashutosh. In my case, several interactions with the support team are due to the installation and configuration of the product. After that, most questions are very specific and concern either errors (yes, there are a lot of possible errors) or functionalities (for example, how can I add some text to an EM schema? In fact, this is not yet possible). It is also true that SAS is not the easiest tool to use for data mining, but it’s definitely powerful.

    I also have to admit that sometimes it is easier to ask questions instead of looking for hours in the SAS help or on the Web 😉

  3. You forgot to mention cost 🙂

    If I am not mistaken, to do any data mining you will have to buy the SAS/STAT addon… this only gets you stats and a few types of regression. For decision trees, neural nets, etc you need EM which has a hefty price tag! I guess another alternative might be JMP if you want to keep it in the SAS family… not 100% sure of its features though.

  4. Cost :0

    Also bear in mind that only one man in the world owns SAS software: Mr Jim Goodnight. All the customers rent it for a hefty price *every* year. If you don’t pay your license, you can’t even run existing analyses. Everything stops. If you have a solution you have purchased, then you simply don’t buy the new features and can use the ‘old’ software as long as you want. Of course that often has little importance to us data analysts who use it in large organisations, but it’s worth bearing in mind.

    Most of my work is done using Clementine, but it is actually the data warehouse (Teradata) that stores the data and does the data processing. Clementine turns everything into SQL. When I talk to SAS users, they usually describe a system of data extraction out of a data mart or warehouse followed by an import into SAS. In my mind this is a nasty overhead. I know that more recently SAS has offered better data connectivity and SAS-to-SQL transforms, but they rarely seem to be used by anyone (or maybe users don’t like to boast as much 🙂).

    I’d love to know what kind of set-up you have regarding data storage, scoring, etc.



  5. “Open Source” ?

    A (maybe dumb) question:
    Are you able to see the source code of the algorithms (e.g. decision tree) already delivered with the product?

    kind regards,


  6. @Shane: You’re right! Costs! Well, in fact I was, like Tim, thinking from the “data analyst” point of view 🙂
    With SAS Base and EM you can do usual DM tasks. You don’t need SAS/STAT (at least with the SAS 9.1 version). Or was it installed by default with SAS Base?

    @Tim: Good point about the license. Regarding SAS, the good point is precisely that it is SAS: it is so widespread that nearly any data-related software will have an “import SAS data set” option. For example, I can pre-process my data using SAS and then use Insightful Miner (to mine) or Tibco Spotfire (to play with) my data.

    @Steffen: I have never tried, but I don’t think so. If you want to self-tune the code itself, I guess it’s better to use YALE, WEKA or R 🙂
