Readability of Decision Trees

One of the most often cited advantages of decision trees is their readability. Several data miners (myself included) justify the use of this technique because the resulting model is quite easy to understand (no black box). However, certain issues can make decision trees unreadable.

First, there is normalization (or standardization). In most projects, data have to be normalized before using a decision tree. Therefore, once you plot the tree, the split values are meaningless. Of course, you can map the values back to their original scale, but that is an extra step that has to be done.
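As a minimal sketch of that mapping-back step, assuming the data were standardized as z = (x - mean) / std: the column of ages, the threshold 0.5, and the helper names below are all made up for illustration and are not from any particular project.

```python
from statistics import mean, pstdev


def fit_standardizer(values):
    """Return the (mean, std) pair used to z-score a column."""
    return mean(values), pstdev(values)


def to_original_units(threshold_z, mu, sigma):
    """Invert z = (x - mu) / sigma for a single split threshold."""
    return threshold_z * sigma + mu


# Hypothetical column of customer ages (illustrative data only).
ages = [22, 35, 41, 58, 64]
mu, sigma = fit_standardizer(ages)

# Suppose the plotted tree shows the split "age_z <= 0.5" (made-up threshold).
threshold_z = 0.5
threshold_age = to_original_units(threshold_z, mu, sigma)
print(f"age_z <= {threshold_z} is the same rule as age <= {threshold_age:.1f}")
```

The same idea applies to min-max normalization, with the inverse x = z * (max - min) + min. In both cases the mean/std or min/max must be saved at training time, otherwise the tree cannot be translated back later.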

Second is the number of trees. In the project I am working on at my job, I can have 100 or more decision trees per month (see this post for more details). It is clearly impossible to read all these trees, even if each one is understandable on its own. The same happens with random forests. When there are 1000 trees voting for a given class, how can one understand the process (or rules) that produces the class output?
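To make the voting problem concrete, here is a toy sketch of majority voting. It is not a real random forest (there is no bootstrapping or random feature selection); the three one-split "trees", their feature names, and their thresholds are hypothetical.

```python
from collections import Counter


def make_stump(feature, threshold, left_label, right_label):
    """Build a one-split tree, i.e. a single readable rule."""
    def predict(row):
        return left_label if row[feature] <= threshold else right_label
    return predict


# Three hypothetical stumps; a real forest would hold hundreds of deeper trees.
forest = [
    make_stump("age", 40, "no", "yes"),
    make_stump("income", 50_000, "no", "yes"),
    make_stump("visits", 3, "no", "yes"),
]


def forest_predict(forest, row):
    """Return the majority class together with the full vote tally."""
    votes = Counter(tree(row) for tree in forest)
    label, _ = votes.most_common(1)[0]
    return label, votes


row = {"age": 52, "income": 30_000, "visits": 7}
label, votes = forest_predict(forest, row)
print(label, dict(votes))
```

Each stump is readable on its own, yet the final answer is only a tally. With three rules you can still trace which ones fired; with 1000 full-depth trees, no single rule explains the output.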

Decision trees still have a lot of advantages. However, the “readability” advantage must be taken with care. It may be valid in some applications, but can often be a mirage.


11 comments found on “Readability of Decision Trees”

  1. Hi Sandro,

    I have a few thoughts.

    a) Hey, this bit;
    “In most projects, data have to be normalized before using decision tree. Therefore, once you plot the tree, values are meaningless.”
    – I reckon not necessarily true!

    Depends on your normalisation. You can normalise your data with meaning!

    I like binning into 100 buckets, each with the same number of occurrences (say, customer rows). I do this for a few reasons, one being that I can then report customers as being in the top 5% of buckets, etc. It is also an easy and fast way to rescale lots of data in SQL. CART or C5.0 models using this type of normalised data are actually quite easy to make sense of (e.g., “if stock price is above 70% bucket”, etc.).

    b) Random forest doesn’t work well with big datasets (millions of rows). I use fairly simple CART or C5.0. Sometimes I build a handful of models on subset samples and average the models, but I’m not convinced hundreds of models is the best way to go. I always take time creating new derived ‘information rich columns’ and using these as additional inputs to a decision tree or neural net.

    I agree with the problems you describe and, for those reasons you mention, I don’t follow the steps you describe. Maybe I’m jaded, but I believe Random Forests is a classic example of mad academia over practicality (and yes, I know that’s controversial considering the brilliant guy who created random forests…).

    – Tim

  2. Hi Tim,

    Thanks for your comment!

    1) Binning the data into buckets is a nice way to avoid this “unreadability” problem. I have always used normalization or standardization, but never binning. Also, the fact that you have the same number of occurrences in each bin avoids the issue of outliers.

    2) Regarding random forests, I definitely agree on the issues when using this technique (that’s why I don’t use random forests). However, I really like the concept of several models voting for the output class.

  3. I think random forests are both useful and powerful… with some caveats… you need lots of memory for big data (so not practical for all tasks) and readability is also a problem. Rattle solves the random forest readability issues by producing an importance chart.

  4. Random forests have the advantage of opening up problems that a single decision tree might not deal with well, such as when a class of interest is relatively rare. I prefer boosted trees for this situation, though.

  5. Hello Sandro,

    Could you point me to some basis for the necessity of normalizing for decision trees? Or how it could be accomplished? I know about the need to normalize data for some neural networks, but I thought this step was not required for decision trees…

  6. @Lucian: Thanks for your comment. As written by Tim, decision trees don’t really need normalization (I think this is because entropy works on class probabilities rather than on the raw values). However, I usually prefer to work with normalized data. It is also safer if you later decide to use neural networks instead of decision trees.

    If, like me, you prefer to work with normalized data, then you can simply apply a normalization or standardization step before using any data mining algorithm. You can see this post for more details.

  7. @santro sir, I have a doubt: why should we still prefer decision trees, though many advanced techniques have been invented? What’s the advantage of using decision trees for uncertain data?

  8. @daniel: There are several advantages. Among them I see readability, easy interpretation of results, implicit feature selection and very little data preprocessing needed. Of course, it depends on the application’s needs.

  9. Hi all, I plan to implement a random forest on imbalanced data for my final project in college. From several papers I’ve read, weighted random forest is good enough for handling imbalanced data.
    I read here that random forest is not good enough….
    Can I have the reasons for this? And what about weighted random forest, which implements cost-sensitive learning for handling imbalanced classes?
    I’ll appreciate any ideas 🙂
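
A note on the equal-frequency binning Tim describes in comment 1: the idea can be sketched in a few lines. This is only an illustration under assumptions, not Tim's actual SQL; the function name, the price values, and the choice of 4 buckets (instead of his 100) are made up to keep the example small.

```python
def equal_frequency_buckets(values, n_buckets=100):
    """Assign each value a bucket in 1..n_buckets with near-equal counts.

    Buckets are based on rank, so the result depends only on the ordering
    of the values, not on their scale.
    """
    n = len(values)
    # Indices of the values, sorted from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    buckets = [0] * n
    for rank, i in enumerate(order):
        buckets[i] = rank * n_buckets // n + 1
    return buckets


# Hypothetical stock prices, including one extreme outlier (3000.0).
prices = [12.0, 250.0, 18.5, 40.0, 33.0, 9.5, 75.0, 3000.0]
buckets = equal_frequency_buckets(prices, n_buckets=4)

for price, bucket in zip(prices, buckets):
    print(f"price {price:>7} -> bucket {bucket} of 4")
```

A split such as "bucket > 3" then reads directly as "price is in the top quarter". The outlier simply lands in the top bucket alongside 250.0, which is the outlier robustness mentioned in comment 2. One caveat of this rank-based sketch is that tied values can fall into different buckets.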

Comments are closed.