Readability of Decision Trees
One of the most often cited advantages of decision trees is their readability. Many data miners (myself included) justify the use of this technique by the fact that the resulting model is quite easy to understand (no black box). However, certain issues can make decision trees unreadable.
First, there is normalization (or standardization). In most projects, data have to be normalized before using a decision tree. Therefore, once you plot the tree, the split values are meaningless. Of course, you can map the values back to their original scale, but that is an extra step that has to be done.
Second is the number of trees. In the project I work on at my job, I can have 100 or more decision trees per month (see this post for more details). It is clearly impossible to read all these trees, even if each one is understandable on its own. The same happens with random forests. When there are 1000 trees voting for a given class, how can one understand the process (or rules) that produces the class output?
Decision trees still have a lot of advantages. However, the “readability” advantage should be treated with care. It may be valid in some applications, but it can often be a mirage.
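As a rough illustration of that mapping step, here is a minimal sketch, assuming the data were standardized with z = (x - mean) / std. The helper names, the age values, and the split at z <= 0.5 are all made up for the example; they do not come from any particular tool.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores plus the (mean, std) needed to undo the scaling."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values], mu, sigma


def threshold_to_original(z_threshold, mu, sigma):
    """Map a split threshold learned on z-scores back to original units."""
    return z_threshold * sigma + mu


# Hypothetical feature: customer age in years.
ages = [20, 30, 40, 50, 60]
z_ages, mu, sigma = standardize(ages)

# Suppose a tree trained on z_ages splits at z <= 0.5 (made-up value).
split_in_years = threshold_to_original(0.5, mu, sigma)
print(f"age <= {split_in_years:.1f}")
```

Because standardization is a monotone transformation, the rule "z <= 0.5" and the rule printed in years select exactly the same rows; only the label on the plotted node changes.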
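To make the voting problem concrete, here is a toy sketch, not a real random forest. Each "tree" is reduced to a single hypothetical rule (feature index, threshold), and the feature values are invented. Even in this tiny setting, the final answer is a vote count rather than one rule you can read off.

```python
from collections import Counter

# Hypothetical "forest": each tree is reduced to one readable rule
# (feature index, threshold). Real trees would be much deeper.
stumps = [(0, 30.0), (0, 45.0), (1, 1000.0), (1, 2500.0), (0, 50.0)]


def stump_predict(stump, row):
    """Vote "yes" if the chosen feature exceeds the stump's threshold."""
    feature, threshold = stump
    return "yes" if row[feature] > threshold else "no"


def forest_predict(stumps, row):
    """Return the majority class and the vote counts behind it."""
    votes = Counter(stump_predict(s, row) for s in stumps)
    return votes.most_common(1)[0][0], votes


row = (40.0, 3000.0)  # made-up (age, monthly spend)
label, votes = forest_predict(stumps, row)
print(label, dict(votes))
```

Each individual rule is easy to read, but the prediction only says that a majority of rules agreed. With 1000 full-depth trees instead of five one-line rules, there is no single path left to inspect.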
Comments
10 Comments on Readability of Decision Trees
-
Tim Manns on
Fri, 28th Nov 2008 6:10 am
Hi Sandro,
I have a few thoughts.
a) Hey, this bit:
“In most projects, data have to be normalized before using decision tree. Therefore, once you plot the tree, values are meaningless.”
– I reckon that’s not necessarily true!
It depends on your normalisation. You can normalise your data with meaning!
I like binning into 100 buckets, each with the same number of occurrences (say, customer rows). I do this for a few reasons, one being that I can then report customers as being in the top 5% of buckets, etc. It is also an easy and fast way to rescale lots of data in SQL. CART or C5.0 models using this type of normalised data are actually quite easy to make sense of (e.g., “if stock price is above the 70% bucket”, etc.).
b) Random forests don’t work well with big datasets (millions of rows). I use fairly simple CART or C5.0. Sometimes I build a handful of models on subset samples and average the models, but I’m not convinced hundreds of models is the best way to go. I always take time creating new derived ‘information rich columns’ and using these as additional inputs to a decision tree or neural net.
I agree with the problems you describe and, for the reasons you mention, I don’t follow the steps you describe. Maybe I’m jaded, but I believe Random Forests is a classic example of mad academia over practicality (and yes, I know that’s controversial considering the brilliant guy who created random forests…).
– Tim
-
Sandro Saitta on
Fri, 28th Nov 2008 2:31 pm
Hi Tim,
Thanks for your comment!
1) Binning the data into buckets is a nice way to avoid this “unreadability” problem. I have always used normalization or standardization, but have never used binning. Also, the fact that you have the same number of occurrences in each bin avoids the issue of outliers.
2) Regarding random forests, I definitely agree about the issues with this technique (that’s why I don’t use random forests). However, I really like the concept of several models voting for the output class.
-
Shane Butler on
Sun, 30th Nov 2008 11:30 pm
I think random forests are both useful and powerful… with some caveats… you need lots of memory for big data (so they are not practical for all tasks), and readability is also a problem. Rattle addresses the random forest readability issue by producing an importance chart.
-
Sandro Saitta on
Mon, 1st Dec 2008 5:12 pm
Thanks for your contribution, Shane!
-
James Pearce on
Thu, 8th Jan 2009 1:07 am
Random forests have the advantage of opening up problems that a single decision tree might not deal with well, such as when a class of interest is relatively rare. I prefer boosted trees for this situation, though.
-
Sandro Saitta on
Thu, 8th Jan 2009 5:05 pm
Thanks for your comment, James!
-
Lucian on
Tue, 24th Nov 2009 11:17 am
Hello Sandro,
Could you point me to some references on why normalization is necessary for decision trees, or on how it can be done? I know about the need to normalize data for some neural networks, but I thought this step was not required for decision trees…
-
Sandro Saitta on
Sat, 28th Nov 2009 11:44 am
@Lucian: Thanks for your comment. As Tim wrote, decision trees don’t really need normalization (I think due to the fact that with entropy you work on probabilities). However, I usually prefer to work with normalized data. It is also safer if you later decide to use neural networks instead of a decision tree.
If, like me, you prefer to work with normalized data, then you can simply apply a normalization or standardization step before using any data mining algorithm. You can see this post for more details.
-
Daniel on
Sun, 25th Sep 2011 4:37 am
@Sandro: Sir, I have a doubt. Why should we still prefer decision trees, even though many more advanced techniques have been invented? What is the advantage of using decision trees for uncertain data?
-
Sandro Saitta on
Mon, 3rd Oct 2011 8:18 am
@Daniel: There are several advantages. Among them I see readability, easy interpretation of results, implicit feature selection, and very little data preprocessing. Of course, it depends on the needs of the application.