What is a good classification accuracy in data mining?

What a good question! Or rather, what a bad question. Asked this way, it implies an answer that holds for any data mining problem, which is of course not possible. It is a natural question for a data miner, since classification accuracy is one way of measuring the quality of a data mining algorithm. Indeed, you can estimate how good your decision tree or neural network is by measuring its classification rate on the test set. My point in this article is that what counts as a good classification percentage depends on the application in which data mining is used.
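To make the classification rate of a test set concrete, here is a minimal Python sketch. The labels are made up for illustration, and `classification_accuracy` is a hypothetical helper name, not a function from any particular library.

```python
# Hypothetical test-set labels, made up for illustration (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
# Hypothetical predictions from some classifier on the same ten examples.
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]


def classification_accuracy(y_true, y_pred):
    """Fraction of test examples whose predicted class matches the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


accuracy = classification_accuracy(y_true, y_pred)
print(f"Classification accuracy: {accuracy:.0%}")
```

Here 8 of the 10 predictions match, so the sketch reports 80%. Whether 80% is good or bad is exactly what the number alone cannot tell you.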

Let me explain with a few examples from my own experience. I have a friend working in face recognition. According to him, an algorithm (machine learning in his case) is well fitted to the problem when it reaches a classification accuracy above, say, 97%. This may be true, but only in his domain. In face recognition, you apply machine learning to pictures to recognize faces. No outside effect or variable that is absent from the pixels of the picture influences the output (the class you predict). Thus, a very high classification accuracy can be reached. Don’t get me wrong: I’m not saying that face recognition is an easy task, only that with the correct algorithm and the right data preparation, a very high classification rate can be reached.

Let’s take another application: predicting user clicks on given ads. That’s the application I’m currently working on in the FinWEB project. Here, most of my models reach a classification accuracy of around 70%. Is that bad? For this application domain, not really. When we predict whether a user will click on an ad, we don’t have all possible information at our disposal. We only have some data that represent the user’s behavior in a given time frame. We don’t have the user’s whole brain in a database. There are so many influencing factors that reaching a classification percentage of 70% is quite satisfying.

Finally, I will take the example of data mining in finance. When applying data mining to the problem of stock picking, I obtained a classification accuracy in the range of 55-60%. While this looks like a poor result, it’s not. Consider all the factors that can affect the price of a stock. We may use hundreds of input parameters, yet they may represent only a very small percentage of all the information that could influence the price. This is very far from the face recognition case, where every pixel is defined.

My point in this post was to show that there is no definitive answer to this question, which is in fact not a good one. What counts as a good classification accuracy mainly depends on the application domain. Feel free to share your own experiences by commenting on this post!

7 comments found on “What is a good classification accuracy in data mining?”

  1. Hey Sandro, interesting post. I think this is one of the challenges I have faced when presenting a model (results) to a non-technical audience. “You were only correct 70% of the time!?” You do need to explain the domain, the lift and the profit (if applicable) that comes from that improvement immediately when showing the accuracy.

    I would add a couple of additional thoughts:

    1) Accuracy alone is not always enough. Think of a *rare* event, where a prediction of ‘false’ is nearly always correct but not useful. There, true positives, false positives, etc. become key.

    2) In some domains, like database marketing, also with a rare event, you are less interested in classification accuracy per se than in the predicted probability and the lift of the top deciles.

    My 2 cents
    Jeff

  2. Sir,
    I am doing my PG project related to credit scoring.

    I am obtaining an accuracy rate of around 70%
    and a precision rate of around 85%.

    Is this a good classification range?

    Thank you

  3. Great article regarding classification accuracy. I want to add a useful formula here…

    Classification Accuracy = Number of correctly classified testing examples / Number of testing examples

  4. I know the formula for accuracy, but I don’t know how to calculate it in practice. Please explain with an example table.

  5. Very interesting subject to discuss. We can’t say that 70% or 80% accuracy is good all the time. As was said above, there are many external factors that we can’t control. For example, in student assessment prediction, several factors affect the accuracy: changes in the material, the teacher, the student’s intelligence level, learning policy, etc., and none of them appears clearly in the dataset.

    An accuracy of 70% may be good for predicting student achievement, but it is very poor in the medical field or in military applications. This is why the acceptable accuracy varies from field to field.

    In a few words, it depends on the environment.
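
Several of the comments above ask how accuracy is computed in practice and why it can mislead for rare events. The following is a minimal Python sketch of both points, assuming a made-up test set of 100 examples with 5 positives and two hypothetical classifiers; none of the numbers come from the article or from the commenters' projects. It tabulates true and false positives and negatives, then derives accuracy, precision and recall from them.

```python
# Made-up rare-event test set: 100 examples, only 5 positives (1 = event).
y_true = [1] * 5 + [0] * 95

# Hypothetical classifier A: always predicts "no event".
pred_always_negative = [0] * 100

# Hypothetical classifier B: finds 4 of the 5 events, with 10 false alarms.
pred_model = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy, precision and recall derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(y_true),
        # Precision is undefined with no positive predictions; reported as 0.0 here.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


print(f"{'classifier':<16}{'TP':>4}{'FP':>4}{'FN':>4}{'TN':>4}"
      f"{'acc':>7}{'prec':>7}{'rec':>7}")
for name, pred in [("always negative", pred_always_negative),
                   ("model", pred_model)]:
    s = summarize(y_true, pred)
    print(f"{name:<16}{s['tp']:>4}{s['fp']:>4}{s['fn']:>4}{s['tn']:>4}"
          f"{s['accuracy']:>7.2f}{s['precision']:>7.2f}{s['recall']:>7.2f}")
```

In this toy setup, the classifier that never predicts the event scores the higher accuracy (0.95 against 0.89) while detecting none of the events, which is why the counts behind the accuracy matter as much as the accuracy itself.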
