Comments on the Nine Laws of Data Mining

After reading the article from Tom Khabaza, I want to discuss some aspects of it with you. The article is in general nicely written and shows the experience of the author; however, I do have comments on some of the laws. The first law states that there is no data mining without a business objective. While this is true most of the time, it is not always the case. In R&D, a data mining project can be started without a clear business goal.

Since data mining may discover unexpected knowledge, there may be no defined objective at the beginning of the project. Later in the project, one can define the objective if, for example, specific trends have been found in the data. Clearly, there are two approaches to data mining in a company: top-down and bottom-up. The top-down approach is driven by business needs. The bottom-up approach is driven by the data. The two approaches can be complementary. When you are driven by the data, the business objective may come later. If you discover that there is no usable trend in the data, maybe there is no place for a project and thus no business objective. But there is still data mining.

In the second law, Khabaza makes an excellent point about the importance of understanding the business:

[…] whatever is found in the data has significance only when interpreted using business knowledge, and anything missing from the data must be provided through business knowledge.

In the fourth law, Khabaza explains that problem formulation and resolution are both tasks for the data miner:

However, these views arise from the erroneous idea that, in data mining, the data miner formulates the problem and the algorithm finds the solution.  In fact, the data miner both formulates the problem and finds the solution – the algorithm is merely a tool which the data miner uses to assist with certain steps in this process

This means that the complete knowledge discovery process can’t be automated. The data miner has to formulate the problem, solve it and interpret the results. However, parts of the data mining process can still be automated (ETL, building the model, scoring, etc.).
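To make the split between automated and manual steps concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the field names `spend` and `churned`, the toy records, and the very simple threshold "model" do not come from Khabaza's article. The ETL, model building and scoring steps run without intervention, while choosing the target and interpreting the output remain with the data miner.

```python
from statistics import mean

def etl(raw_rows):
    """Automated step: drop incomplete rows and convert fields to numbers."""
    clean = []
    for row in raw_rows:
        if row.get("spend") in (None, "") or row.get("churned") in (None, ""):
            continue  # skip records with missing values
        clean.append({"spend": float(row["spend"]),
                      "churned": int(row["churned"])})
    return clean

def build_model(rows):
    """Automated step: put a threshold halfway between the two class means."""
    churn_mean = mean(r["spend"] for r in rows if r["churned"] == 1)
    stay_mean = mean(r["spend"] for r in rows if r["churned"] == 0)
    return {"threshold": (churn_mean + stay_mean) / 2,
            "churn_below": churn_mean < stay_mean}

def score(model, spend):
    """Automated step: apply the fitted rule to a new record (1 = churn)."""
    below = spend < model["threshold"]
    return int(below == model["churn_below"])

# Manual step (not automatable): the data miner decided that "churned"
# is the target and that "spend" is a relevant input. Hypothetical data.
raw = [
    {"spend": "10", "churned": "1"},
    {"spend": "20", "churned": "1"},
    {"spend": "80", "churned": "0"},
    {"spend": "100", "churned": "0"},
    {"spend": "", "churned": "0"},   # incomplete record, removed by etl()
]

rows = etl(raw)
model = build_model(rows)
print(model["threshold"], score(model, 30.0), score(model, 70.0))
```

Whether a predicted churn for a low-spend customer is actually useful is a question the code cannot answer; that interpretation needs business knowledge.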

Read the full article from Tom Khabaza.



3 comments found on “Comments on the Nine Laws of Data Mining”

  1. Hi Sandro,
    I have intermediate experience in data mining. How can I be sure that I have done data mining correctly?
    Is there any sample and solution that I could use to test myself?

    Please mail your response.

  2. I think Khabaza should not have used the word “law”. Facts, good practices, whatever else, but not “laws”. Establishing “laws” in a given domain sounds extremely arrogant, in particular considering that his assertions make sense in a given context (which is most likely the context he knows), but cannot be seen as absolute.

    The necessity of a defined objective, in particular, is a typical example. What does it mean? If you analyse data to extract knowledge without a defined objective, you cannot call that data mining? This makes no sense.

  3. In response to HT, I do not think that the Nine “Laws” refer to rules that must be followed, but rather to features that experience shows are always present in a Data Mining project. Behaviors might be implied by the laws, but it is not a list of mandated behaviors.

    Is it Data Mining to extract “knowledge” without a defined objective? I believe that the answer is indeed ‘no’. You might have accomplished something, possibly even something very useful. Perhaps you have conducted a comprehensive report of data, but you are not engaging in the same activity that is called Data Mining in the article.
