Guest Post: Can Database Developers do Data Mining?

Today’s post has been written by Brendan Tierney, consultant and lecturer with the Dublin Institute of Technology in Ireland. He argues that DB developers may be better suited to data mining than people with a statistics background. Enjoy the read, and feel free to comment. My thanks to Brendan for his post on Data Mining Research.

Can Database Developers do Data Mining?

Over the past 20 to 30 years, Data Mining has been dominated by people with a background in Statistics. This is primarily due to the type of techniques employed in the various data mining tools. The purpose of this post is to highlight the possibility that database developers might be better suited to a data mining project than someone with a statistics background.

Let’s take a look at the CRISP-DM lifecycle for data mining (Figure 1). Most people involved in data mining will be familiar with this lifecycle.

Figure 1 – CRoss Industry Standard Process for Data Mining.

It is well documented that the first three steps in CRISP-DM (Business Understanding, Data Understanding and Data Preparation) can take 70% to 80% of the total project time. Why does it take so much time? The data miner has to start by learning about the business in question, explore the data that exists, and re-explore and understand the business rules. Only then can they start the data preparation step.

Database developers within the organisation will already have gathered a considerable amount of the required information, because they will have been involved in developing the business applications. A large saving in time can be achieved here, as they will already have most of the business and data understanding. They are well equipped to query the data and can get to the required data more quickly. Database developers are also best equipped to perform the data preparation step.

If we skip on to the deployment step, the database developers will again be required to implement and deploy the selected data mining model in the production environment.

The two remaining steps, Modelling and Evaluation, are perhaps the two steps that database developers are less suited to. But with a bit of training on data mining techniques and on how to evaluate data mining models, they would be well able to complete the full data mining lifecycle.

If we take the stages of CRISP-DM that a database developer is best suited to (Business Understanding, Data Understanding, Data Preparation and Deployment), these equate to approximately 80% to 85% of the total project. With a little training and upskilling, database developers are the best kind of person to perform data mining within their organisation.

Bio

Brendan is an independent consultant and lecturer with the Dublin Institute of Technology in Ireland. He has extensive experience working in the areas of Data Warehousing, Data Mining and Data Architecture, and has worked on projects in Ireland, the UK, Belgium and the USA. Brendan was the first consultant hired by fraud detection software company Norkom Technologies (bought by BAE Systems in March 2011). During his time with Norkom he was involved in the delivery of their first projects in Ireland, Belgium and the USA. Brendan has also worked with Deloitte Consulting and as a consultant with Oracle.

Web : www.comp.dit.ie/btierney

Blog : http://www.business-intelligence-quotient.com/

7 comments found on “Guest Post: Can Database Developers do Data Mining?”

  1. I’d like to applaud your idea that statisticians should not be seen as the obvious choice for data miners. Statistics and data mining are two quite separate disciplines, and a strong focus on statistical analysis can distract from the correct focus of data mining, that is, solving the business problem. The same is true of expertise in anything highly technical: I have seen neural network experts have trouble with data mining, because they assumed that a business problem could be solved by tweaking the neural network.

    However, I would be cautious about assuming that, because the expertise of database developers is relevant to several aspects of data mining, database developers will therefore usually make good data miners. Their knowledge is certainly relevant, but that does not mean that database developers will naturally have the correct mindset for data mining. My experience of training data miners is that people with this sort of expertise often have trouble with understanding how data mining algorithms can help solve a business problem.

    I don’t think there is any simple answer to who can become a good data miner. I have seen database developers, business analysts, statisticians, AI experts, data quality experts and all sorts of general IT people become good data miners; I have also seen people from each of these backgrounds stumble when trying to get to grips with data mining. It is a question of being able to take on the right kind of business- and data-oriented analytical approach to a problem, and, as with any new skill-set, the willingness to take on a new way of thinking.

  2. @Tom: Thanks for your comprehensive answer. I guess it’s due to the fact that data mining is a domain at the intersection of AI, statistics, machine learning and databases.

  3. Hi,
    I like today’s post, written by Brendan Tierney. He discusses the fact that DB developers are better suited for data mining than people with a statistics background. I like this information and it helps me a lot.

  4. Disagree.

    DB consultants can rarely become happy data-miners. There are so many DB specialists and so few data-miners.

    When one becomes a specialist in a field, it’s hard to move to something completely different. Data-mining works with data too, but in a completely different way. Probabilities instead of deterministic rules. Not technology but business is the king.

    I let my DB specialist friends prepare data for me. Then I do my magic to provide new insights and actionable results. Fair deal, works for both of us and we never wanted to switch our jobs.

  5. Sir, I am doing a PhD in data mining relating to privacy preserving.
    Where do I collect the data from? Please answer me through my email.
    Please give me some ideas about data mining research.
    What is the existing research in data mining (PP)?
