Today’s post was written by Brendan Tierney, consultant and lecturer with the Dublin Institute of Technology in Ireland. He argues that database developers may be better suited to data mining than people with a statistics background. Enjoy the reading, and feel free to comment. I also thank Brendan for his post on Data Mining Research.
Can Database Developers do Data Mining?
Over the past 20 to 30 years Data Mining has been dominated by people with a background in Statistics. This is primarily due to the type of techniques employed in the various data mining tools. The purpose of this post is to highlight the possibility that database developers might be more suitable people to have on a data mining project than someone with a statistics background.
Let's take a look at the CRISP-DM lifecycle for data mining (Figure 1). Most people involved in data mining will be familiar with this life cycle.
It is well documented that the first three steps in CRISP-DM (Business Understanding, Data Understanding and Data Preparation) can take 70% to 80% of the total project time. Why does it take so much time? The data miner has to start by learning about the business in question, explore the data that exists, then revisit the business rules to refine that understanding, and so on. Only then can they start the data preparation step.
Database developers within the organisation will have gathered a considerable amount of the required information because they would have been involved in developing the business applications. So a large saving in time can be achieved here, as they will already have most of the business and data understanding. They are well equipped to query the data and can get to the required data more quickly. The database developers are also best equipped to perform the data preparation step.
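As a rough illustration of why data preparation plays to a database developer's strengths, here is a minimal sketch that builds a one-row-per-customer data set using ordinary SQL. The `customers` and `transactions` tables, their columns and the sample rows are all hypothetical, and an in-memory SQLite database stands in for whatever database the organisation actually uses.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            region      TEXT
        );
        CREATE TABLE transactions (
            txn_id      INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount      REAL
        );
        INSERT INTO customers VALUES (1, 'Dublin'), (2, 'Cork'), (3, 'Galway');
        INSERT INTO transactions VALUES (1, 1, 20.0), (2, 1, 40.0), (3, 2, 15.0);
    """)
    return conn


def build_customer_features(conn):
    """Aggregate raw transactions into one row per customer.

    The LEFT JOIN keeps customers with no transactions, and COALESCE
    replaces the resulting NULL totals with zero.
    """
    query = """
        SELECT c.customer_id,
               c.region,
               COUNT(t.txn_id)            AS txn_count,
               COALESCE(SUM(t.amount), 0) AS total_spend,
               COALESCE(AVG(t.amount), 0) AS avg_spend
        FROM customers c
        LEFT JOIN transactions t ON t.customer_id = c.customer_id
        GROUP BY c.customer_id, c.region
        ORDER BY c.customer_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in build_customer_features(demo_connection()):
        print(row)
```

Joins, aggregates and the handling of missing values are everyday work for a database developer, and the result is the kind of flat, one-row-per-case table that most data mining tools expect as input.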
If we skip on to the deployment step, again the database developers will be required to implement and deploy the selected data mining model in the production environment.
The two remaining steps, Modelling and Evaluation, are perhaps the two steps that database developers are less suited to. But with a bit of training on Data Mining techniques and how to evaluate data mining models, they would be well able to complete the full data mining lifecycle.
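To give a flavour of what evaluating a model involves, here is a minimal sketch that compares a model's predictions with the known outcomes and reports three common measures for a two-class problem. The function names and the sample labels are made up for illustration, and a real project would normally rely on the evaluation features of its data mining tool.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(actual, predicted, positive=1):
    """Return accuracy, precision and recall, using 0.0 when undefined."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    return {
        # Share of all cases the model classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that really were positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    actual = [1, 0, 1, 1, 0, 0, 1, 0]     # hypothetical known outcomes
    predicted = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical model output
    print(evaluate(actual, predicted))
```

The arithmetic is simple. The part that takes training is knowing which measure matters for the business problem at hand.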
If we take the stages of CRISP-DM that a database developer is best suited to (Business Understanding, Data Understanding, Data Preparation and Deployment), these equate to approximately 80% to 85% of the total project. With a little training and upskilling, database developers are the best kind of person to perform data mining within their organisation.
Brendan is an independent consultant and lecturer with the Dublin Institute of Technology in Ireland. He has extensive experience in Data Warehousing, Data Mining and Data Architecture, and has worked on projects in Ireland, the UK, Belgium and the USA. Brendan was the first consultant hired by fraud detection software company Norkom Technologies (bought by BAE Systems in March 2011). During his time with Norkom he was involved in the delivery of their first projects in Ireland, Belgium and the USA. Brendan has also worked with Deloitte Consulting and as a consultant with Oracle.
Web : www.comp.dit.ie/btierney