It is my pleasure to welcome to Data Mining Research Thomas A. “Tony” Rathburn, a senior consultant at The Modeling Agency. I have recently read two of his articles about data mining. One of them, my favorite, is Data Mining the Financial Markets. He kindly agreed to answer four questions from Data Mining Research.
Data Mining Research: Who are you and how did you enter the data mining field?
Thomas Rathburn: I did my PhD work in Management Information Systems in the 1980s and taught Computer Science and Statistics at Kent State for seven years. Most of my research was related to Artificial Intelligence. At the time, I was also doing some preliminary applied consulting work in the area… primarily with banks and insurance. In the early ’90s I took a position as Director of Training & Consulting with NeuralWare, in Pittsburgh, where I expanded into Finance and Marketing. I left that position a couple of years later to trade the 30-year Treasury Bond, and the corresponding futures and options contracts, on the Chicago Board of Trade with Lakeshore Trading. That was followed by a time with Hull Trading doing similar work with the S&P 500. I’ve been engaged in a general consulting practice modeling business applications of human behavior since. I currently teach for The Modeling Agency, Unica Software, SPSS, Group 1 Software and The Data Warehousing Institute (TDWI), as well as doing direct consulting work for a number of clients and subcontract consulting for Capgemini and A.T. Kearney.
DMR: If you could give only one piece of advice to someone starting a new data mining project, what would it be?
TR: Understand the differences between data mining and traditional statistical analysis. Stats is primarily concerned with measures of central tendency and conducts its modeling from that point of departure. While data mining uses similar techniques, the conceptualization of the problem is different. In data mining, I am concerned with sub-groups that display a behavior of interest at a rate different from the mean. In developing models that consistently and reliably identify these sub-groups, I am able to adapt my resource allocation strategies in a way that enhances performance.
First, and foremost, make sure your project definition and performance metrics are clearly and completely stated at the inception of the project. Absolutely everything that follows should be done to enhance performance as stated in your project definition.
Understand the differences between human behavior modeling and physical systems modeling.
Understand Low-Risk/High ROI project design and incremental development.
Understand that you don’t have to know everything to enhance performance.
Understand that performance enhancement comes from enhanced project conceptualization and efficient utilization of data. Advanced mathematical techniques have minimal impact if you get those two things right.
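Editor's note: the sub-group idea in this answer can be made concrete with a small sketch. The code below is not from the interview; it is one possible illustration, assuming a list of hypothetical (segment, responded) records, of how the response rate of each sub-group might be compared with the overall mean. The function name `subgroup_lift`, the segment labels, and the sample data are all invented for this example.

```python
from collections import defaultdict


def subgroup_lift(records, min_size=1):
    """Compare each sub-group's response rate with the overall mean rate.

    records  -- iterable of (segment, responded) pairs, responded being 0 or 1
    min_size -- sub-groups smaller than this are skipped as too small to judge

    Returns (overall_rate, {segment: (rate, lift)}), where lift is the
    sub-group rate divided by the overall rate.
    """
    records = list(records)
    if not records:
        raise ValueError("no records supplied")

    overall_rate = sum(responded for _, responded in records) / len(records)
    if overall_rate == 0:
        raise ValueError("no responders at all, so lift is undefined")

    # segment -> [number of responders, number of records]
    counts = defaultdict(lambda: [0, 0])
    for segment, responded in records:
        counts[segment][0] += responded
        counts[segment][1] += 1

    lifts = {}
    for segment, (hits, size) in counts.items():
        if size < min_size:
            continue
        rate = hits / size
        lifts[segment] = (rate, rate / overall_rate)
    return overall_rate, lifts


# Hypothetical data: three customer segments with different response rates.
sample_records = (
    [("A", 1)] * 2 + [("A", 0)] * 2
    + [("B", 1)] * 1 + [("B", 0)] * 7
    + [("C", 1)] * 2 + [("C", 0)] * 6
)

if __name__ == "__main__":
    overall, lifts = subgroup_lift(sample_records)
    print(f"overall response rate: {overall:.3f}")
    # List sub-groups from highest to lowest lift.
    for segment, (rate, lift) in sorted(lifts.items(), key=lambda kv: -kv[1][1]):
        print(f"segment {segment}: rate={rate:.3f} lift={lift:.2f}")
```

A sub-group with lift well above 1 responds more often than average and is a candidate for extra resources; one well below 1 is a candidate for less. Whether such differences are reliable enough to act on still depends on the project definition and performance metrics discussed above.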
DMR: Can you give examples of common pitfalls you encountered during your data mining projects?
TR: The single biggest issue is not understanding what data mining is… it’s analysis… to enhance performance… your specific metrics of success. It is not weird math to develop a magical solution.
The second issue is not appropriately completing the project definition.
The third is not understanding how to extract the required information content from your data.
The fourth is focusing on learning algorithms and analytics rather than on the reality of what you are actually trying to achieve and the goals in measuring that reality.
DMR: What is “The Modeling Agency” and what do you provide to your clients?
TR: The Modeling Agency is a group of senior-level consultants, coordinated by Eric King, that provides training and consulting services to clients on technology projects. The best description is available through our website, or through direct conversation with Eric. His contact information is available on the website.
DMR: Thanks a lot for your answers.
For more information, you can visit The Modeling Agency website.