Data Mining Research (DMR): Could you introduce yourself to the readers of dataminingblog.com?
Dr. A. Fazel Famili (FF): I was a Research Scientist at the National Research Council of Canada (NRC) for 30 years (Oct 1, 1984-Jan 16, 2015). Prior to joining NRC, I worked in industry for 3 years. I have strong expertise in data mining and bioinformatics, and in my previous positions I was engaged in unique collaborative research and development in Machine Learning, Data Mining and Knowledge Management for Engineering Systems (Analytics for Sensor Based data) and Life Sciences. The application domains in which I have worked extensively range from manufacturing to complex equipment (mobile and stationary), aerospace and life sciences. I have edited two books, published over 50 articles in Data Mining and Artificial Intelligence, and I hold a US Data Mining patent. I have lectured at a number of institutes in Canada, Europe, the Far East, South Africa and South America. I am also an adjunct professor at the School of Electrical Engineering and Computer Science (University of Ottawa) and founding Editor-in-Chief of the IDA Journal (Intelligent Data Analysis, established in 1996, published bi-monthly). I am now a Data Analytics Consultant (www.ida-ij.com/famili). You can reach me at firstname.lastname@example.org.
DMR: How would you classify data sources for data mining?
FF: There are several ways to look at this. One way is to divide the world into: (i) Physical/Engineered Systems and (ii) Life Sciences/Nature. The first category (i.e. Physical/Engineered Systems) covers systems that did not exist before! We as human beings have created them, so the data and its contents for these systems are supposed to be known to us (not hidden patterns!). In addition, the underlying model behind the phenomena whose data we have access to is mostly known. Examples of this category are complex equipment (sensor based stationary or mobile systems), financial systems, etc. The second category (i.e. Life Sciences and Nature), however, covers systems that have existed in one form or another. There are many unknowns in this category, even about the data itself: when, where and what to measure, and how?! Examples of this category are: human life as one species, one object in the universe! Space is also in this category.
DMR: What are the biggest challenges you have faced in the field?
FF: In the early days of data mining work, especially applied data mining projects, our challenge was to educate owners of data about what data mining was and what the advantages would be of companies opening their doors and giving us access to their real-world data. Nowadays, one of the biggest challenges is to make sure that we have access to and understand (properly choose) the attributes that influence the problem that we are investigating. For example, if our goal is to predict the potential failure of a component in a complex system for which we have access to hundreds of parameters, then unless the particular parameter(s) associated with the problem are included in our data (e.g. sensor measurements), there is no guarantee that our discovered models would be of any value in the real world. Another major challenge is that much of today’s data is either unstructured or only semi-structured, rather than structured (e.g. in the form of tables). This can be true in both static (e.g. historical databases) and dynamic (e.g. data streams) applications. Deriving meaningful features and structuring our data for an efficient DM process is not trivial.
DMR: How important are domain experts for data mining projects?
FF: Domain experts can play an important role in several stages of a data mining project. This ranges from definition of the problem (i.e. an explicit definition of the need: what they are looking for) to data understanding, data pre-processing, data analysis, validation of the results, follow up studies and any other feedback that domain experts can provide throughout the entire data mining paradigm. After all, domain experts are the owners of domain knowledge, a key element in any data mining project. Therefore they should be able to assist data miners in associating the elements of domain knowledge with all steps of a data mining application.
DMR: How should they be involved within projects?
FF: This depends on the nature of the data mining project. In some application domains, they need to participate on a regular basis and at all stages of the data mining process. Obviously, domain experts can educate data miners on the significance of meta-data and its contents. For example, to start with, domain experts can provide details such as Hierarchical Generalization Trees, Attribute Relationship Rules and Environment Based Constraints on a typical data set. They can provide detailed feedback on the scenarios and use-cases that data miners develop in order to analyze a given data set. Domain experts can play a key role in developing an evaluation and validation strategy, and even in recruiting additional players from different departments of an enterprise who would benefit from discoveries made in a data mining project. When the eventual goal of a data mining project is technology transfer and exploitation, domain experts can play the key role of a lead user to demonstrate how a data mining application has been beneficial in solving their problem. They can also assist in building a business case to demonstrate the financial advantage of analyzing certain data. This list could go on.
DMR: What is your advice for new data miners?
FF: My first piece of advice for all data miners is to invest the required amount of time in understanding the application domain before getting into a data mining application. They should make every effort to understand what they are looking for (e.g. newly discovered knowledge that would be of high value to the owners of the data), the contents of the data (e.g. data quality) and what the expected results would be. One has to be aware of all tools, methodologies and techniques that could be relevant to the data mining application that one wants to get involved in. Make sure to discuss all steps of a data mining project with domain experts and, if possible, target a successful deployment to observe the reward of your efforts.