Mining health records

March 29, 2007 by Sandro Saitta
Filed under: application, health 
Yet another battle between companies mining personal data and privacy advocates is under way. Microsoft is playing the bad guy in this story, where the personal data consist of health records. According to Government Health IT, Microsoft plans to develop "a clinical data warehouse (CDW) that provides predefined queries of interest to clinicians and analysts". The next step is to apply data mining techniques such as clustering or supervised…

Combining PCA and K-means

March 26, 2007 by Sandro Saitta · 6 Comments
Filed under: PCA, k-means 
Although often used in practice, K-means has several drawbacks. The number of clusters has to be defined in advance, and the result depends on the starting centroid locations. More details on how to handle these issues can be found on Data Mining Research (search for clustering in the upper bar). A weakness, which is common to clustering in general, concerns the visualization of the obtained clusters. A possible solution…
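The dependence on the starting centroids can be illustrated with a minimal sketch. It assumes a plain Lloyd-style K-means written from scratch; the toy points, the function names `sq_dist`, `assign` and `kmeans`, and the two initializations are all hypothetical and chosen only for illustration. It does not cover the PCA part of the post.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, centroids, max_iter=100):
    """Lloyd-style K-means from given starting centroids.

    Returns (centroids, labels, sse), where sse is the sum of squared
    distances of each point to its assigned centroid.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # Mean of the members, coordinate by coordinate.
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, sse


# Toy data: the four corners of a wide rectangle, with k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A separates left from right; start B separates bottom from top.
_, good_labels, good_sse = kmeans(points, [(0, 0.5), (4, 0.5)])
_, bad_labels, bad_sse = kmeans(points, [(2, 0), (2, 1)])

print("start A:", good_labels, good_sse)
print("start B:", bad_labels, bad_sse)
```

Both runs converge, but to different partitions with very different within-cluster error, which is why K-means is commonly restarted from several initializations and the best run kept.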

Data mining for the car industry

March 21, 2007 by Sandro Saitta · 1 Comment
Filed under: application, car, industry 
The Auto Industry website has a short article summarizing a data mining application for the car industry. More precisely, Ford is using data mining for early warning of supplier failure. After North American suppliers, they plan to cover Europe and Latin America. An example of application is to track late shipment…

Small book review: Java Data Mining

March 19, 2007 by Sandro Saitta
Filed under: Java, book, data mining books, review 
Unlike the usual books on data mining discussed on this blog, Java Data Mining is a book written for data mining practitioners. Even though the word Java appears in the title, practitioners using other languages or software may be interested in the first part of the book (Strategy), which is really worth reading. The other parts of the book focus on the JDM API itself (Standards), problem solving with…

Andrew Moore’s tutorials

March 16, 2007 by Sandro Saitta · 2 Comments
Filed under: tutorial 
When working in data mining, we often have to switch from one technique to another depending on the task at hand. Andrew Moore's webpage contains several tutorials about data mining. Most of the standard data mining algorithms are covered by his presentations. It's a good starting point when dealing with a new technique…

Recent comments

March 14, 2007 by Sandro Saitta
Filed under: comments 
I have noticed that comments are now regularly posted on Data Mining Research. Moreover, people often comment on old posts (i.e. more than 10 days old). People usually find these posts using a search engine. For these two reasons, I have added a list of recent posts on the right side of the blog. I hope it will help readers to know which topics are currently being discussed on…

A note on correlation

March 13, 2007 by Sandro Saitta · 6 Comments
Filed under: correlation, variable relationship 
Correlation is often used as a preliminary technique to discover relationships between variables. More precisely, the correlation is a measure of the linear relationship between two variables. Pearson's correlation coefficient is defined as:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

As written above, the main drawback of correlation is the linear relationship restriction. If the correlation is null between two variables, they may…
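The linear restriction can be checked numerically with a small sketch. It assumes the usual sample form of Pearson's coefficient; the function name `pearson` and the toy data are hypothetical and used only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # exact linear dependence on x
quadratic = [x * x for x in xs]    # exact, but nonlinear, dependence on x

r_linear = pearson(xs, linear)
r_quadratic = pearson(xs, quadratic)

print("linear:   ", r_linear)
print("quadratic:", r_quadratic)
```

In this toy case the quadratic variable is fully determined by x, yet its correlation with x is zero because the relationship is symmetric around the mean and has no linear component.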


  • Disclaimer

    The opinions discussed on Data Mining Research are my own and do not reflect the position of my current employer, FinScore. The views and opinions expressed by visitors to this blog are theirs and do not necessarily reflect mine.