As mentioned in the previous post, Anil K. Jain was the invited speaker at MLDM 2007. He gave an interesting presentation on clustering, focusing on the user’s dilemma. He began with a comprehensive introduction to clustering and then presented some of the ongoing research he is involved in: semi-supervised clustering and clustering with co-association. Below is the abstract of his presentation:
Data clustering is a long standing research problem in pattern recognition, computer vision, machine learning, and data mining with applications in a number of diverse disciplines. The goal is to partition a set of n d-dimensional points into k clusters, where k may or may not be known. Most clustering techniques require the definition of a similarity measure between patterns, which is not easy to specify in the absence of any prior knowledge about cluster shapes. While a large number of clustering algorithms exist, there is no optimal algorithm. Each clustering algorithm imposes a specific structure on the data and has its own approach for estimating the number of clusters. No single algorithm can adequately handle various cluster shapes and structures that are encountered in practice. Instead of spending our effort in devising yet another clustering algorithm, there is a need to build upon the existing published techniques. In this talk we will address the following problems: (i) clustering via evidence accumulation, (ii) simultaneous clustering and dimensionality reduction, (iii) clustering under pair-wise constraints, and (iv) clustering with relevance feedback. Experimental results show that these approaches are promising in identifying arbitrary shaped clusters in multidimensional data.
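To make the evidence-accumulation idea in the abstract more concrete, here is a minimal Python sketch. It assumes a plain k-means base clusterer that is run many times, each time with a randomly drawn number of clusters and random starting centers. The sketch counts how often each pair of points lands in the same cluster (the co-association matrix) and then reads a final partition off that matrix. The function names, the number of runs, the range of k and the 0.5 cut are illustrative assumptions, not details from the talk. The last step takes connected components of the pairs whose co-association exceeds the threshold, which corresponds to cutting a single-link hierarchy built on the co-association values.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Lloyd-style k-means started from k randomly chosen data points; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def co_association(points, runs=50, k_range=(2, 4), seed=0):
    """Fraction of k-means runs in which each pair of points shares a cluster (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    for _ in range(runs):
        k = rng.randint(*k_range)  # vary k (and the random start) to diversify the ensemble
        labels = kmeans(points, k, rng)
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    for i in range(n):
        together[i][i] = runs  # a point always co-occurs with itself
    return [[count / runs for count in row] for row in together]


def consensus_partition(coassoc, threshold=0.5):
    """Connected components of the graph linking pairs with co-association above threshold.

    This corresponds to cutting a single-link hierarchy on the co-association matrix at that level.
    """
    n = len(coassoc)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over strongly co-associated pairs
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Usage on two small, well-separated squares (toy data chosen for illustration).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(consensus_partition(co_association(points)))
```

Because each base run is random, an individual k-means run can split a square or group pieces of both squares together; the co-association matrix averages over all runs, and the 0.5 cut keeps only pairs that were grouped together in most of them.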
He made several interesting remarks during his talk; here are three that I noted:
- K-means was invented independently in 1955, 1957, 1965 and 1967 (!)
- In a good feature space, any simple clustering algorithm will work
- A clustering method is not the same as a clustering algorithm (an algorithm is an implementation of a particular method)
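The method/algorithm distinction can be illustrated with k-means itself. One way to read it: the method is the objective (partition the points so that the sum of squared distances from each point to its cluster mean is minimal), while Lloyd's iteration and brute-force enumeration are two different algorithms for that same objective. The sketch below is hypothetical: the function names, the toy data and the starting centers are chosen for illustration and are not from the talk. On two well-separated squares, Lloyd's iteration started from two centers inside the same square stops at a poor local optimum, while exhaustive search finds the two-square partition.

```python
from itertools import product


def sse(points, labels):
    """The k-means *method*: within-cluster sum of squared distances to the cluster means."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        mean = [sum(xs) / len(members) for xs in zip(*members)]
        total += sum(sum((a - m) ** 2 for a, m in zip(p, mean)) for p in members)
    return total


def lloyd(points, centers, iters=100):
    """Algorithm 1: Lloyd's iteration, a fast local search that depends on the starting centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: a local optimum of the objective
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def exhaustive(points, k):
    """Algorithm 2: score every labeling; exact, but needs k**n evaluations."""
    return list(min(product(range(k), repeat=len(points)),
                    key=lambda labels: sse(points, labels)))


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
local = lloyd(points, centers=[(0, 1), (1, 0)])  # both starting centers in the same square
best = exhaustive(points, 2)
print(round(sse(points, local), 2), sse(points, best))  # prints: 402.67 4.0
```

Same method, two algorithms, two different answers: exhaustive search is exact but grows as k^n, which is why practical k-means implementations rely on local search such as Lloyd's iteration, usually with several restarts.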
If you are interested, you can find more information about his work.