Cluster validity: Clustering algorithms
Now that the clustering ideas have been introduced, let's look at existing clustering strategies. Several clustering techniques can be found in the literature. They can be divided into four main categories (1): partitional clustering (K-means, etc.), hierarchical clustering (BIRCH, etc.), density-based clustering (DBSCAN, etc.) and grid-based clustering (STING, etc.). In the literature, clustering also appears under other names, such as unsupervised learning, numerical taxonomy and partition (2).
One of the most common techniques for clustering is K-means (3). The main reasons lie in the drawbacks of the other categories (even if K-means has its own drawbacks). Hierarchical clustering, for example, usually has a higher computational complexity, often O(n^2) or more in the number of points n. Density-based clustering algorithms often have non-intuitive parameters to tune. Finally, grid-based clustering algorithms do not always give clusters of good quality (1).
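To get a feel for where the quadratic cost of hierarchical clustering comes from: a naive agglomerative method starts by comparing every pair of points, which is n(n-1)/2 distances. The snippet below is only a small sketch of that first step, not a full hierarchical algorithm; the function name `pairwise_distances` is made up for illustration.

```python
from itertools import combinations
import math

def pairwise_distances(points):
    """Return the Euclidean distance for every unordered pair of points."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }

# The number of entries grows as n * (n - 1) / 2, i.e. quadratically in n.
points = [(0, 0), (3, 4), (6, 8), (9, 12)]
distances = pairwise_distances(points)
print(len(distances))
```

Doubling the number of points roughly quadruples the number of distances to compute and store, which is why this family of methods gets expensive on large data sets.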
The main advantages of K-means are its computational efficiency and the ease of interpreting its results. Bolshakova and Azuaje (4) consider K-means to be the most widely used clustering algorithm in practice. This last point is a good indicator of its usefulness in real-life situations. The main drawbacks of K-means are certainly the random initial centroid locations, which can change the result from run to run, and the unknown number of clusters K. This number has to be known in advance and is an input of the standard K-means algorithm. That's where cluster validity comes into play, and it is the topic of the next post.
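To make the two drawbacks concrete, here is a minimal sketch of the standard K-means loop in plain Python. It is an illustration under simple assumptions (points are tuples of numbers, squared Euclidean distance, starting centroids drawn at random from the data), not a reference implementation; the function name `kmeans` and its parameters `n_iter` and `seed` are chosen here for the example.

```python
import random

def kmeans(points, k, n_iter=100, seed=None):
    """Plain K-means on a list of equal-length tuples. k must be given."""
    rng = random.Random(seed)
    # Random initialisation: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2, seed=0)
print(centroids, labels)
```

Note that `k` is a required argument and that the starting centroids depend on `seed`: on less clearly separated data, a different seed or a different `k` can give a different partition.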
(1) M. Halkidi, Y. Batistakis, and M. Vazirgiannis. On clustering validation techniques. J. of Intelligent Information Systems, 17(2–3):107–145, 2001.
(2) S. Theodoridis and K. Koutroumbas. Pattern Recognition. Academic Press, 1999.
(3) A. Jain and R. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
(4) N. Bolshakova and F. Azuaje. Cluster validation techniques for genome expression data. Signal Process., 83(4):825–833, 2003.
Comments
2 Comments on Cluster validity: Clustering algorithms

Nimit on Wed, 22nd Nov 2006 7:06 pm
You may like to add this reference:
James C. Bezdek and Nikhil R. Pal. Some New Indexes of Cluster Validity. SMCB, 28(3):301–315, 1998.

Sandro Saitta on Thu, 23rd Nov 2006 9:58 am
Thanks! This reference is an important contribution to cluster validity. I have many other references if needed.