Data mining people: Heikki Mannila
Data mining explained
On Devipriya's blog, there is a very interesting and complete introduction to data mining named "Who is mining your data?". This clearly written introduction is mainly intended for people who want to know what motivates data mining and what its possible applications are. Only a minimum of technical terms is used, so any reader can understand what data mining is about…
Juice Analytics’ Blog
It's always a pleasure for me to find interesting blogs about data mining and to present them here. Juice Analytics is a company that... well, let them define what they do in their own words: "Juice Analytics helps small and mid-market companies develop deep prospect and customer understanding through visualization and analytics of existing
Now boarding!
Here is some food for the weekend:
- Will explains a good alternative to the standard Euclidean distance by introducing the Mahalanobis distance on his blog
- Andy writes that Google seems to have started integrating blog posts into its results (pointed out by Matthew)
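For readers who have not met the Mahalanobis distance mentioned above, here is a minimal sketch of how it differs from the Euclidean distance. It is not taken from Will's post: the function names and the restriction to 2-D points (so the covariance matrix can be inverted by hand) are assumptions made for illustration only.

```python
import math

def covariance_2d(points):
    """Sample covariance matrix (2x2) of a list of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def euclidean(a, b):
    """Standard Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between 2-D points a and b for covariance cov."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a symmetric 2x2 matrix.
    inv = [[syy / det, -sxy / det], [-sxy / det, sxx / det]]
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Quadratic form d^T * inv(cov) * d.
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)
```

With an identity covariance matrix the two distances coincide. When one variable has a larger variance, differences along that variable count for less, which is the scaling effect that makes the Mahalanobis distance attractive for correlated or differently scaled data.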
Cluster validity: Existing indices
The third and final post on cluster validity is about existing validity indices. As written in (1), the two fundamental issues in cluster validity are 1) the number of clusters present in the data and 2) how good the clustering itself is. Several indices have been proposed in the literature. The main idea with these indices is to plot them against the number of clusters and then… Continue reading... | 9 Comments
Cluster validity: Clustering algorithms
Now that the clustering ideas have been introduced, let's look at existing clustering strategies. Several clustering techniques can be found in the literature. They can be divided into four main categories (1): partitional clustering (K-means, etc.), hierarchical clustering (BIRCH, etc.), density-based clustering (DBSCAN, etc.) and grid-based clustering (STING, etc.). In the literature, clustering can be found under different names, such as unsupervised learning, numerical taxonomy and partition (2). One… Continue reading... | 2 Comments
Cluster validity: Introduction to clustering
November 21, 2006 by Sandro Saitta · 2 Comments
Filed under: clustering, unsupervised learning, validity index
In the near future, I will use this blog to write about recent research I'm involved in. I start today (and continue over the following days) with an introduction to the topic I'm interested in: cluster validity. Clustering is certainly the best-known example of unsupervised learning. The goal of clustering is to group data points that are similar according to a given similarity metric (by default, Euclidean distance is used). As Jain… Continue reading... | 2 Comments
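To make the idea of grouping points by Euclidean distance concrete, here is a minimal sketch of K-means, a partitional method. It is not the code used in the research described in these posts: the function names, the fixed iteration count and the naive initialization (the first k points serve as starting centroids) are all assumptions chosen to keep the example short and deterministic.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    # Assumption: naive initialization with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = kmeans(points, k=2)
```

On this toy data the two points near the origin end up in one cluster and the two points near (10, 10) in the other. The number of clusters k has to be supplied by the user, which is precisely the question cluster validity tries to answer.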