Overview
Clustering is the process of identifying natural clumps, or clusters, that occur in a dataset. It is particularly useful when the data needs to be categorized but the categories are not known in advance.
Distance Measures
One of the key concepts in clustering is the idea of a distance between two points in the dataset. The concept of distance is formalized in the notion of a metric in topology.
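For reference, the standard definition of a metric can be written out as follows. The symbols d, X, x, y, and z are notation introduced here for illustration and do not come from the text above: d is a function that assigns a real number to each pair of points in a set X.

```latex
\begin{align*}
  d(x, y) &\ge 0                  && \text{(non-negativity)} \\
  d(x, y) &= 0 \iff x = y         && \text{(identity of indiscernibles)} \\
  d(x, y) &= d(y, x)              && \text{(symmetry)} \\
  d(x, z) &\le d(x, y) + d(y, z)  && \text{(triangle inequality)}
\end{align*}
```

The Euclidean distance between points in the plane is a familiar example that satisfies all four conditions.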
Once a metric has been defined, the following can be computed.
- Distance Between Points - given by the metric.
- Distance Between a Point and a Cluster
- Distance Between Clusters
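The three distances listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the Euclidean metric, takes the distance from a point to a cluster to be the distance to the cluster's nearest member, and takes the distance between two clusters to be the smallest distance between any pair of their members (often called single linkage). Other definitions, such as distance to a cluster's centroid, are equally valid. The function names are hypothetical and chosen only for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean metric between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def point_to_cluster(p, cluster, metric=euclidean):
    """Assumed definition: distance from p to the nearest member of the cluster."""
    return min(metric(p, q) for q in cluster)


def cluster_to_cluster(c1, c2, metric=euclidean):
    """Assumed definition (single linkage): smallest distance over all cross-cluster pairs."""
    return min(metric(p, q) for p in c1 for q in c2)


# Two small illustrative clusters of 2-D points (made-up data).
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 4.0), (5.0, 4.0)]

print(euclidean((0.0, 0.0), (3.0, 4.0)))         # 5.0
print(point_to_cluster((1.0, 4.0), cluster_b))   # 3.0
print(cluster_to_cluster(cluster_a, cluster_b))  # 5.0
```

Because the metric is passed in as a parameter, a different distance function can be substituted without changing the point-to-cluster or cluster-to-cluster definitions.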
Algorithms
The following are common clustering algorithms.