Clustering

Overview


Clustering is the process of identifying natural clumps, or clusters, that occur in a dataset. It is particularly useful when the data needs to be categorized but the categories are not known in advance.

Distance Measures


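One of the key concepts in clustering is the idea of a distance between two points in the dataset. The concept of distance is formalized by the notion of a metric in topology: a function on pairs of points that is non-negative, zero only when the two points coincide, symmetric, and satisfies the triangle inequality.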

Once a metric has been defined, the following distances can be computed:

  • Distance Between Points - given by the metric.
  • Distance Between a Point and a Cluster
  • Distance Between Clusters
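
The three distances listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the Euclidean metric, and it takes the distance to a cluster to be the distance to the nearest member of that cluster (the "single linkage" convention). Other conventions, such as the farthest member or the average over members, are equally valid. The function names are hypothetical and chosen for this example.

```python
import math

def euclidean(p, q):
    # Distance between two points, given directly by the metric.
    return math.dist(p, q)

def point_to_cluster(p, cluster, metric=euclidean):
    # Assumption: use the nearest cluster member (single linkage).
    return min(metric(p, q) for q in cluster)

def cluster_to_cluster(a, b, metric=euclidean):
    # Assumption: use the closest pair of points across the two clusters.
    return min(metric(p, q) for p in a for q in b)
```

Because the metric is passed in as a parameter, a different notion of distance (for example Manhattan distance) can be substituted without changing the other two functions.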

Algorithms


The following are common clustering algorithms.

  • Agglomerative
  • K-Means
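
To give a feel for how the two algorithms differ, here is a minimal sketch of each in Python. Several choices are assumptions made for illustration and do not come from the text above: the Euclidean metric, single-linkage merging for the agglomerative version, caller-supplied starting centroids and a fixed iteration count for K-Means, and the function names themselves. Neither function is tuned for large datasets.

```python
import math

def agglomerative(points, k):
    """Start with one cluster per point and merge the closest pair until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Assumption: single linkage, i.e. cluster distance = closest pair of members.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(
            pairs,
            key=lambda ij: min(math.dist(p, q)
                               for p in clusters[ij[0]]
                               for q in clusters[ij[1]]),
        )
        merged = clusters.pop(j)   # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters

def k_means(points, centroids, iterations=10):
    """Alternate assignment and update steps; `centroids` holds the initial guesses."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
    # `groups` reflects the assignment made in the final iteration.
    return centroids, groups
```

The agglomerative sketch builds clusters bottom-up from individual points, while the K-Means sketch starts from a fixed number of centroids and refines them, which is why K-Means needs the number of clusters and initial guesses up front.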