Clustering

Overview

Clustering is the process of identifying natural clumps, or clusters, that occur in a dataset. It is particularly useful when data needs to be categorized but the categories are not known in advance.

Distance Measures

One of the key concepts in clustering is the idea of a distance between two points in the dataset. The concept of distance is formalized by the notion of a metric in topology.
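As a rough illustration of what a metric looks like in practice, the sketch below implements two familiar distance functions in Python. The choice of Euclidean and Manhattan distance, and the function names `euclidean` and `manhattan`, are assumptions made for this example and are not taken from the text above. A metric d must be non-negative, give d(x, y) = 0 exactly when x = y, be symmetric, and satisfy the triangle inequality d(x, z) <= d(x, y) + d(y, z); the final lines spot-check some of these properties on three sample points.

```python
import math
from typing import Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line (L2) distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (L1) between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return sum(abs(a - b) for a, b in zip(p, q))


# Three sample points, chosen only for illustration.
a, b, c = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)

print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7.0

# Spot-check symmetry and the triangle inequality on these points.
print(euclidean(a, b) == euclidean(b, a))
print(euclidean(a, c) <= euclidean(a, b) + euclidean(b, c))
```

Different metrics can group the same data differently, so which one is appropriate depends on what "close" should mean for the dataset at hand.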

Once a metric has been defined, the following can be computed.

Algorithms

The following are common clustering algorithms.