Overview
K-Means clustering is a clustering algorithm for finding a set of k clusters in a dataset of n points. Each cluster is designated by a centroid, that is, a point which represents the average of the points in the cluster.
{% \vec{\mu}_i = \frac{1}{n_i} \sum \vec{x}_j %}
The algorithm attempts to find k centroids such that the squared error is minimized.
{% error = \sum_{i=1}^k \sum _j || \vec{x}_j - \vec{\mu}_i ||^2 %}
The algorithm is not guaranteed to find the global minimum of the error, but it will converge to a local minimum. This means that it may
make sense to run the algorithm multiple times and the choose the best result.
Algorithm
The algorithm begins by randomly generating k points, taken to be an initial guess of the k centroids.Next it iterates the following steps.
- Assign every datapoint to the centroid that it is closest to
- Update the centroids - after the points have been assigned a centroid, each centroid is computed to be the mean of the points that are assigned to it.