Clustering Evaluation

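A typical way to evaluate the effectiveness of a set of clusters is to compute an error. A simple example is the squared error: for each datapoint, take the square of the distance between the datapoint and the centroid of the cluster that it belongs to, then add these values up over all datapoints. Dividing the sum by the number of datapoints gives the mean squared error.
{% error = \sum_{i} \|\vec{x}_i - \vec{c}_i\|^2 %}
Here x_i is the i-th datapoint and c_i is the centroid of the cluster that it is assigned to.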
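As a rough illustration of the error described above, here is a minimal Python sketch. The function name `squared_error` and the toy data are assumptions made for this example; the error is computed as the sum of squared Euclidean distances between each datapoint and its assigned centroid.

```python
def squared_error(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its assigned centroid.

    points:    list of coordinate tuples
    centroids: list of coordinate tuples, one per cluster
    labels:    labels[i] is the index of the cluster that points[i] belongs to
    """
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Toy data (hypothetical): two clusters, every point exactly 1 unit from its centroid.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]

total = squared_error(points, centroids, labels)  # 4.0
mean = total / len(points)                        # 1.0, the mean squared error
print(total, mean)
```

Whether to report the sum or the mean is a matter of convention. For a fixed dataset the two differ only by a constant factor, so they rank clusterings the same way.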

Visualizing the Error and the Elbow Method

As the number of clusters goes up, the error generally goes down. However, the reduction in error gained from each additional cluster also typically shrinks. One way to assess the optimal number of clusters is to graph the total error against the number of clusters and look for the point where the error curve bends the most, usually referred to as the elbow of the curve.
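The procedure above can be sketched in plain Python. This is only an illustration under several assumptions: the clustering is a bare-bones k-means with a deterministic farthest-point initialization (chosen here for reproducibility), the toy data is made up, and the elbow is located with a simple heuristic (the largest second difference of the error curve). In practice the elbow is often judged by eye from a plot.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    # Assumption: deterministic farthest-point initialization, not random seeding.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def assign(points, centroids):
    # Label each point with the index of its nearest centroid.
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    centroids = init_centroids(points, k)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        centroids = new_centroids
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


def total_error(points, centroids, labels):
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))


def elbow(ks, errors):
    # Heuristic: the k with the largest second difference is the sharpest bend.
    bends = {ks[i]: errors[i - 1] - 2 * errors[i] + errors[i + 1]
             for i in range(1, len(ks) - 1)}
    return max(bends, key=bends.get)


# Toy data (hypothetical): three well-separated groups of five points each.
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
offsets = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0), (0.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

ks = list(range(1, 7))
errors = [total_error(points, *kmeans(points, k)) for k in ks]
best_k = elbow(ks, errors)

for k, e in zip(ks, errors):
    print(k, round(e, 2))
print("elbow at k =", best_k)
```

On this toy data the error drops steeply up to three clusters and only slightly afterwards, so the heuristic places the elbow at k = 3, matching the three groups the data was built from. On real data the bend is usually less sharp, and plotting `errors` against `ks` is the more common way to inspect it.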