Principal Components

Overview


Principal components analysis is a dimensionality reduction technique that uses the covariance of a dataset. It finds a change of basis of the dataset's vector space such that the greatest variance of the data lies along the first basis vector of the new basis, the second greatest variance along the second basis vector, and so on.

Algorithm


The method of principal component analysis is a Change of Basis transformation of the dataset, chosen so that the covariance of the dataset expressed in the new basis is diagonal.

The Change of Basis transformation can be written as
{% \vec{z}' = A \vec{z} %}
Under this type of transformation, the covariance matrix of the transformed vectors is given by
{% cov(A \vec{z}) = A cov(\vec{z}) A^T %}
See derivation for the reasoning.
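
The covariance identity above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the random data, the matrix A, and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 500 sample vectors z in 3 dimensions, stored one sample per column.
Z = rng.normal(size=(3, 500))

# An arbitrary transformation matrix A (illustrative, not from the text).
A = rng.normal(size=(3, 3))

# np.cov treats each row as a variable and each column as an observation.
cov_z = np.cov(Z)
cov_Az = np.cov(A @ Z)

# cov(A z) = A cov(z) A^T holds for sample covariances as well,
# up to floating-point rounding.
identity_holds = np.allclose(cov_Az, A @ cov_z @ A.T)
print(identity_holds)
```

The check does not depend on A being a change of basis; the identity holds for any linear transformation.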

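The change of basis is chosen so as to diagonalize the covariance matrix. Because the covariance matrix is symmetric, such a basis always exists: the rows of A can be taken to be orthonormal eigenvectors of the covariance matrix, and the diagonal entries of the transformed covariance are the corresponding eigenvalues, which are the variances along the new basis vectors. Once the covariance is diagonalized, the basis vectors in the directions of the smallest variances are discarded. That is, all vectors are then projected onto the subspace spanned by the basis vectors with the largest variances.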
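
The steps described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation referenced on this page: the function name `pca`, the parameter `k`, and the sample data are assumptions, and the data is centered on its mean before projecting, which the text does not state explicitly.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the k directions of largest variance.

    X holds one data vector per row. Returns the projected data, the full
    change of basis matrix A (rows are the new basis vectors), and the
    variances along each new basis vector in descending order.
    """
    # Center the data (assumption: projection is taken about the mean).
    Xc = X - X.mean(axis=0)
    # Sample covariance matrix; columns of X are the variables.
    C = np.cov(Xc, rowvar=False)
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(C)
    # Reorder so the largest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Rows of A are the eigenvectors, so that z' = A z.
    A = eigvecs.T
    # Discard the directions of smallest variance by keeping the first k rows.
    projected = Xc @ A[:k].T
    return projected, A, eigvals

# Hypothetical correlated 3-dimensional data, 200 samples.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.2, 0.1]])
X = rng.normal(size=(200, 3)) @ mixing

projected, A, variances = pca(X, k=2)

# In the new basis the covariance A cov(z) A^T is diagonal.
cov_new = A @ np.cov(X, rowvar=False) @ A.T
```

Here `cov_new` agrees with `np.diag(variances)` up to rounding, and `projected` has two columns, one for each retained basis vector.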

An implementation of the algorithm can be found at Implementation.

Topics


  • Justification using Least Squares
  • Simulations