Overview
Gaussian models are models in which all of the underlying distributions are Gaussian (normal). This is usually a simplification, but it is often good enough, and it makes the models tractable and computationally feasible. Gaussian models are often a good starting point.
Multivariate Normal Distributions
A multivariate normal distribution is a distribution over a vector of data in which every linear combination of the components is normally distributed. In particular, each marginal distribution is normal.
The density of the multivariate normal distribution for a D-dimensional vector x, in matrix notation, is
{% N(x|\mu, \Sigma) = \frac{1}{(2 \pi)^{D/2} |\Sigma|^{1/2}} \exp \left[ -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x - \mu) \right] %}
The expression
{% (x-\mu)^T \Sigma^{-1} (x - \mu) %}
in the exponent is the squared Mahalanobis distance
between the data vector x and the mean vector {% \mu %}
(see Machine Learning Distance)
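To make the density formula concrete, here is a minimal sketch for the two-dimensional case (D = 2), where the inverse and determinant of the covariance matrix can be written out by hand. The function names `inverse2x2`, `mahalanobisSquared`, and `logDensity2d` are illustrative only and are not part of the libraries used later in this article.

```javascript
// Illustrative helpers for D = 2; the names are hypothetical, not a library API.

// Determinant and inverse of a 2x2 matrix [[a, b], [c, d]].
function inverse2x2(m) {
  const [[a, b], [c, d]] = m;
  const det = a * d - b * c;
  if (det === 0) throw new Error('covariance matrix is singular');
  return { det, inv: [[d / det, -b / det], [-c / det, a / det]] };
}

// Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
function mahalanobisSquared(x, mu, sigma) {
  const { inv } = inverse2x2(sigma);
  const dx = [x[0] - mu[0], x[1] - mu[1]];
  const t0 = inv[0][0] * dx[0] + inv[0][1] * dx[1];
  const t1 = inv[1][0] * dx[0] + inv[1][1] * dx[1];
  return dx[0] * t0 + dx[1] * t1;
}

// Logarithm of N(x | mu, Sigma) for D = 2.
function logDensity2d(x, mu, sigma) {
  const { det } = inverse2x2(sigma);
  const D = 2;
  const q = mahalanobisSquared(x, mu, sigma);
  return -0.5 * (D * Math.log(2 * Math.PI) + Math.log(det) + q);
}
```

Working with the log of the density rather than the density itself avoids underflow for points far from the mean, and it is the form that appears in the classifier.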
Gaussian Discriminant Analysis
Gaussian Discriminant Analysis assumes that the data is generated by a set of multivariate normal distributions, one per class (or category). The task is then to determine which category a sample point belongs to.
The usual way to accomplish this is to fit each class's mean and covariance by maximum likelihood and then assign the point to the category with the highest posterior probability.
When the class-conditional densities are normal, the log of the posterior probability given by a Bayes classifier takes the following form
{% p(x|y=c,\theta) = N(x|\mu_c , \Sigma_c) %}
{% \log p(y=c|\vec{x};\vec{\theta}) = \log \pi_c - \frac{1}{2} \log |2 \pi \Sigma_c| - \frac{1}{2}(\vec{x}-\vec{\mu_c})^T \Sigma_c^{-1}(\vec{x}-\vec{\mu_c}) + const %}
where the class prior is estimated as the fraction of the N training points that fall in class c
{% \pi_c = p(y=c;\vec{\theta}) = \frac{N_c}{N} %}
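Note that the constant in the expression does not depend on the class, so it can be ignored when only trying to find the most probable category.
If, in addition, the classes share a single covariance matrix {% \Sigma %} and have equal priors, the rule reduces to the nearest centroids classifier,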
{% y(x) = argmin _c (x-\mu_c)^T \Sigma^{-1} (x-\mu_c) %}
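As a rough sketch of how the log-posterior rule can be turned into a classifier, the following scores each class with its log prior, the log-determinant of its covariance, and the quadratic form, then picks the highest score. It is restricted to two dimensions to stay self-contained. The function names and the toy class parameters are hypothetical and chosen only for illustration.

```javascript
// Quadratic form (x - mu)^T Sigma^{-1} (x - mu) and |Sigma| for a 2x2 covariance.
function quadForm2d(x, mu, sigma) {
  const [[a, b], [c, d]] = sigma;
  const det = a * d - b * c;
  const dx0 = x[0] - mu[0];
  const dx1 = x[1] - mu[1];
  // Uses Sigma^{-1} = (1 / det) * [[d, -b], [-c, a]].
  const q = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det;
  return { q, det };
}

// Log posterior up to an additive term that is the same for every class.
function score(x, cls) {
  const { q, det } = quadForm2d(x, cls.mu, cls.sigma);
  return Math.log(cls.prior) - 0.5 * Math.log(det) - 0.5 * q;
}

// Return the label of the class with the highest score.
function predict(x, classes) {
  let best = null;
  for (const cls of classes) {
    const s = score(x, cls);
    if (best === null || s > best.score) best = { label: cls.label, score: s };
  }
  return best.label;
}

// Toy parameters (made up): equal priors and a shared identity covariance,
// so the rule reduces to picking the nearest centroid.
const classes = [
  { label: 'A', prior: 0.5, mu: [0, 0], sigma: [[1, 0], [0, 1]] },
  { label: 'B', prior: 0.5, mu: [4, 4], sigma: [[1, 0], [0, 1]] },
];
const predicted = predict([1, 1], classes);
```

With unequal priors or class-specific covariances the same `score` function still applies; only the simplification to a nearest centroid rule is lost.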
Implementation
The following basic implementation uses the moments library and the norms library.
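// Load the moments (mean, covariance) and norms (distance) libraries.
let mt = await import('/lib/statistics/moments/v1.0.0/moments.js');
let nm = await import('/lib/linear-algebra/v1.0.0/norms.mjs');
// data1 and data2 are assumed to hold the samples for each class, one row per observation.
let mu1 = mt.mean(data1);
let mu2 = mt.mean(data2);
let covar1 = mt.covariance(data1);
let covar2 = mt.covariance(data2);
// The point to classify.
let testPoint = [160, 80];
// Mahalanobis distance from the test point to each class; the class with the smaller distance is the closer one.
let distance1 = nm.mahalanobis(testPoint, mu1, covar1);
let distance2 = nm.mahalanobis(testPoint, mu2, covar2);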
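The behavior of `mt.mean` and `mt.covariance` is not shown in this article. As a sketch of what they are assumed to compute, the following plain JavaScript estimates a mean vector and a sample covariance matrix from an array of rows. The function names are hypothetical stand-ins, and the choice of dividing by N - 1 rather than N is an assumption, since the library's convention is not stated here.

```javascript
// Column-wise mean of an array of equal-length rows.
function sampleMean(rows) {
  const d = rows[0].length;
  const mu = new Array(d).fill(0);
  for (const row of rows) {
    for (let j = 0; j < d; j++) mu[j] += row[j] / rows.length;
  }
  return mu;
}

// Sample covariance matrix; divides by N - 1 (an assumption, see above).
function sampleCovariance(rows) {
  const mu = sampleMean(rows);
  const d = mu.length;
  const cov = Array.from({ length: d }, () => new Array(d).fill(0));
  for (const row of rows) {
    for (let i = 0; i < d; i++) {
      for (let j = 0; j < d; j++) {
        cov[i][j] += ((row[i] - mu[i]) * (row[j] - mu[j])) / (rows.length - 1);
      }
    }
  }
  return cov;
}
```

These estimates are the per-class parameters {% \mu_c %} and {% \Sigma_c %} that the classifier needs; at least two rows per class are required for the covariance to be defined.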
Example Distributions
As an example of Gaussian discriminant analysis, consider the height and weight data for men versus women. The data displays the characteristic shape of Gaussian-distributed data.