Information Criteria
Overview
Information criteria provide a framework for constructing criteria for model selection.
The framework assumes that some unknown statistical distribution g(x) has generated the observed dataset, and that the model, once fit to the data, returns an estimated distribution f(x).
An information criterion then applies the concept of a distance between distributions to measure how well the estimated distribution fits the unknown distribution. That is, the fit is good if the distance is small.
The actual distance between the distributions cannot be calculated because one of the distributions is unknown. Instead, the procedure forms an expectation of the distance given the dataset, and this expectation is used as the measure of model fit.
Standard Algorithm
The standard procedure for defining an information criterion uses the Kullback-Leibler measure of distance:
{% d(g,f) = \mathbb{E}_g \left[ \log \frac{g(x)}{f(x)} \right] = \mathbb{E}_g [\log g(x)] - \mathbb{E}_g [\log f(x)] %}
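As a concrete illustration, the sketch below evaluates this distance numerically for two Gaussians using SciPy. The particular choices of g and f are hypothetical; in practice g is unknown, which is exactly the problem the rest of the procedure addresses.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hypothetical example: g is the "true" distribution, f a fitted model.
# In practice g is unknown; it is fixed here purely for illustration.
g = norm(loc=0.0, scale=1.0)
f = norm(loc=0.5, scale=1.2)

# d(g, f) = E_g[log g(x) - log f(x)], evaluated by numerical integration.
integrand = lambda x: g.pdf(x) * (g.logpdf(x) - f.logpdf(x))
kl, _ = quad(integrand, -np.inf, np.inf)

print(f"d(g, f) = {kl:.4f}")  # smaller values mean f is closer to g
```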
The first term depends only on g and is the same for every candidate model, so models can be compared through the expected log-likelihood alone:

{% \mathbb{E}_g[\log f(x)] = \int g(x) \log f(x) \, dx %}
Because g is unknown, this integral cannot be evaluated directly, but it can be approximated by the sample mean of the log-likelihood over the observed data:

{% \mathbb{E}_g [\log f(x)] \approx \frac{1}{n} \sum_{i=1}^{n} \log f(x_i) %}
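The sketch below illustrates this approximation in a made-up setting where the data come from a standard normal and the model is a Gaussian fit by maximum likelihood; the distribution choices and sample size are assumptions for the example only.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Data generated by the unknown g (assumed standard normal for this sketch).
data = rng.normal(loc=0.0, scale=1.0, size=1000)

# Fit a Gaussian model by maximum likelihood to obtain f(x).
mu_hat, sigma_hat = data.mean(), data.std()
f = norm(loc=mu_hat, scale=sigma_hat)

# Sample-mean approximation of E_g[log f(x)].
expected_loglik = np.mean(f.logpdf(data))
print(f"Estimated expected log-likelihood: {expected_loglik:.4f}")
```

Note that using the same data both to fit f and to evaluate the sample mean tends to overstate the expected log-likelihood; correcting for this bias is what distinguishes particular information criteria.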