Overview
Estimation is the process of determining a point estimate of a distribution parameter. Typically the parameters of interest are moments of the distribution, but any number that helps specify the shape of the distribution in question can be an estimated parameter.
Example - Distribution Mean
As a simple example, an analyst may wish to determine the mean, {% \mu %}, of a distribution. She will follow a process or algorithm to compute a number, {% \hat{\mu} %}, which is in some sense close to the theoretical parameter. (In this case, the typical procedure is simply to average the samples.)
{% \hat{\mu} = \frac{1}{n} \sum_i X_i %}
Assuming the samples are independent and identically distributed with variance {% \sigma^2 %}, the variance of the average can be computed as
{% Var(\hat{\mu}) = \frac{1}{n^2} \sum_{i=1}^n Var(X_i) = \frac{\sigma^2}{n} %}
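As a minimal sketch of this (assuming NumPy and an illustrative normal distribution with made-up parameters), the sample mean and its standard error can be computed as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 2.0, 1000           # illustrative true parameters and sample size

samples = rng.normal(mu, sigma, size=n)
mu_hat = samples.mean()                  # point estimate of the mean

# Var(mu_hat) = sigma^2 / n for i.i.d. samples, so the standard error is sigma / sqrt(n)
std_error = sigma / np.sqrt(n)
print(f"mu_hat = {mu_hat:.3f}, standard error = {std_error:.3f}")
```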
The analyst will then want to understand how likely the algorithm she followed is to have produced a good estimate, and whether other procedures could compute a better one.
Estimator Bias
Bias refers to the situation where the expected value of an estimated parameter is not equal to the parameter. When the expectation of the estimate equals the parameter (in theory), the estimate is said to be unbiased.
In the example above, the estimate is theoretically unbiased.
{% \mathbb{E}[\hat{\mu}] = \frac{1}{n} \sum_i \mathbb{E} [X_i] = \mathbb{E}[X_i] %}
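A quick simulation sketch of this claim (assuming the same NumPy setup and illustrative parameters as above): averaging {% \hat{\mu} %} over many independent datasets recovers {% \mu %} up to simulation noise.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, trials = 5.0, 2.0, 50, 10_000   # illustrative values

# Compute mu_hat on many independent datasets and average the estimates
mu_hats = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
print(f"E[mu_hat] ~= {mu_hats.mean():.3f} (true mu = {mu})")
```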
Example - Distribution Variance
When estimating the variance of a distribution, one may be tempted to calculate the following estimate.
{% \hat{\sigma^2} = \frac{1}{n} \sum_i (X_i - \hat{\mu})^2 %}
When the estimate of the mean given above is plugged in, this formula produces a biased estimate of the variance: because {% \hat{\mu} %} is computed from the same samples, the deviations {% X_i - \hat{\mu} %} are on average slightly smaller than the deviations from the true mean. The correct formula for an unbiased estimate divides by {% n - 1 %} instead:
{% \hat{\sigma^2} = \frac{1}{n-1} \sum_i (X_i - \hat{\mu})^2 %}
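A small sketch comparing the two estimators over repeated draws (assuming NumPy, whose ddof argument controls the denominator; the parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, trials = 0.0, 2.0, 10, 100_000   # illustrative values; true variance is 4.0

data = rng.normal(mu, sigma, size=(trials, n))
biased   = data.var(axis=1, ddof=0)   # divides by n
unbiased = data.var(axis=1, ddof=1)   # divides by n - 1 (Bessel's correction)

print(f"mean of biased estimates:   {biased.mean():.3f}")    # noticeably below 4.0
print(f"mean of unbiased estimates: {unbiased.mean():.3f}")  # close to 4.0
```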
Bias Variance Tradeoff
The mean squared error of an estimator {% \hat{\theta} %} of a parameter {% \theta %} decomposes into its variance plus its squared bias:
{% \mathbb{E}[(\hat{\theta} - \theta)^2] = Variance(\hat{\theta}) + Bias(\hat{\theta})^2 %}
{% \mathbb{E}[(\hat{\theta} - \theta)^2] %}
{% = \mathbb{E}[(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta)^2] %}
{% = \mathbb{E} [(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2 + (\mathbb{E}[\hat{\theta}] - \theta)^2 +2(\hat{\theta} - \mathbb{E}[\hat{\theta}])(\mathbb{E}[\hat{\theta}] - \theta) ] %}
{% = \mathbb{E}[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2] + \mathbb{E}[(\mathbb{E}[\hat{\theta}]-\theta)^2] + \mathbb{E}[2(\hat{\theta} - \mathbb{E}[\hat{\theta}])(\mathbb{E}[\hat{\theta}]-\theta)] %}
The first term is the variance {% Variance(\hat{\theta}) = \mathbb{E}[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2] %}
The second term is the bias squared {% Bias(\hat{\theta})^2 = \mathbb{E}[(\mathbb{E}[\hat{\theta}]-\theta)^2] = (\mathbb{E}[\hat{\theta}]-\theta)^2 %}
The third term is equal to zero, because {% \mathbb{E}[\hat{\theta}] - \theta %} is a constant and {% \mathbb{E}[\hat{\theta} - \mathbb{E}[\hat{\theta}]] = 0 %}, which establishes the decomposition.
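The decomposition can be checked numerically. The sketch below assumes an illustrative setup in which the estimated parameter is a normal distribution's variance and the estimator is the biased {% 1/n %} formula, so both the bias and the variance terms are nonzero:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, trials = 4.0, 10, 200_000      # illustrative values; theta = true variance

data = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
theta_hat = data.var(axis=1, ddof=0)      # biased 1/n variance estimator

mse      = np.mean((theta_hat - sigma2) ** 2)
variance = theta_hat.var()
bias_sq  = (theta_hat.mean() - sigma2) ** 2

print(f"MSE               = {mse:.4f}")
print(f"Variance + Bias^2 = {variance + bias_sq:.4f}")   # matches MSE up to simulation noise
```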
Consistency
An estimate is said to be consistent if it converges to the true value as the sample size {% n %} grows,
{% \hat{\theta}_n \rightarrow \theta %}
where the convergence is taken to be convergence in probability.
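A sketch of consistency in action (assuming the same NumPy setup and illustrative parameters as before), showing the sample mean tightening around {% \mu %} as {% n %} grows:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 5.0, 2.0                       # illustrative true parameters

for n in (10, 100, 10_000, 1_000_000):
    mu_hat = rng.normal(mu, sigma, size=n).mean()
    print(f"n = {n:>9}: mu_hat = {mu_hat:.4f}, |error| = {abs(mu_hat - mu):.4f}")
```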
Fitting to Data
The most common way to obtain an estimate from a dataset is the technique of Maximum Likelihood, which selects the parameter values that maximize the probability of the observed data under the assumed model.
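As a sketch of such a fit (assuming SciPy is available and an illustrative normally distributed dataset), `scipy.stats.norm.fit` returns the maximum-likelihood estimates of the location and scale:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(5.0, 2.0, size=1000)     # illustrative dataset

# For the normal distribution, maximum likelihood yields the sample mean
# and the (biased, 1/n) standard deviation.
mu_mle, sigma_mle = stats.norm.fit(data)
print(f"mu_mle = {mu_mle:.3f}, sigma_mle = {sigma_mle:.3f}")
```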