Estimation

Overview


Estimation is the process of determining a point estimate of a distribution parameter. Typically the parameters of interest are moments of the distribution, but any number that helps to specify the shape of the distribution in question can be an estimated parameter.

Example - Distribution Mean


As a simple example, an analyst may wish to determine the mean, {% \mu %}, of a distribution. She will follow a process or algorithm to compute a number, {% \hat{\mu} %}, which is in some sense close to the theoretical parameter. (In this case, the typical procedure is just to compute the average of the samples.)
{% \hat{\mu} = \frac{1}{n} \sum_i X_i %}
Assuming independent, identically distributed samples with variance {% \sigma^2 %}, the variance of the average can be computed as
{% Var(\hat{\mu}) = \frac{1}{n^2} \sum_{i=1}^n Var(X_i) = \frac{\sigma^2}{n} %}
The analyst will then want to understand how likely the algorithm she followed is to have produced a good estimate, and whether there are procedures that can compute a better estimate.
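
As an illustration, the following sketch (assuming NumPy is available, with an arbitrary synthetic normal sample; the mean and standard deviation used are not from the text) computes the sample mean and the estimated variance of that mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sample: n draws from a normal distribution (illustrative values).
n = 1000
x = rng.normal(loc=5.0, scale=2.0, size=n)

# Point estimate of the mean: the sample average.
mu_hat = x.mean()

# Var(mu_hat) = sigma^2 / n for i.i.d. samples; here the sample variance
# is plugged in as an estimate of sigma^2.
var_mu_hat = x.var(ddof=1) / n

print(f"mu_hat = {mu_hat:.4f}, estimated Var(mu_hat) = {var_mu_hat:.6f}")
```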

Estimator Bias


Bias refers to the situation where the expected value of an estimated parameter is not equal to the true parameter. When the expectation of the estimate equals the parameter (in theory), the estimate is said to be unbiased.
{% Bias(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta %}

In the example above, the estimate is theoretically unbiased.
{% \mathbb{E}[\hat{\mu}] = \frac{1}{n} \sum_i \mathbb{E} [X_i] = \mu %}
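
A quick simulation backs this up. The sketch below (assuming NumPy, with an arbitrary true mean and sample size chosen for illustration) averages the estimator over many repeated samples and recovers the true mean.

```python
import numpy as np

rng = np.random.default_rng(1)

true_mu = 3.0
n, trials = 50, 20_000

# Draw many independent samples of size n and compute mu_hat for each.
samples = rng.normal(loc=true_mu, scale=1.5, size=(trials, n))
mu_hats = samples.mean(axis=1)

# The average of the estimates approximates E[mu_hat], which matches true_mu.
print(f"mean of mu_hat over {trials} trials: {mu_hats.mean():.4f} (true mu = {true_mu})")
```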

Example - Distribution Variance


When estimating the variance of a distribution, one may be tempted to calculate the following estimate.
{% \hat{\sigma^2} = \frac{1}{n} \sum_i (X_i - \hat{\mu})^2 %}
When using the estimate of the mean given above, this formula produces a biased estimate of the variance. The correct formula for an unbiased estimate is:
{% \hat{\sigma^2} = \frac{1}{n-1} \sum_i (X_i - \hat{\mu})^2 %}
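
The two formulas correspond to NumPy's `ddof=0` and `ddof=1` options for the sample variance. The sketch below (synthetic normal data with an assumed true variance of 4, chosen only for illustration) compares them across many repeated samples.

```python
import numpy as np

rng = np.random.default_rng(2)

true_var = 4.0          # sigma^2 for the synthetic data
n, trials = 10, 50_000  # small n makes the bias of the 1/n formula visible

samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))

# 1/n formula (ddof=0): biased; it underestimates sigma^2 on average.
biased = samples.var(axis=1, ddof=0)

# 1/(n-1) formula (ddof=1): unbiased.
unbiased = samples.var(axis=1, ddof=1)

print(f"E[biased estimate]   ~ {biased.mean():.3f}  (expected (n-1)/n * sigma^2 = {(n-1)/n*true_var:.3f})")
print(f"E[unbiased estimate] ~ {unbiased.mean():.3f}  (true sigma^2 = {true_var})")
```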

Bias-Variance Tradeoff


The mean squared error of an estimator {% \hat{\theta} %} of a parameter {% \theta %} is the sum of the estimator's variance and its squared bias.
{% \mathbb{E}[(\hat{\theta} - \theta)^2] = Variance(\hat{\theta}) + Bias(\hat{\theta})^2 %}
To see this, expand the left-hand side.
{% \mathbb{E}[(\hat{\theta} - \theta)^2] %}
{% = \mathbb{E}[(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta)^2] %}
{% = \mathbb{E} [(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2 + (\mathbb{E}[\hat{\theta}] - \theta)^2 +2(\hat{\theta} - \mathbb{E}[\hat{\theta}])(\mathbb{E}[\hat{\theta}] - \theta) ] %}
{% = \mathbb{E}[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2] + \mathbb{E}[(\mathbb{E}[\hat{\theta}]-\theta)^2] + \mathbb{E}[2(\hat{\theta} - \mathbb{E}[\hat{\theta}])(\mathbb{E}[\hat{\theta}]-\theta)] %}

The first term is the variance
{% Variance(\hat{\theta}) = \mathbb{E}[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2] %}

The second term is the bias squared
{% Bias(\hat{\theta})^2 = \mathbb{E}[(\mathbb{E}[\hat{\theta}]-\theta)^2] = (\mathbb{E}[\hat{\theta}]-\theta)^2 %}

The third term is equal to zero, because {% \mathbb{E}[\hat{\theta}] - \theta %} is a constant and {% \mathbb{E}[\hat{\theta} - \mathbb{E}[\hat{\theta}]] = 0 %}.
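
The decomposition can also be checked numerically. The sketch below (assuming NumPy, using the biased 1/n variance estimator from the previous section on synthetic normal data with an assumed true variance of 4) estimates the MSE, the variance, and the squared bias by simulation and confirms they add up.

```python
import numpy as np

rng = np.random.default_rng(3)

true_var = 4.0   # the parameter theta being estimated is sigma^2
n, trials = 10, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))

# theta_hat: the biased 1/n variance estimator.
theta_hat = samples.var(axis=1, ddof=0)

mse = np.mean((theta_hat - true_var) ** 2)
variance = theta_hat.var()
bias_sq = (theta_hat.mean() - true_var) ** 2

# MSE should equal Variance + Bias^2 (up to Monte Carlo error).
print(f"MSE          ~ {mse:.4f}")
print(f"Var + Bias^2 ~ {variance + bias_sq:.4f}")
```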

Consistency


An estimate is said to be consistent if it converges to the true value as the sample size {% n %} grows
{% \hat{\theta}_n \rightarrow \theta %}
where the convergence is taken to be convergence in probability: for every {% \epsilon > 0 %}, {% P(|\hat{\theta}_n - \theta| > \epsilon) \rightarrow 0 %} as {% n \rightarrow \infty %}.
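
The sample mean is a standard example of a consistent estimator. The sketch below (assuming NumPy, with an arbitrary true mean and tolerance chosen for illustration) estimates the probability that the sample mean lands more than {% \epsilon %} away from the true value, and shows it shrinking as {% n %} grows.

```python
import numpy as np

rng = np.random.default_rng(4)

true_mu, eps = 3.0, 0.1

# For increasing sample sizes, estimate P(|mu_hat_n - mu| > eps) by simulation.
for n in (10, 100, 1_000, 10_000):
    mu_hats = rng.normal(loc=true_mu, scale=2.0, size=(1_000, n)).mean(axis=1)
    p_far = np.mean(np.abs(mu_hats - true_mu) > eps)
    print(f"n = {n:>6}: P(|mu_hat - mu| > {eps}) ~ {p_far:.3f}")
```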

Fitting to Data


The most common way to obtain an estimate from a dataset is to use the technique of Maximum Likelihood: choose the parameter value that maximizes the likelihood of the observed data.
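
As a brief illustration, the sketch below (assuming NumPy and SciPy, and using an exponential model with a made-up rate parameter, none of which are specified in the text) fits the rate by numerically maximizing the log-likelihood; for the exponential distribution this matches the closed-form MLE, one over the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)

# Synthetic data from an exponential distribution with rate lambda = 2.
true_rate = 2.0
x = rng.exponential(scale=1.0 / true_rate, size=1_000)

def neg_log_likelihood(rate):
    # Exponential log-likelihood: n*log(rate) - rate * sum(x); negate to minimize.
    return -(len(x) * np.log(rate) - rate * x.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")

print(f"numerical MLE:   {result.x:.4f}")
print(f"closed-form MLE: {1.0 / x.mean():.4f}")  # 1 / sample mean
```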