Overview
Once an average default rate has been computed, we would like to build some sort of confidence interval around it. In particular, we would like an upper bound on the average default rate beyond which it is extremely unlikely that the true default rate lies.
The variability of the estimate is computed using two techniques below. The first derives the variance of the estimate directly; the second uses a resampling technique, the bootstrap, to compute a distribution of the likely mean default rates.
Variance of Estimate
A simple way to compute a bound on the estimate of the average default rate would be to use the statistical inference techniques of estimation to compute an estimated standard deviation of the calculated average. In particular, assuming the observations are independent,
{% Var(\hat{\mu}) = \frac{1}{n^2} \sum_{i=1}^n Var(X_i) %}
The following code utilizes the moments library to calculate the variance of the default field (see moments for additional information about calculating moments).
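The moments-library snippet itself is not reproduced here; as a minimal sketch of the same calculation, the following Python fragment (NumPy, a hypothetical `defaults` array of 0/1 indicators, and the variable names are all assumptions, not part of the original) computes the sample variance of the default field and plugs it into the formula above:

```python
import numpy as np

# Hypothetical 0/1 default indicators; 1 = account defaulted.
defaults = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 0])
n = len(defaults)

# Sample variance of the default field (second central moment,
# with the usual n - 1 correction).
var_default = defaults.var(ddof=1)

# Var(mu_hat) = (1/n^2) * sum of Var(X_i); with identically
# distributed observations this reduces to Var(X) / n.
var_mu_hat = var_default / n
std_err = np.sqrt(var_mu_hat)
```

The square root of `var_mu_hat` is the standard error of the estimated average default rate, which is what a normal-approximation upper bound would be built from.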
Sampled Default Rate
Bootstrapping an estimate involves repeatedly resampling from the original dataset with replacement, computing the desired statistic on each resample, and keeping track of the distribution of that statistic.
The following code uses the sampling library to create 100 samples from the original dataset and to compute the average default rate in each sample.
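The sampling-library snippet is likewise not shown here; the resampling loop can be sketched in Python as follows (NumPy, the `defaults` array, and the seed are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 0/1 default indicators; 1 = account defaulted.
defaults = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 0])
n = len(defaults)

# Draw 100 bootstrap samples (resampling with replacement) and
# record the average default rate of each sample.
boot_means = np.array([
    rng.choice(defaults, size=n, replace=True).mean()
    for _ in range(100)
])

# The empirical distribution of boot_means approximates the
# sampling distribution of the mean; its 95th percentile gives
# a one-sided upper bound on the average default rate.
upper_bound = np.percentile(boot_means, 95)
```

Taking a high percentile of `boot_means` yields an upper bound of the kind described in the Overview, without assuming a normal sampling distribution.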
Next Step
We may want to tie the default rate to any of a number of measured factors. The next step runs a logistic regression to measure the influence of those factors.