Fitting GLM
Overview
Link Function
For each family in the exponential family, there is a link function, {% g %}, that relates the conditional expectation to a linear combination of the measured covariates.
First define {% \mu_i %}
{% \mu_i = \mathbb{E}(y_i|\vec{x}_i) %}
Next, define {% \eta_i %}
{% \eta_i = \vec{x}_i^T \vec{\beta} %}
{% \frac{\partial \eta_i}{\partial \beta_j} = x_{ij} %}
The link function is an invertible function that relates {% \eta %} to {% \mu %}.
{% \eta_i = g(\mu_i) %}
Or
{% g^{-1}(\eta_i) = \mu_i %}
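As a concrete sketch, the logit link of logistic regression (an illustrative example, not one derived above) shows the pattern: {% g %} maps the mean onto the scale of the linear predictor, and {% g^{-1} %} maps it back.

```python
import numpy as np

# Illustrative link-function pair: the logit link of logistic regression.
# g and g^{-1} must invert each other: eta = g(mu), mu = g^{-1}(eta).

def logit(mu):
    # g(mu) = log(mu / (1 - mu)), mapping (0, 1) onto the real line
    return np.log(mu / (1.0 - mu))

def inv_logit(eta):
    # g^{-1}(eta) = 1 / (1 + exp(-eta)), the logistic function
    return 1.0 / (1.0 + np.exp(-eta))
```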
Relationships
The following relationships hold in the Hardin formulation.
{% (3.4) \;\;\;\; \mathbb{E}(\partial \mathcal{L}/\partial \theta) = 0 %}
{% (3.5) \;\;\;\; \mathbb{E}(\partial ^2 \mathcal{L}/\partial \theta ^2 + (\partial \mathcal{L}/\partial \theta)^2) = 0 %}
{% (3.6) \;\;\;\; \frac{\mathbb{E}(y_i) - b'(\theta_i)}{a(\phi)} = 0 %}
{% b'(\theta_i) = \mathbb{E}(y_i) = \mu_i %}
Variance
Applying (3.5) to the exponential-family log-likelihood gives
{% 0 = - \frac{b''(\theta_i)}{a(\phi)} + \frac{1}{a(\phi)^2} \mathbb{E}[y_i - b'(\theta_i)]^2 %}
{% = - \frac{b''(\theta_i)}{a(\phi)} + \frac{1}{a(\phi)^2} \mathbb{E}[y_i - \mu_i]^2 %}
{% = - \frac{b''(\theta_i)}{a(\phi)} + \frac{1}{a(\phi)^2} Var(y_i) %}
{% Var(y) = b''(\theta)a(\phi) %}
Then define the variance function
{% v(\mu) = b''(\theta) %}
{% v(\mu) = \frac{\partial \mu}{\partial \theta} %}
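As a concrete check (an illustrative example, not from the text above): for the Poisson family, {% a(\phi) = 1 %} and {% b(\theta) = e^\theta %}, so
{% b'(\theta) = e^\theta = \mu %}
{% v(\mu) = b''(\theta) = e^\theta = \mu %}
and {% Var(y) = b''(\theta)a(\phi) = \mu %}: the Poisson variance equals its mean, as expected.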
Newton's Algorithm
The standard method of fitting a GLM is to use Newton's algorithm to find the maximum of the log-likelihood.
The process proceeds iteratively. Each step assumes that a first-order Taylor expansion of {% \mathcal{L}' %} around the current estimate {% \beta^n %} vanishes at the next estimate,
{% 0 \approx \mathcal{L}'(\beta^n) + (\beta^{n+1} - \beta^n)\mathcal{L}''(\beta^n) %}
which yields the formula for the update in each iteration.
{% \beta^{n+1} = \beta^n - [\mathcal{L}''(\beta^n)]^{-1} \mathcal{L}'(\beta^n) %}
The algorithm requires the gradient and the Hessian of the log-likelihood, both calculated below.
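Before specializing to the GLM, the update rule can be sketched on a one-parameter problem. The objective {% \mathcal{L}(b) = \log b - b %} here is hypothetical, chosen only because its maximum ({% b = 1 %}) and its derivatives are easy to check by hand.

```python
# Minimal sketch of the Newton update b^{n+1} = b^n - L'(b^n)/L''(b^n)
# on a hypothetical one-parameter objective L(b) = log(b) - b,
# which is concave on b > 0 with its maximum at b = 1.

def newton_1d(b, n_iter=50, tol=1e-12):
    for _ in range(n_iter):
        grad = 1.0 / b - 1.0        # L'(b^n)  = 1/b - 1
        hess = -1.0 / b ** 2        # L''(b^n) = -1/b^2
        step = grad / hess
        b = b - step                # the Newton update
        if abs(step) < tol:
            break
    return b
```

Starting from {% b = 0.5 %}, the iterates converge quadratically toward 1.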
Log Likelihood Gradient
{% \frac{\partial \mathcal{L}}{\partial \beta_j} = \sum_{i=1}^n (\frac{\partial \mathcal{L}_i}{\partial \theta_i})
(\frac{\partial \theta_i}{\partial \mu_i})(\frac{\partial \mu_i}{\partial \eta_i})
(\frac{\partial \eta_i}{\partial \beta_j}) %}
{% = \sum_{i=1}^n (\frac{y_i - b'(\theta_i)}{a(\phi)})(\frac{1}{v(\mu_i)})(\frac{\partial \mu}{\partial \eta})_i x_{ij} %}
{% = \sum_{i=1}^n \frac{y_i - \mu_i}{a(\phi)v(\mu_i)} (\frac{\partial \mu}{\partial \eta})_i x_{ij} %}
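For a concrete instance of the last line, assume a Poisson model with log link (so {% a(\phi) = 1 %}, {% v(\mu_i) = \mu_i %}, and {% (\partial \mu / \partial \eta)_i = \mu_i %}); the gradient then collapses to {% X^T(y - \mu) %}. A minimal sketch, with illustrative names:

```python
import numpy as np

def poisson_loglik(beta, X, y):
    # Poisson log-likelihood with log link, up to the constant -sum(log(y!))
    eta = X @ beta
    mu = np.exp(eta)
    return np.sum(y * eta - mu)

def poisson_grad(beta, X, y):
    # General form: sum_i (y_i - mu_i) / (a(phi) v(mu_i)) * (dmu/deta)_i * x_ij.
    # For Poisson/log, a(phi) = 1 and v(mu) = dmu/deta = mu, so the
    # factors cancel and the gradient is X^T (y - mu).
    mu = np.exp(X @ beta)
    return X.T @ (y - mu)
```

The analytic gradient can be sanity-checked against finite differences of the log-likelihood.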
Log Likelihood Hessian
{% \frac{\partial^2 \mathcal{L}}{\partial \beta_j \partial \beta_k} =
\sum_{i=1}^n \frac{1}{a(\phi)} \frac{\partial}{\partial \beta_k} [\frac{y_i-\mu_i}{v(\mu_i)}(\frac{\partial \mu}{\partial \eta})_i x_{ij}]
%}
{% = \sum_{i=1}^n \frac{1}{a(\phi)}[(\frac{\partial \mu}{\partial \eta})_i \frac{\partial}{\partial \mu}(\frac{y_i - \mu_i}{v(\mu_i)}) (\frac{\partial \mu}{\partial \eta})_i x_{ik}
+ \frac{y_i - \mu_i}{v(\mu_i)} (\frac{\partial ^2 \mu}{\partial \eta ^2})_i x_{ik}
] x_{ij} %}
{% = - \sum_{i=1}^n \frac{1}{a(\phi)}[ \frac{1}{v(\mu_i)} (\frac{\partial \mu}{\partial \eta})_i^2 - (\mu_i - y_i)
\{ \frac{1}{v(\mu_i)^2} (\frac{\partial \mu}{\partial \eta})_i^2 \frac{\partial v(\mu_i)}{\partial \mu}
- \frac{1}{v(\mu_i)} (\frac{\partial ^2 \mu}{\partial \eta ^2})_i \}
] x_{ij} x_{ik} %}
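Putting the pieces together, here is a minimal sketch of the Newton iteration for the Poisson/log case (all names illustrative). With the canonical link the {% (\mu_i - y_i) %} terms in the Hessian cancel, leaving {% \mathcal{L}''(\beta) = -X^T \mathrm{diag}(\mu) X %}.

```python
import numpy as np

def newton_poisson(X, y, n_iter=25, tol=1e-10):
    # Newton's iterations for a Poisson GLM with the canonical log link.
    # With the canonical link the (mu_i - y_i) terms in the Hessian cancel,
    # so L'(beta) = X^T (y - mu) and L''(beta) = -X^T diag(mu) X.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                # L'(beta^n)
        hess = -(X * mu[:, None]).T @ X      # L''(beta^n)
        step = np.linalg.solve(hess, grad)   # [L'']^{-1} L'
        beta = beta - step                   # the Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

At convergence the gradient vanishes, which is the maximum-likelihood condition the update was derived from.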
Examples