Ridge Regression and Regularization
Overview
Ridge regression is a form of regularization applied to OLS Regression.
Ridge Regression
Ridge regression takes the usual loss function used in OLS regression and adds a term proportional to the squared norm of the weight vector.
{% L(\vec{w}) = \frac{1}{N} \sum_{i=1}^N (y_i - (w_0 + \vec{w}^T \vec{x}_i))^2 + \lambda ||\vec{w}||^2 %}
Here {% \lambda ||\vec{w}||^2 %} is the additional term. It penalizes the regression for having large weights, thereby pushing the weights toward zero. The term includes the hyper-parameter {% \lambda %}, which dictates how strongly the penalty affects the regression.
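As a minimal sketch of this loss in NumPy (the function name and signature are illustrative, not from any library):

```python
import numpy as np

def ridge_loss(w0, w, X, y, lam):
    """Ridge loss: mean squared error plus lam times the squared L2 norm of w.

    X is the N x D design matrix; the intercept w0 is kept out of the penalty.
    """
    residuals = y - (w0 + X @ w)
    return np.mean(residuals ** 2) + lam * np.dot(w, w)
```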
The optimal weights of ridge regression are given in closed form by
{% \vec{w}_{opt} = (\lambda I_D + X^TX)^{-1} X^T \vec{y} %}
where {% X %} is the {% N \times D %} design matrix and {% I_D %} is the {% D \times D %} identity matrix. Since {% \lambda I_D + X^TX %} is positive definite for {% \lambda > 0 %}, the inverse always exists, even when {% X^TX %} itself is singular.
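A minimal NumPy sketch of this closed-form solution (assuming the data have been centered so the intercept can be handled separately):

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Solve (lam * I_D + X^T X) w = X^T y for the ridge weights.

    Uses np.linalg.solve rather than forming the inverse explicitly,
    which is cheaper and numerically more stable.
    """
    D = X.shape[1]
    return np.linalg.solve(lam * np.eye(D) + X.T @ X, X.T @ y)
```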
Choosing Lambda
In general, there is no a priori reason to choose one value of {% \lambda %} over another. Typically, {% \lambda %} is treated as a hyper-parameter and tuned using a validation set, as sketched below. See Data partitioning.
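A sketch of that tuning loop, reusing the `ridge_weights` function from the previous section; the candidate grid is an arbitrary choice for illustration:

```python
import numpy as np

def select_lambda(X_train, y_train, X_val, y_val, candidates):
    """Return the candidate lambda with the lowest validation MSE.

    `candidates` is a grid of values to try, e.g. np.logspace(-4, 2, 20).
    """
    best_lam, best_mse = None, np.inf
    for lam in candidates:
        w = ridge_weights(X_train, y_train, lam)  # closed-form fit
        mse = np.mean((y_val - X_val @ w) ** 2)
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam
```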
Connection to Bayesianism
Ridge regression has a natural interpretation in Bayesianism: the ridge solution is the MAP (maximum a posteriori) estimate of a linear model whose weights are given independent zero-mean Gaussian priors,
{% \mathbb{P}(\vec{w}) = \prod_{i} \mathcal{N}(w_i \mid 0, \tau^2) %}
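Concretely, if one also assumes Gaussian observation noise with variance {% \sigma^2 %} (an assumption introduced here to complete the derivation), the negative log-posterior is
{% -\log \mathbb{P}(\vec{w} \mid \vec{y}) = \frac{1}{2\sigma^2} \sum_{i=1}^N (y_i - (w_0 + \vec{w}^T \vec{x}_i))^2 + \frac{1}{2\tau^2} ||\vec{w}||^2 + \text{const} %}
Minimizing this is equivalent to minimizing the ridge loss above with {% \lambda = \sigma^2 / (N \tau^2) %}, so a tighter prior (smaller {% \tau^2 %}) corresponds to stronger regularization.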