Ridge Regression and Regularization

Overview


Ridge regression is a form of regularization applied to OLS Regression.

Ridge Regression


Ridge regression takes the usual loss function from OLS regression and adds a penalty term proportional to the squared norm of the weight vector:
{% L(\vec{w}) = \frac{1}{N} \sum_{i=1}^N (y_i - (w_0 + \vec{w}^T \vec{x}_i))^2 + \lambda || \vec{w} ||^2 %}
Here {% \lambda || \vec{w} ||^2 %} is the penalty term. It penalizes the regression for having large weights, thereby pushing the weights toward zero. The term includes the hyper-parameter {% \lambda %}, which dictates how strongly the penalty affects the regression.
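
As a concrete reference, here is a minimal NumPy sketch of this loss. The function name `ridge_loss` and its arguments are illustrative, not taken from any particular library.

```python
import numpy as np

def ridge_loss(w0, w, X, y, lam):
    """Mean squared error plus an L2 penalty on the weights.

    X is an (N, D) matrix of inputs, y an (N,) vector of targets,
    w0 the intercept, w a (D,) weight vector, and lam the
    regularization strength (the lambda above).
    """
    residuals = y - (w0 + X @ w)      # y_i - (w_0 + w^T x_i)
    mse = np.mean(residuals ** 2)     # (1/N) * sum of squared errors
    penalty = lam * np.sum(w ** 2)    # lambda * ||w||^2
    return mse + penalty
```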

The optimal weights of ridge regression have a closed form:
{% \vec{w}_{opt} = (\lambda I_D + X^TX)^{-1} X^T \vec{y} %}
where {% X %} is the {% N \times D %} design matrix and {% I_D %} is the {% D \times D %} identity matrix.
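
A direct NumPy translation of this formula might look like the sketch below (assuming the intercept is handled separately, e.g. by centering the data). Solving the linear system is preferable to forming the inverse explicitly for numerical stability.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights: (lambda * I_D + X^T X)^{-1} X^T y."""
    D = X.shape[1]
    A = lam * np.eye(D) + X.T @ X
    # Solve A w = X^T y instead of computing an explicit inverse.
    return np.linalg.solve(A, X.T @ y)
```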

Choosing Lambda


In general, there is no a priori reason to choose one value of {% \lambda %} over another. Typically, {% \lambda %} is treated as a hyper-parameter and tuned using a validation set. See Data partitioning.
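
One common approach is a simple grid search over candidate values of {% \lambda %}, scored on a held-out validation set. A minimal sketch, with names chosen for illustration:

```python
import numpy as np

def choose_lambda(X_train, y_train, X_val, y_val, candidates):
    """Return the candidate lambda with the lowest validation MSE."""
    D = X_train.shape[1]
    best_lam, best_mse = None, np.inf
    for lam in candidates:
        # Closed-form ridge fit on the training split.
        w = np.linalg.solve(lam * np.eye(D) + X_train.T @ X_train,
                            X_train.T @ y_train)
        # Score on the held-out validation split.
        mse = np.mean((y_val - X_val @ w) ** 2)
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam

# Example: sweep lambda over several orders of magnitude.
# lam = choose_lambda(X_train, y_train, X_val, y_val, np.logspace(-4, 2, 20))
```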

Connection to Bayesianism


Ridge regression can also be derived from a Bayesian perspective: it is the MAP (maximum a posteriori) estimate of the weights when each weight is given an independent zero-mean Gaussian prior,
{% \mathbb{P}(\vec{w}) = \prod_{i} \mathcal{N}(w_i \mid 0, \tau^2) %}
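
To see the correspondence (a standard derivation, introducing {% \sigma^2 %} for the variance of the Gaussian observation noise): assume {% y_i \sim \mathcal{N}(w_0 + \vec{w}^T \vec{x}_i, \sigma^2) %}. The negative log posterior is then
{% -\log \mathbb{P}(\vec{w} \mid \mathcal{D}) = \frac{1}{2\sigma^2} \sum_{i=1}^N (y_i - (w_0 + \vec{w}^T \vec{x}_i))^2 + \frac{1}{2\tau^2} || \vec{w} ||^2 + \text{const} %}
Multiplying by {% 2\sigma^2 / N %} recovers the ridge loss above with {% \lambda = \sigma^2 / (N \tau^2) %}, so a tighter prior (smaller {% \tau^2 %}) corresponds to stronger regularization.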
