Ridge Regression
Ridge regression takes the usual loss function from OLS regression and adds a term proportional to the squared norm of the weight vector.
{% L(\vec{w}) = \frac{1}{N} \sum_{i=1}^N (y_i - (w_0 + \vec{w}^T \vec{x}_i))^2 + \lambda || \vec{w} || ^2 %}
Here {% \lambda || \vec{w} || ^2 %} is the additional term. It penalizes the regression for having
large weights, pushing the weights toward zero. The term includes the hyper-parameter {% \lambda %},
which controls how strongly the penalty affects the regression.
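As a minimal sketch of this loss in NumPy (the function name `ridge_loss` and the argument `lam` are placeholder choices, not from the text):

```python
import numpy as np

def ridge_loss(X, y, w0, w, lam):
    """Mean squared error plus the L2 penalty lambda * ||w||^2.

    X   : (N, D) feature matrix
    y   : (N,)   targets
    w0  : scalar bias (not penalized)
    w   : (D,)   weight vector
    lam : penalty strength lambda
    """
    residuals = y - (w0 + X @ w)       # y_i - (w_0 + w^T x_i)
    mse = np.mean(residuals ** 2)      # (1/N) * sum of squared residuals
    penalty = lam * np.dot(w, w)       # lambda * ||w||^2
    return mse + penalty
```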
The optimal weights for ridge regression are given by
{% \vec{w}_{opt} = (\lambda I_D + X^TX)^{-1} X^T \vec{y} %}
where {% X %} is the {% N \times D %} design matrix, {% I_D %} is the {% D \times D %} identity matrix, and the constant factor of {% N %} from the averaged loss has been absorbed into {% \lambda %}.
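The closed-form solution translates directly to NumPy. The sketch below (function name `ridge_fit` and the synthetic data are illustrative assumptions) solves the linear system rather than forming the inverse explicitly, which is numerically more stable:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights: (lambda * I_D + X^T X)^{-1} X^T y."""
    D = X.shape[1]
    A = lam * np.eye(D) + X.T @ X
    # Solve A w = X^T y instead of computing the matrix inverse.
    return np.linalg.solve(A, X.T @ y)

# Tiny usage example on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w_opt = ridge_fit(X, y, lam=0.1)
```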