Overview
Regularization is the process of adding error terms to the loss function that are designed to penalize a machine learning algorithm for large parameter values.
Methods
- Additional Error Terms
  - Additional error term regularization adds extra terms to the loss function. Typically these terms are constructed from a measure of the "size" of the algorithm's weights. In the descriptions below, we assume that the algorithm's weights have been vectorized into a single parameter vector {% \vec{\theta} %}; a small code sketch for these penalties follows this list.
- L1 Regularization
    {% loss' = loss + \lambda || \vec{\theta} ||_1 %}
- L2 Regularization
    {% loss' = loss + \lambda || \vec{\theta} ||_2^2 %}
- Dropout - the dropout method randomly removes nodes from a neural network for a given training iteration; the removed nodes are restored in the next iteration (see the sketch after this list). It is often considered one of the most effective forms of regularization for neural networks. The reason it works appears to be related to how it simulates training an ensemble of models that share weights.
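
A minimal sketch of the additional error term penalties above, assuming NumPy and a weight vector already flattened into {% \vec{\theta} %}. The function name `regularized_loss` and the parameters `lam` and `kind` are illustrative, not from any particular library.

```python
# Sketch: add an L1 or L2 penalty to a loss value, given the model's
# weights vectorized into a single parameter vector theta.
import numpy as np

def regularized_loss(loss, theta, lam=0.01, kind="l2"):
    """Return loss plus an L1 or L2 penalty on the parameter vector theta."""
    if kind == "l1":
        penalty = lam * np.sum(np.abs(theta))   # lambda * ||theta||_1
    elif kind == "l2":
        penalty = lam * np.sum(theta ** 2)      # lambda * ||theta||_2^2
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return loss + penalty

# Example usage with a made-up loss value and weight vector.
theta = np.array([0.5, -1.2, 3.0])
print(regularized_loss(loss=0.8, theta=theta, lam=0.1, kind="l1"))  # 0.8 + 0.1 * 4.7
print(regularized_loss(loss=0.8, theta=theta, lam=0.1, kind="l2"))  # 0.8 + 0.1 * 10.69
```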
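
A minimal sketch of dropout for a single layer's activations during one training iteration, assuming NumPy. This uses the common "inverted dropout" formulation; the function name `dropout` and the `keep_prob` parameter are illustrative.

```python
# Sketch: randomly zero out nodes during training; a fresh mask is drawn
# each iteration, so nodes removed in one iteration come back in the next.
import numpy as np

def dropout(activations, keep_prob=0.8, training=True, rng=None):
    """Zero out each node with probability 1 - keep_prob during training,
    scaling survivors so the expected activation is unchanged.
    At test time, return the activations unchanged."""
    if not training:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) < keep_prob  # which nodes to keep
    return activations * mask / keep_prob

# Example: apply dropout to one layer's activations for one iteration.
h = np.array([0.2, 1.5, -0.7, 0.9])
print(dropout(h, keep_prob=0.75))
```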