Deep Learning Regularization

Overview


Regularization is the process of adding penalty terms to the loss function that are designed to penalize a machine learning algorithm for large parameter values, which helps reduce overfitting.

Methods


  • Additional Error Terms - additional error term regularization adds extra penalty terms to the loss function. Typically these terms are constructed from a measure of the "size" of the algorithm's weights. In the descriptions below, we assume that the algorithm's weights have been vectorized into a single parameter vector {% \vec{\theta} %}. A code sketch follows this list.
    • L1 Regularization
      {% loss' = loss + \lambda || \vec{\theta} ||_1 %}
    • L2 Regularization
      {% loss' = loss + \lambda || \vec{\theta} ||_2^2 %}
  • Dropout - the dropout method randomly removes nodes from a neural network for a given training iteration (the removed nodes are restored for the next iteration). It is often considered one of the most effective forms of regularization for neural networks. The reason it works appears to be related to how it simulates training an ensemble of models; see the second sketch below.
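
The following is a minimal sketch of additional-error-term regularization, assuming a PyTorch-style model (the framework, the function name, and the default values here are illustrative assumptions, not part of the original text):

```python
import torch

def regularized_loss(loss, model, lam=1e-4, norm="l2"):
    """Add an L1 or L2 penalty on the vectorized weights to the base loss.

    `lam` plays the role of the lambda coefficient in the formulas above.
    """
    # Flatten every parameter tensor into one long vector theta.
    theta = torch.cat([p.view(-1) for p in model.parameters()])
    if norm == "l1":
        penalty = torch.sum(torch.abs(theta))   # ||theta||_1
    else:
        penalty = torch.sum(theta ** 2)         # ||theta||_2^2
    return loss + lam * penalty
```

In many frameworks, L2 regularization is also available indirectly, for example through an optimizer's weight_decay argument, rather than by modifying the loss directly.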
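
A similarly hedged sketch of dropout, again assuming PyTorch (the layer sizes and drop probability are arbitrary choices for illustration):

```python
import torch.nn as nn

# A small feed-forward network with a dropout layer between the hidden
# and output layers.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each hidden activation is zeroed with probability 0.5 per training step
    nn.Linear(256, 10),
)

model.train()  # dropout active: a different random subset of nodes is removed each iteration
model.eval()   # dropout disabled: all nodes are used, consistent with the ensemble-averaging view
```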