Neural Network Regularization Error Terms

Overview


Starting from an initial loss function
{% L_1(\vec{\theta}_1, \dots, \vec{\theta}_n) %}
which is a function of the parameter vector {% \vec{\theta}_i %} of each layer {% i %}, we can regularize the parameters in layer {% i %} by adding an additional error term, here labeled {% \mathrm{Reg}_i(\vec{\theta}_i) %}:
{% L_2(\vec{\theta}_1, \dots, \vec{\theta}_n) = L_1(\vec{\theta}_1, \dots, \vec{\theta}_n) + \mathrm{Reg}_i(\vec{\theta}_i) %}
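In code, this amounts to adding one term to the existing loss. A minimal NumPy sketch, where `loss_1`, `reg_i`, and the strength hyperparameter `lam` are hypothetical names chosen for illustration (the base loss is a simple quadratic stand-in, not any particular network's loss):

```python
import numpy as np

# Hypothetical base loss over the per-layer parameter vectors; a simple
# quadratic stand-in for L_1 in the text.
def loss_1(thetas):
    return sum(float(np.sum(t ** 2)) for t in thetas)

# Hypothetical regularizer on layer i's parameters (an L2 term is used
# here purely as a placeholder).
def reg_i(theta_i, lam=0.01):
    return 0.5 * lam * float(np.sum(theta_i ** 2))

# Regularized loss: L_2 = L_1 + Reg_i(theta_i), penalizing layer i only.
def loss_2(thetas, i, lam=0.01):
    return loss_1(thetas) + reg_i(thetas[i], lam)

thetas = [np.array([1.0, -2.0]), np.array([0.5])]
print(loss_2(thetas, i=0))  # base loss plus the layer-0 penalty
```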

Gradient


Since {% \mathrm{Reg}_i %} depends only on {% \vec{\theta}_i %}, the gradient with respect to layer {% i %}'s parameters simply picks up the regularizer's gradient, while the gradients with respect to the other layers are unchanged:
{% \frac{\partial L_2}{\partial \vec{\theta}_i} = \frac{\partial L_1}{\partial \vec{\theta}_i} + \frac{\partial \mathrm{Reg}_i}{\partial \vec{\theta}_i} %}
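Continuing the sketch above (same assumed quadratic base loss and L2 regularizer, both hypothetical), the regularized gradient is literally the sum of the two gradients, which a finite-difference check confirms:

```python
import numpy as np

lam = 0.01

# Closed-form gradients for the stand-in losses above.
def grad_loss_1(theta_i):
    return 2.0 * theta_i    # d(sum theta^2)/d theta

def grad_reg_i(theta_i):
    return lam * theta_i    # d((lam/2) * ||theta||^2)/d theta

# Gradient of the regularized loss is the sum of the two gradients.
def grad_loss_2(theta_i):
    return grad_loss_1(theta_i) + grad_reg_i(theta_i)

# Finite-difference check on the first coordinate.
theta = np.array([0.5, -1.0, 2.0])
f = lambda t: np.sum(t ** 2) + 0.5 * lam * np.sum(t ** 2)
e0 = np.zeros_like(theta); e0[0] = 1e-6
approx = (f(theta + e0) - f(theta - e0)) / 2e-6
print(approx, grad_loss_2(theta)[0])  # both approximately 1.005
```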

Implementation


  • L1 Regularization: {% \mathrm{Reg}_i(\vec{\theta}_i) = \lambda \lVert \vec{\theta}_i \rVert_1 %} (sketch below)
  • L2 Regularization: {% \mathrm{Reg}_i(\vec{\theta}_i) = \frac{\lambda}{2} \lVert \vec{\theta}_i \rVert_2^2 %} (sketch below)
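A minimal NumPy sketch of both penalties and their gradients with respect to {% \vec{\theta}_i %}; the strength hyperparameter `lam` and the function names are illustrative, not from the original:

```python
import numpy as np

def l1_reg(theta_i, lam=0.01):
    # L1 term: lam * sum_j |theta_j|; encourages sparse weights.
    return lam * float(np.sum(np.abs(theta_i)))

def l1_reg_grad(theta_i, lam=0.01):
    # Subgradient: lam * sign(theta); np.sign returns 0 at theta_j = 0.
    return lam * np.sign(theta_i)

def l2_reg(theta_i, lam=0.01):
    # L2 term: (lam/2) * sum_j theta_j^2; the 1/2 makes the gradient clean.
    return 0.5 * lam * float(np.sum(theta_i ** 2))

def l2_reg_grad(theta_i, lam=0.01):
    # Gradient: lam * theta, which is why this is also called weight decay.
    return lam * theta_i
```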