Overview
Starting from an initial loss function
{% L_1(\vec{\theta}_1, ... , \vec{\theta}_n) %}
which is a function of the parameter vector {% \vec{\theta}_i %} of each layer,
we can regularize the parameters of layer {% i %} by adding an extra error term,
here labeled {% Reg_i(\vec{\theta}_i) %}:
{% L_2(\vec{\theta}_1, ... , \vec{\theta}_n) =
L_1(\vec{\theta}_1, ... , \vec{\theta}_n) +
Reg_i(\vec{\theta}_i)
%}
Gradient
Since {% L_2 %} depends on all of the parameter vectors, its gradient with respect to {% \vec{\theta}_i %} is a partial derivative, and the regularization term contributes additively:
{% \frac{\partial L_2}{\partial \vec{\theta}_i} = \frac{\partial L_1}{\partial \vec{\theta}_i} + \frac{\partial Reg_i}{\partial \vec{\theta}_i} %}
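As a concrete sketch of this additivity, the snippet below picks a simple quadratic base loss and an L2 penalty {% Reg_i(\vec{\theta}_i) = \lambda \lVert \vec{\theta}_i \rVert^2 %} (a common but here illustrative choice; all function names are hypothetical), and checks that the analytic gradient of the regularized loss matches a finite-difference estimate:

```python
import numpy as np

def loss(theta):
    # Hypothetical base loss L_1: a simple quadratic in the parameters.
    return np.sum((theta - 1.0) ** 2)

def loss_grad(theta):
    # Gradient of L_1 with respect to theta: 2 * (theta - 1).
    return 2.0 * (theta - 1.0)

def reg(theta, lam=0.1):
    # Illustrative L2 penalty Reg_i(theta_i) = lam * ||theta_i||^2.
    return lam * np.sum(theta ** 2)

def reg_grad(theta, lam=0.1):
    # Gradient of the L2 penalty: 2 * lam * theta.
    return 2.0 * lam * theta

theta = np.array([0.5, -2.0, 3.0])

# Gradient of L_2 = L_1 + Reg_i is the sum of the two gradients.
analytic = loss_grad(theta) + reg_grad(theta)

# Central finite-difference estimate of dL_2/dtheta as a sanity check.
eps = 1e-6
numeric = np.array([
    ((loss(theta + eps * e) + reg(theta + eps * e))
     - (loss(theta - eps * e) + reg(theta - eps * e))) / (2 * eps)
    for e in np.eye(len(theta))
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

Because the penalty enters the loss as a plain sum, any off-the-shelf gradient-based optimizer handles it by simply adding the penalty's gradient to the backpropagated one.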