Overview
Starting from an initial loss function
{% L_1(\vec{\theta}_1, ... , \vec{\theta}_n) %}
which is a function of the parameter vector {% \vec{\theta}_i %} of each layer,
we can regularize the parameters of layer {% i %} by adding an extra error term,
here labeled {% Reg_i(\vec{\theta}_i) %}:
{% L_2(\vec{\theta}_1, ... , \vec{\theta}_n) =
L_1(\vec{\theta}_1, ... , \vec{\theta}_n) +
Reg_i(\vec{\theta}_i)
%}
Gradient
Since {% L_2 %} depends on all of the parameter vectors, its gradient with respect to {% \vec{\theta}_i %} is a partial derivative, and the regularization term contributes additively:
{% \frac{\partial L_2}{\partial \vec{\theta}_i} = \frac{\partial L_1}{\partial \vec{\theta}_i} + \frac{\partial Reg_i}{\partial \vec{\theta}_i} %}
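As a concrete sketch of this additivity, the snippet below picks a simple quadratic base loss and an L2 penalty {% Reg_i(\vec{\theta}_i) = \lambda \lVert \vec{\theta}_i \rVert^2 %} (a common but here illustrative choice; all function names are hypothetical), and checks that the analytic gradient of the regularized loss matches a finite-difference estimate:

```python
import numpy as np

def loss(theta):
    # Hypothetical base loss L_1: a simple quadratic in the parameters.
    return np.sum((theta - 1.0) ** 2)

def loss_grad(theta):
    # Gradient of L_1 with respect to theta: 2 * (theta - 1).
    return 2.0 * (theta - 1.0)

def reg(theta, lam=0.1):
    # Illustrative L2 penalty Reg_i(theta_i) = lam * ||theta_i||^2.
    return lam * np.sum(theta ** 2)

def reg_grad(theta, lam=0.1):
    # Gradient of the L2 penalty: 2 * lam * theta.
    return 2.0 * lam * theta

theta = np.array([0.5, -2.0, 3.0])

# Gradient of L_2 = L_1 + Reg_i is the sum of the two gradients.
analytic = loss_grad(theta) + reg_grad(theta)

# Central finite-difference estimate of dL_2/dtheta as a sanity check.
eps = 1e-6
numeric = np.array([
    ((loss(theta + eps * e) + reg(theta + eps * e))
     - (loss(theta - eps * e) + reg(theta - eps * e))) / (2 * eps)
    for e in np.eye(len(theta))
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

Because the penalty enters the loss as a plain sum, any off-the-shelf gradient-based optimizer handles it by simply adding the penalty's gradient to the backpropagated one.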