Simple Recurrent Neural Networks

Overview


Network Architecture


A simple recurrent neural network can be described as
{% \vec{h}_t = \sigma(U \vec{h}_{t-1} + W \vec{s}_t + \vec{b}) %}
where {% \vec{s}_t %} is the input vector at time t and {% \vec{h}_t %} is the output vector, which is also fed back into the network at the next step.

Here, U and W are two different weight matrices, applied to the previous hidden state and to the current input respectively, {% \vec{b} %} is a bias vector, and {% \sigma %} is an element-wise activation function; see Salem.
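
A minimal sketch of this forward recurrence is shown below (NumPy, with tanh standing in for {% \sigma %}; the dimensions and random weights are arbitrary illustrations, not values from the text):

```python
import numpy as np

# Illustrative sizes only.
input_size, hidden_size = 4, 3

rng = np.random.default_rng(0)
U = rng.normal(size=(hidden_size, hidden_size))  # recurrent weights (applied to h_{t-1})
W = rng.normal(size=(hidden_size, input_size))   # input weights (applied to s_t)
b = np.zeros(hidden_size)                        # bias vector

def step(h_prev, s_t):
    """One time step: h_t = sigma(U h_{t-1} + W s_t + b), with sigma = tanh."""
    return np.tanh(U @ h_prev + W @ s_t + b)

# Run the recurrence over a short input sequence.
h = np.zeros(hidden_size)
for s_t in rng.normal(size=(5, input_size)):
    h = step(h, s_t)
```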

Training


Training a simple RNN uses the Backpropagation Through Time (BPTT) algorithm, a variant of the standard backpropagation algorithm. It works by "unrolling" the network across the input sequence and then applying backpropagation to the unrolled graph. That is, a simple one-layer network behaves like an n-layer network when processing the {% n^{th} %} input.

Because of the way a recurrent neural network is constructed, each additional sequential input adds another layer of computation to the backpropagation. This increases computation time and also gives rise to the vanishing/exploding gradient problem discussed below.
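
A rough sketch of BPTT under these definitions is shown below (NumPy, assuming a tanh activation and, for simplicity, a loss gradient arriving only at the final hidden state; all names are illustrative):

```python
import numpy as np

def forward(U, W, b, inputs, h0):
    """Unroll the recurrence, keeping every hidden state for the backward pass."""
    hs = [h0]
    for s_t in inputs:
        hs.append(np.tanh(U @ hs[-1] + W @ s_t + b))
    return hs

def bptt(U, W, b, inputs, h0, dL_dh_last):
    """Backpropagation Through Time over the unrolled steps."""
    hs = forward(U, W, b, inputs, h0)
    dU, dW, db = np.zeros_like(U), np.zeros_like(W), np.zeros_like(b)
    dh = dL_dh_last                      # gradient flowing into the last hidden state
    for t in reversed(range(len(inputs))):
        dz = dh * (1 - hs[t + 1] ** 2)   # back through tanh
        dU += np.outer(dz, hs[t])        # contribution of step t to dL/dU
        dW += np.outer(dz, inputs[t])    # contribution of step t to dL/dW
        db += dz
        dh = U.T @ dz                    # gradient passed back to the previous step
    return dU, dW, db
```

Each pass through the reversed loop corresponds to one unrolled layer, which is why the cost of the backward pass grows with the length of the input sequence.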

Vanishing/Exploding Gradients


When the chain rule is applied to a deep or recurrent network, the desired gradient is expressed as a product of several factors:
{% \text{grad}_1 \times \text{grad}_2 \times \cdots \times \text{grad}_n %}
In a recurrent network, the gradient picks up one such factor for each prior time step (that is, the gradient for the {% n^{th} %} input will have at least n factors).

In such a case, as n grows large, if the factors are less than 1 the computed gradient will tend toward 0 (vanish), while if they are greater than 1 the computed gradient can explode.
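
A quick numerical illustration of this effect, with the per-step gradient factors chosen arbitrarily:

```python
import numpy as np

n = 50
small, large = 0.9, 1.1   # illustrative per-step gradient factors

print(np.prod(np.full(n, small)))   # ~5e-3: the gradient vanishes
print(np.prod(np.full(n, large)))   # ~1.2e2: the gradient explodes
```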
