Deep Learning - Activation Functions
Overview
Activation functions are functions applied to the outputs of one layer of a
neural network before those outputs are passed to the next layer.
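A minimal NumPy sketch of where an activation sits in the forward pass; the layer sizes and random weights below are illustrative assumptions, not values from these notes:

```python
import numpy as np

# Where an activation fits: it is applied elementwise to a layer's
# linear output before that output is passed to the next layer.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))   # one example with 4 features (illustrative)
W = rng.normal(size=(4, 3))   # weights of a 3-unit layer (illustrative)
b = np.zeros(3)

z = x @ W + b                 # linear pre-activation
a = np.tanh(z)                # activation applied elementwise
# `a` is what gets fed into the next layer
```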
Sigmoid Type Functions
- Sigmoid - the logistic function, squashes values into (0, 1)
- Tanh - hyperbolic tangent, squashes values into (-1, 1)
- Softmax - normalizes a vector of outputs into probabilities that sum to 1
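Illustrative NumPy definitions of these sigmoid-type functions (a sketch for reference, not tied to any particular library):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Hyperbolic tangent: squashes into (-1, 1), zero-centered
    return np.tanh(z)

def softmax(z):
    # Normalizes a score vector into probabilities that sum to 1;
    # subtracting the max is for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()
```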
Hockey Stick Functions
- ReLU - Rectified Linear Unit
- PReLU - Parametric Rectified Linear Unit
- ELU - Exponential Linear Unit
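A sketch of these three functions in NumPy; the alpha values shown are illustrative defaults, not prescribed values:

```python
import numpy as np

def relu(z):
    # max(0, z): zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, z)

def prelu(z, alpha=0.25):
    # Like ReLU but with a (learnable) slope alpha on the negative side;
    # alpha=0.25 here is just an illustrative value
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # Smoothly saturates to -alpha for very negative inputs
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))
```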
Choice of Activation Function
- Single Layer
For networks with a single layer, it is common to use either the sigmoid function or the softmax function
for classification tasks, and the identity function for regression tasks.
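A minimal sketch of the single-layer case (a linear map followed by softmax); the shapes and random weights are illustrative assumptions:

```python
import numpy as np

# Single-layer classifier: linear map followed by softmax
# (or sigmoid for a single binary output).
rng = np.random.default_rng(0)
x = rng.normal(size=4)        # input features
W = rng.normal(size=(4, 3))   # 3 classes
b = np.zeros(3)

z = x @ W + b
e = np.exp(z - z.max())
probs = e / e.sum()           # softmax output: class probabilities
y_reg = z                     # for regression, the identity is used instead
```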
- Two Layer
It is common to use a tanh function on the inner layer (sometimes the sigmoid), and then a sigmoid or softmax on the output layer for classification
problems. For regression problems, the output (second) layer usually uses the identity function as the activation function.
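A sketch of this two-layer setup, assuming illustrative layer sizes and random weights:

```python
import numpy as np

# Two-layer sketch: tanh on the inner layer, softmax on the output layer
# for classification.
rng = np.random.default_rng(0)
x = rng.normal(size=4)

W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)    # inner layer
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)    # output layer

h = np.tanh(x @ W1 + b1)           # inner layer with tanh
z = h @ W2 + b2
e = np.exp(z - z.max())
probs = e / e.sum()                # softmax for classification
# For regression, the output layer would use the identity: y_hat = z
```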
- Deep Networks
For deep networks with more than two layers, ReLU-type functions are the most common choice for the inner layers.
When many layers with saturating activations are stacked, optimization can run into the problem of
vanishing or exploding gradients; ReLU-type activations largely avoid the vanishing-gradient problem because their
derivative does not shrink toward zero for positive inputs. ReLU networks also often converge more quickly (see Trask).
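A sketch of a deep forward pass with ReLU on the inner layers and softmax on the output; the depth, widths, and He-style weight scaling are illustrative assumptions:

```python
import numpy as np

# Deep-network sketch: ReLU on every inner layer, softmax on the output.
rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

widths = [4, 16, 16, 16, 3]                       # input, 3 inner layers, output
layers = [(rng.normal(size=(m, n)) * np.sqrt(2.0 / m), np.zeros(n))
          for m, n in zip(widths[:-1], widths[1:])]  # He-style scaling (illustrative)

a = rng.normal(size=4)                            # an input example
for W, b in layers[:-1]:
    a = relu(a @ W + b)                           # ReLU on inner layers
W_out, b_out = layers[-1]
z = a @ W_out + b_out
e = np.exp(z - z.max())
probs = e / e.sum()                               # softmax output
```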