Weight Initialization

Overview


When a neural network is created, the parameters of each layer must be assigned an initial set of values. At first glance it may seem that the initial values do not affect the outcome, since the optimization routine will move the parameters away from them toward an optimum. Following this reasoning, one could set every weight to 0. However, this makes all units within a layer identical: they compute the same output and receive the same gradient, so gradient descent updates them in lockstep and can never differentiate them. This symmetry must be broken for optimization to succeed.
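The symmetry problem can be illustrated with a minimal sketch in NumPy (the network shape and loss are arbitrary, chosen only for illustration). With all weights at zero, the hidden activations are identical, the gradient flowing back through the output layer is zero, and every hidden unit receives the same (here, zero) update:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # a batch of 4 inputs with 3 features
W1 = np.zeros((3, 5))              # zero-initialized hidden layer
W2 = np.zeros((5, 2))              # zero-initialized output layer

h = np.tanh(x @ W1)                # every hidden unit computes tanh(0) = 0
y = h @ W2

# Backpropagate the gradient of a simple loss (the sum of the outputs):
grad_y = np.ones_like(y)
grad_W2 = h.T @ grad_y             # zero, since h is zero
grad_h = grad_y @ W2.T             # zero, since W2 is zero
grad_W1 = x.T @ (grad_h * (1 - h**2))

# Every column of grad_W1 is identical (all zero), so the hidden
# units remain indistinguishable no matter how long we train.
print(np.allclose(grad_W1, grad_W1[:, :1]))  # → True
```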

It is generally considered best to assign the weights initial random values. If the analyst has some idea of where the optimal weights lie in weight space, she may initialize the weights randomly within that region. For most problems of sufficient complexity, however, this is a difficult task, and a random sample from a simple region, such as the unit interval {% [0,1] %}, suffices.
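A minimal sketch of such a sampler in NumPy (the function name and default bounds are ours, chosen to match the unit interval mentioned above):

```python
import numpy as np

rng = np.random.default_rng(42)

def init_uniform(n_in, n_out, low=0.0, high=1.0):
    """Draw each weight independently and uniformly from [low, high]."""
    return rng.uniform(low, high, size=(n_in, n_out))

W = init_uniform(3, 5)
print(W.shape)  # → (3, 5)
```

If the analyst does have a narrower region in mind, the same sketch applies with `low` and `high` set to that region's bounds.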

Alternative methods, described below, have also been suggested.

Methods


The following are common methods of weight initialization.

  • Xavier initialization
  • He initialization
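Both methods scale the initial weights by the layer's size so that activation variance stays roughly constant across layers. Below is a sketch of the Gaussian variants of each scheme in NumPy (the function names and layer sizes are ours): Xavier/Glorot draws weights with variance 2/(n_in + n_out), suited to tanh or sigmoid activations, while He draws with variance 2/n_in, suited to ReLU activations.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(n_in, n_out):
    """Xavier/Glorot (Gaussian variant): variance 2 / (n_in + n_out)."""
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

def he_init(n_in, n_out):
    """He (Gaussian variant): variance 2 / n_in."""
    std = np.sqrt(2.0 / n_in)
    return rng.normal(0.0, std, size=(n_in, n_out))

W1 = xavier_init(256, 128)
W2 = he_init(256, 128)
print(W1.std(), W2.std())  # roughly 0.072 and 0.088 respectively
```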