Hyperparameters

Overview


Many machine learning models have two sets of parameters that need to be specified. One set is tuned by the training algorithm. The other set, however, must be specified before the model can even be trained; that is, these parameters are not tuned by training.

For neural networks, for example, you will have to select:

  • Number of Layers
  • Number of Neurons per Layer
  • Learning Rate (when applicable)
  • Number of Iterations or Epochs

These parameters are called hyperparameters (in contrast to the neurons' weights, which are referred to simply as parameters).
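As a small illustration, the sketch below uses scikit-learn's MLPClassifier (chosen here purely as an example; the particular values are assumptions, not recommendations) to show these choices being fixed before training begins, while the weights are learned only when fit() is called.

    # Hypothetical example: the hyperparameters are arguments chosen up front,
    # while the weights are tuned by the training algorithm inside fit().
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=500, random_state=0)  # toy data

    model = MLPClassifier(
        hidden_layer_sizes=(64, 32),  # number of layers and neurons per layer
        learning_rate_init=0.01,      # learning rate
        max_iter=200,                 # number of epochs
        random_state=0,
    )
    model.fit(X, y)  # the weights (the "parameters") are tuned here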

Tuning


There is no exact way to know the best values of these hyperparameters ahead of time. There are some rules of thumb, but ultimately one may need to optimize over the hyperparameters themselves, training and evaluating a model for each candidate set.

The most common way to do this is to create a validation data set in addition to the training set and the test set. The validation set is used to tune the hyperparameters, as follows.

  • Choose a set of hyperparameters
  • Train the model with the chosen hyperparameters
  • Measure the total error on the validation set
  • Repeat this process with a new set of hyperparameters

The final set of hyperparameters is the one that produced the smallest error on the validation set.
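
The sketch below is a minimal version of this loop, again assuming scikit-learn and a small, purely illustrative grid of candidate values. It maximizes validation accuracy, which is equivalent to minimizing validation error.

    # Hypothetical example: try each candidate set of hyperparameters, train a
    # model with it, score it on the validation set, and keep the best one.
    from itertools import product

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=1000, random_state=0)

    # Split off a test set, then split the rest into training and validation sets.
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

    best_score, best_params = -1.0, None
    for hidden, lr in product([(32,), (64, 32)], [0.001, 0.01]):
        # 1. choose a set of hyperparameters
        model = MLPClassifier(hidden_layer_sizes=hidden, learning_rate_init=lr,
                              max_iter=300, random_state=0)
        model.fit(X_train, y_train)        # 2. train with those hyperparameters
        score = model.score(X_val, y_val)  # 3. measure accuracy on the validation set
        if score > best_score:             # 4. keep the best set seen so far
            best_score, best_params = score, (hidden, lr)

    print("Best hyperparameters:", best_params, "validation accuracy:", best_score)

The grid above is just one simple way of generating candidate sets; the procedure itself only requires that some new set be proposed at each iteration.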