Model Evaluation

Overview


Model evaluation refers to the process of estimating the effectiveness of a trained machine learning algorithm. In classical statistics, evaluation is often accomplished by assuming the data are generated by some assumed distribution and then performing a hypothesis test to assess the goodness of fit of the fitted parameters.
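
As a point of contrast, here is a minimal sketch of that classical workflow, assuming a normal distribution and using NumPy and SciPy (the library choice and the synthetic data are illustrative, not from the original text). Note that the p-value is only approximate when the parameters are estimated from the same data being tested.

    import numpy as np
    from scipy import stats

    # Assume the data are generated by a normal distribution,
    # fit its parameters, and test the goodness of fit.
    rng = np.random.default_rng(0)
    data = rng.normal(loc=2.0, scale=1.5, size=500)   # stand-in for observed data

    mu, sigma = stats.norm.fit(data)                  # fitted parameters
    statistic, p_value = stats.kstest(data, 'norm', args=(mu, sigma))
    print(f"KS statistic={statistic:.3f}, p-value={p_value:.3f}")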

In machine learning, this is often not done for a couple of reasons:

  • A model is constructed that does not have an analytical distribution
  • The modeler does not wish to make any assumptions about the underlying distribution

Unfortunately for the analyst, the error of the model on the training dataset is not a good indicator of what the out-of-sample performance will be. This is especially true when the complexity of the model is large relative to the number of training points. For training sets with a large number of records relative to the complexity of the model, there is theory supporting the claim that the training error is close to the out-of-sample error. However, with large training sets it is also easy to set aside a large test set (see below) while still leaving plenty of data for training.
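
A minimal sketch of that gap, assuming scikit-learn, synthetic data, and an unconstrained decision tree as an example of a model that is complex relative to the size of the training set (all illustrative choices):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import mean_squared_error

    # Synthetic data: a noisy sine curve (illustrative assumption).
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

    # Hold out a test set to estimate out-of-sample error.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # An unconstrained tree can fit the training noise almost exactly,
    # so its training error badly understates the test error.
    model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"training MSE={train_mse:.3f}, test MSE={test_mse:.3f}")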

Methods


  • Test Set - hold out a test set and evaluate the trained model on it to get an estimate of the error (as sketched above)
  • Cross Validation - similar in spirit to a test set, but used when the training set is not very large; requires more compute time (see the sketch after this list)
  • Uncertainty
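
Below is a minimal sketch of k-fold cross validation, assuming scikit-learn, a small synthetic dataset, ridge regression, and 5 folds (all illustrative choices). Each fold is held out once for error estimation while the model is refit on the remaining folds, which is why the method costs roughly k model fits instead of one.

    import numpy as np
    from sklearn.model_selection import cross_val_score, KFold
    from sklearn.linear_model import Ridge

    # Small synthetic dataset (illustrative assumption).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=80)

    # 5-fold cross validation: each fold serves once as a held-out
    # test set while the other folds are used for training.
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv,
                             scoring='neg_mean_squared_error')
    print(f"CV estimate of MSE: {-scores.mean():.3f} (+/- {scores.std():.3f})")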

Performance Measures


Performance measures are functions that take a set of computed errors and return a single number representing the total error in some sense. While they can essentially be thought of as loss functions, they are often constructed separately from the loss function used to train the machine learning algorithm.

  • Categorical Measures - measures of error used for algorithms that categorize data (see the examples after this list)
  • Regression Measures - measures used to compute the error of an algorithm used on regression problems
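
A minimal sketch of a few common measures from each family, assuming scikit-learn and made-up predictions (the particular measures shown are illustrative choices, not an exhaustive list):

    from sklearn.metrics import (accuracy_score, f1_score,
                                 mean_squared_error, mean_absolute_error)

    # Categorical measures: compare predicted class labels to true labels.
    y_true_cls = [0, 1, 1, 0, 1, 1]
    y_pred_cls = [0, 1, 0, 0, 1, 1]
    print("accuracy:", accuracy_score(y_true_cls, y_pred_cls))
    print("F1 score:", f1_score(y_true_cls, y_pred_cls))

    # Regression measures: summarize the numeric prediction errors.
    y_true_reg = [2.5, 0.0, 2.1, 7.8]
    y_pred_reg = [3.0, -0.1, 2.0, 7.0]
    print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))
    print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))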