Loss Functions

Overview


Loss functions quantify how well a candidate function fits the data; fitting a model in machine learning typically amounts to minimizing a loss.

A loss function takes a set of data points, {% (\vec{x}_1, y_1), (\vec{x}_2, y_2), \ldots, (\vec{x}_n, y_n) %}, and a function {% f(\vec{x}) %}, and computes an error by applying the function to each {% \vec{x}_i %} and comparing the result to {% y_i %}.
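Viewed as code, a loss function is a higher-order function of the dataset and the candidate {% f %}. A minimal sketch (the names here are illustrative, not from any particular library):

```python
def total_loss(f, points, pointwise_error):
    """Apply f to each x_i and compare the result to y_i.

    points:          iterable of (x_i, y_i) pairs
    pointwise_error: function of (y_i, f(x_i)) returning a number
    """
    return sum(pointwise_error(y, f(x)) for x, y in points)
```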

Squared Error


The squared error is the traditional loss function, primarily because the resulting minimization problem can be solved analytically in many situations (see OLS Regression), and because of its connection to the conditional expectation.
{% \sum_i(y_i - f(\vec{x}_i))^2 %}
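A minimal sketch of this computation in NumPy (the function and data here are made up for illustration):

```python
import numpy as np

def squared_error(f, xs, ys):
    """Sum of squared residuals: sum_i (y_i - f(x_i))^2."""
    preds = np.array([f(x) for x in xs])
    return np.sum((np.asarray(ys) - preds) ** 2)

# Score a simple linear model on made-up data.
f = lambda x: 2.0 * x + 1.0
print(squared_error(f, [0.0, 1.0, 2.0], [1.0, 3.5, 4.5]))  # 0.25 + 0.25 = 0.5
```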

Absolute Error


The squared error has nice properties, but can be overly sensitive to outliers. In order to remedy this, the absolute error is sometimes used instead.
{% \sum_i|y_i - f(\vec{x}_i)| %}
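A small numerical comparison makes the outlier sensitivity concrete (the data is made up; the single outlier at {% y = 100 %} dominates the squared error but contributes only linearly to the absolute error):

```python
import numpy as np

# Residuals for three well-fit points and one badly missed outlier.
ys    = np.array([1.0, 2.0, 3.0, 100.0])
preds = np.array([1.1, 2.1, 2.9, 3.0])

residuals = ys - preds
print(np.sum(residuals ** 2))     # 9409.03: dominated by the outlier
print(np.sum(np.abs(residuals)))  # 97.3: outlier enters linearly
```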

Cross Entropy Loss


The cross entropy loss function is used in classification problems, usually with a neural network whose output is a vector {% \hat{y} %} with one component {% \hat{y}_j %} per class. The true output is also a vector, {% y %}, which has a {% 1 %} at a single index and zeros everywhere else (a one-hot encoding).

The loss for a single data point is
{% loss_i = -\sum_j y_j \; \log \left( \frac{e^{\hat{y}_j}}{\sum_k e^{\hat{y}_k}} \right) %}
with the total loss being the sum of the individual losses. The fraction inside the logarithm is the softmax of the network's outputs, which converts them into a probability distribution over classes; the minus sign makes the loss nonnegative and small when the predicted probability of the true class is near {% 1 %}.
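A minimal sketch of this loss in NumPy, using the usual max-subtraction trick for numerical stability (which leaves the softmax unchanged; the names and example values are illustrative):

```python
import numpy as np

def cross_entropy_loss(logits, targets):
    """loss = -sum_i sum_j y_ij * log(softmax(logits_i)_j).

    logits:  (n, k) array of raw network outputs (the \hat{y} vectors)
    targets: (n, k) array of one-hot true outputs (the y vectors)
    """
    # Subtracting the row-wise max does not change the softmax value,
    # but prevents overflow in the exponentials.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.sum(targets * log_softmax)

logits  = np.array([[2.0, 0.5, -1.0]])  # one data point, three classes
targets = np.array([[1.0, 0.0, 0.0]])   # true class is index 0
print(cross_entropy_loss(logits, targets))  # ~0.24
```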