Overview
A loss function measures how poorly a model's predictions match the observed data; fitting a model in machine learning typically means minimizing a loss.
The function takes a set of data points, {% (\vec{x}_1, y_1), (\vec{x}_2, y_2), \ldots, (\vec{x}_n, y_n) %}, and a function {% f(\vec{x}) %}, and computes an error that quantifies how far each prediction {% f(\vec{x}_i) %} falls from the corresponding target {% y_i %}.
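As a minimal sketch of this idea (the function names here are illustrative, assuming NumPy is available), the total loss folds a pointwise error over the dataset:

```python
import numpy as np

def total_loss(X, y, f, pointwise_error):
    """Sum a pointwise error over the data: sum_i error(y_i, f(x_i))."""
    predictions = np.array([f(x) for x in X])
    return np.sum(pointwise_error(np.asarray(y), predictions))
```

The specific loss functions below are all instances of this pattern, differing only in the choice of pointwise error.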
Squared Error
The squared error function is the traditional choice, primarily because it can be minimized analytically in many situations (see OLS Regression), and because of its connection to the conditional expectation.
{% \sum_i(y_i - f(\vec{x}_i))^2 %}
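A direct translation into NumPy (a sketch, assuming `y` and `y_hat` are arrays holding the targets {% y_i %} and the predictions {% f(\vec{x}_i) %}):

```python
import numpy as np

def squared_error(y, y_hat):
    """Sum of squared residuals: sum_i (y_i - f(x_i))^2."""
    return np.sum((y - y_hat) ** 2)
```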
Absolute Error
The squared error has nice properties, but it can be overly sensitive to outliers: because each residual enters the sum squared, a single point far from the rest can dominate the total. In order to remedy this, the absolute error is sometimes used instead.
{% \sum_i|y_i - f(\vec{x}_i)| %}
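The same in NumPy, together with a small example of the outlier behavior (the numbers are made up for illustration):

```python
import numpy as np

def absolute_error(y, y_hat):
    """Sum of absolute residuals: sum_i |y_i - f(x_i)|."""
    return np.sum(np.abs(y - y_hat))

# A single outlier dominates the squared error but not the absolute error.
y     = np.array([1.0, 2.0, 3.0, 100.0])   # last target is an outlier
y_hat = np.array([1.1, 2.1, 3.1, 3.0])
print(np.sum((y - y_hat) ** 2))  # 9409.03 -- almost entirely from the outlier
print(absolute_error(y, y_hat))  # 97.3
```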
Cross Entropy Loss
The cross entropy loss function is used in classification problems, usually with a neural network that outputs a vector of scores, {% \hat{y} %}, with one component {% \hat{y}_j %} per class. The true output is also a vector, {% y %}, which has a {% 1 %} at the index of the correct class and zeros everywhere else (a one-hot encoding).
The loss for a single data point applies the softmax function to the scores and takes the negative log of the probability assigned to the true class:
{% loss_i = -\sum_j y_j \; \log \left(\frac{e^{\hat{y}_j}}{\sum_k e^{\hat{y}_k}}\right) %}
with the total loss being the sum of the individual losses.
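A sketch in NumPy (assuming `logits` holds the raw network outputs {% \hat{y} %} and `Y` the one-hot targets, both with one row per data point); shifting by the row-wise maximum is the standard trick to keep the softmax numerically stable:

```python
import numpy as np

def cross_entropy_loss(Y, logits):
    """Softmax cross entropy: -sum_i sum_j y_ij * log(softmax(logits_i)_j)."""
    shifted = logits - logits.max(axis=1, keepdims=True)      # stability shift
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.sum(Y * log_softmax)

Y      = np.array([[1, 0, 0],
                   [0, 1, 0]])                # one-hot true labels
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3]])          # raw network scores
print(cross_entropy_loss(Y, logits))          # total loss over both points
```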