Multinomial Regression

Overview


Logistic regression is a statistical tool typically used to model the probability of a binary outcome. When the outcome is not binary but still discrete and finite (that is, the possible outcomes form a finite set of labels), logistic regression can be extended to accommodate multiple outcomes. The resulting model is often referred to as multinomial regression or softmax regression.

One Hot Encoding


One hot encoding represents a category label as a column vector that contains a zero in every position except for a one at the index of the correct category. A multi-category classifier is designed to return a vector of the same shape.

The following column vector is the one hot encoding of the third category in a problem with 4 category labels.
{% \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ \end{bmatrix} %}
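
As a minimal sketch of the encoding, the snippet below builds the same vector in plain Python. The function name `one_hot` and the four label names are hypothetical, chosen only for illustration, and a list stands in for the column vector.

```python
def one_hot(index, num_classes):
    """Return a list with a 1 at position `index` and 0 everywhere else."""
    if not 0 <= index < num_classes:
        raise ValueError("index must be in the range [0, num_classes)")
    return [1 if i == index else 0 for i in range(num_classes)]


# Hypothetical category labels; "bird" is the third label (index 2).
labels = ["cat", "dog", "bird", "fish"]
encoded = one_hot(labels.index("bird"), len(labels))
print(encoded)  # [0, 0, 1, 0]
```
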
A separate logistic regression is fit for each category, with the outcome equal to 1 for data points in that category and 0 for all others. For a sample point, each regression returns a number between 0 and 1 for its category label. These numbers are arranged as a column vector as above and then normalized, by dividing each one by their total, so that they sum to one.

In general, a given data point is forecast to belong to the category with the highest value in the output vector.
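
The normalization and selection steps described above can be sketched as follows. The per-category scores are made-up numbers standing in for the outputs of four already-fitted logistic regressions, and the helper names `normalize` and `predict_category` are assumptions for this example.

```python
def normalize(scores):
    """Divide each score by the total so that the results sum to one."""
    total = sum(scores)
    return [s / total for s in scores]


def predict_category(scores):
    """Return the index of the category with the highest normalized value."""
    probs = normalize(scores)
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical outputs of four per-category logistic regressions,
# each between 0 and 1 but not summing to one.
raw_scores = [0.5, 0.25, 0.75, 0.5]

probs = normalize(raw_scores)
predicted = predict_category(raw_scores)
print(probs)      # [0.25, 0.125, 0.375, 0.25]
print(predicted)  # 2
```

Dividing by a positive total does not change which entry is largest, so the normalization affects only how the values are read as probabilities, not which category is chosen.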

Loss Function and Fitting


The activation function used in multinomial regression is the softmax function:
{% S(\vec{z}) = \frac{1}{\sum_{j=1}^{n} e^{z_j}} \begin{bmatrix} e^{z_1} \\ \vdots \\ e^{z_n} \\ \end{bmatrix} %}
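That is, the predicted output for a given input vector {% \vec{x} %} is
{% \hat{y} = S(\vec{b} + W \vec{x}) %}
where {% W %} is the weight matrix, with one row per category, and {% \vec{b} %} is the bias vector, with one entry per category.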
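
The prediction formula can be illustrated with a short sketch in plain Python. The weights `W`, bias `b`, and input `x` below are arbitrary illustrative values rather than fitted parameters, and the function names are assumptions. Subtracting the largest entry before exponentiating is a common numerical safeguard that leaves the result unchanged.

```python
import math


def softmax(z):
    """Apply S(z): exponentiate each entry and divide by the sum."""
    m = max(z)  # shifting by the max avoids overflow and cancels out in the ratio
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, b, x):
    """Compute y_hat = S(b + W x) for one input vector x."""
    z = [
        b_i + sum(w_ij * x_j for w_ij, x_j in zip(row, x))
        for row, b_i in zip(W, b)
    ]
    return softmax(z)


# Hypothetical parameters: 3 categories (rows of W), 2 input features.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [2.0, 1.0]

y_hat = predict(W, b, x)  # here b + W x = [2, 1, -3]
predicted = max(range(len(y_hat)), key=lambda i: y_hat[i])
print(y_hat, predicted)
```

The entries of `y_hat` are positive and sum to one, and the predicted category is the index of the largest entry, which for these values is the first category.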