Fitting Logistic Regression

Overview


Fitting using Maximum Likelihood


The standard method of fitting a logistic regression is maximum likelihood, i.e. finding the weights that maximize the log-likelihood
{% \text{LogLikelihood}(\vec{w}) = \sum_{i=1}^{n} \left[ y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i) \right] %}
where \mu_i = \sigma(\vec{w} \cdot \vec{x}_i) is the predicted probability that y_i = 1.
(see Murphy chap 8)
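As a minimal sketch, the log-likelihood above can be computed directly with NumPy; the function names here are illustrative, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    # logistic function sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    # mu_i = sigma(w . x_i); sum of Bernoulli log-probabilities
    mu = sigmoid(X @ w)
    return np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
```

With w = 0 every mu_i is 0.5, so each example contributes log(0.5) to the sum.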


There is no analytic solution for the maximum of this function. The standard method for finding the maximum is Newton's method, although gradient descent is sometimes used as well.
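A minimal sketch of Newton's method for this problem (often called Iteratively Reweighted Least Squares), assuming the standard gradient X^T(y - mu) and Hessian -X^T S X with S = diag(mu_i (1 - mu_i)); the function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_newton(X, y, n_iter=25, tol=1e-8):
    # Newton's method on the log-likelihood:
    #   gradient  g = X^T (y - mu)
    #   Hessian   H = -X^T S X, S = diag(mu * (1 - mu))
    # Each update solves H step = g and moves w by step.
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(X @ w)
        grad = X.T @ (y - mu)
        S = mu * (1 - mu)
        H = X.T @ (X * S[:, None])   # negated Hessian, positive definite
        step = np.linalg.solve(H, grad)
        w = w + step
        if np.linalg.norm(step) < tol:
            break
    return w
```

Note that on perfectly separable data the weights diverge, so in practice a regularization term is usually added.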

Machine Learning Optimization


The maximum likelihood method fits the logistic regression by finding the maximum of the log-likelihood function. In machine learning, by contrast, the learning algorithm is usually framed as minimizing a loss function.

Logistic regression can be recast as minimizing the loss function given by the negative of the log-likelihood. This loss function is referred to as the cross-entropy loss. (see entropy)
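Concretely, the cross-entropy loss is just the negative log-likelihood, here averaged over the n examples (the averaging convention is an assumption; summing works equally well):

```python
import numpy as np

def cross_entropy_loss(mu, y):
    # negative Bernoulli log-likelihood, averaged over examples
    return -np.mean(y * np.log(mu) + (1 - y) * np.log(1 - mu))
```

Maximizing the log-likelihood and minimizing this loss yield the same weights, since one is a fixed negative multiple of the other.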

When recast this way, logistic regression can be seen to be a simple neural network with a single layer. Most neural network training algorithms use gradient descent as the optimization routine (as opposed to Newton's method above).
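A minimal sketch of that gradient-descent view, assuming the mean cross-entropy loss whose gradient is X^T(mu - y) / n; the learning rate and iteration count are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_gradient_descent(X, y, lr=0.1, n_iter=5000):
    # Minimize the mean cross-entropy loss by gradient descent.
    # Gradient of the loss w.r.t. w is X^T (mu - y) / n.
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_iter):
        mu = sigmoid(X @ w)
        w = w - lr * (X.T @ (mu - y)) / n
    return w
```

Unlike Newton's method, each step uses only first-order information, so many more iterations are typically needed, but each one is cheap, which is why this is the default for neural networks.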