Fitting Transition Matrices
Overview
Once a Markov chain model has been specified, it is often necessary to fit the transition matrix to data. That is,
there is a set of observed sequences, each treated as a sample drawn from the distribution defined by the
transition matrix.
The typical method used to determine the matrix elements is the
method of maximum likelihood.
Initial State
The maximum likelihood probability of the first state being state i is given by
{% \pi_i = \frac{N_i}{\sum_k N_k} %}
where {% N_i %} is the number of samples whose initial point is in state i, and the sum in the denominator runs over all states k.
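
The initial-state estimate can be illustrated with a minimal Python sketch. It assumes each sample is a list of state labels; the function name `initial_state_mle`, the `states` argument, and the example data are hypothetical and chosen only for illustration.

```python
from collections import Counter

def initial_state_mle(sequences, states):
    """Estimate pi_i = N_i / sum_k N_k from the first state of each sequence."""
    # N_i: number of sequences whose first element is state i.
    counts = Counter(seq[0] for seq in sequences if seq)
    total = sum(counts[s] for s in states)
    if total == 0:
        raise ValueError("no non-empty sequences to count")
    return {s: counts[s] / total for s in states}

# Hypothetical data: four sequences over the states "a" and "b".
sequences = [["a", "b", "a"], ["a", "a"], ["b", "a", "b"], ["a"]]
pi = initial_state_mle(sequences, states=["a", "b"])
print(pi)  # {'a': 0.75, 'b': 0.25}
```

Three of the four sequences start in "a", so the estimate for "a" is 3/4.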
Transition Probabilities
The maximum likelihood probability of the transition from state i to state j is given by
{% \pi_{i,j} = \frac{N_{i,j}}{\sum_k N_{i,k}} %}
where {% N_{i,j} %} is the number of observed transitions from state i to state j, and the sum in the denominator runs over all destination states k.
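
As a rough sketch of the transition estimate, the following Python counts consecutive pairs in each sequence and normalizes each row. The function name `transition_mle` and the example data are hypothetical. Returning `None` for a state that is never left is an assumption of this sketch, since the formula is undefined when the row total is zero.

```python
def transition_mle(sequences, states):
    """Estimate pi_{i,j} = N_{i,j} / sum_k N_{i,k} from consecutive state pairs."""
    # N_{i,j}: number of observed transitions from state i to state j.
    counts = {i: {j: 0 for j in states} for i in states}
    for seq in sequences:
        for i, j in zip(seq, seq[1:]):
            counts[i][j] += 1

    matrix = {}
    for i in states:
        row_total = sum(counts[i].values())
        if row_total == 0:
            # Assumption: no transitions out of i were observed, so the row is undefined.
            matrix[i] = None
        else:
            matrix[i] = {j: counts[i][j] / row_total for j in states}
    return matrix

# Hypothetical data: four sequences over the states "a" and "b".
sequences = [["a", "b", "a"], ["a", "a"], ["b", "a", "b"], ["a"]]
P = transition_mle(sequences, ["a", "b"])
print(P["b"])  # {'a': 1.0, 'b': 0.0}
```

In this data the transition from "b" to "b" is never observed, so its estimated probability is exactly zero.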
Add One Smoothing
It is common to add 1 to every count listed above before normalizing. This handles the situation where
the sample is small enough that some of the counts are zero.
For example, if there were zero observations of a transition from state i to state j, the maximum likelihood estimate would
set that probability to zero, whereas the modeler may want to assign at least a very small probability to the
transition while remaining reasonably consistent with the observed data.
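
With 1 added to each count, the transition estimate becomes {% \pi_{i,j} = \frac{N_{i,j} + 1}{\sum_k (N_{i,k} + 1)} %}. A minimal Python sketch of this, assuming the counts are already available as a list of rows, is given below; the function name `add_one_smoothed_rows` and the example counts are hypothetical.

```python
def add_one_smoothed_rows(counts):
    """Return (N_{i,j} + 1) / sum_k (N_{i,k} + 1) for each row of counts."""
    smoothed = []
    for row in counts:
        # Adding 1 to each of the len(row) entries raises the row total by len(row).
        total = sum(row) + len(row)
        smoothed.append([(n + 1) / total for n in row])
    return smoothed

# Hypothetical counts: the second row has a zero count in its last entry.
counts = [[1, 2], [2, 0]]
P_smooth = add_one_smoothed_rows(counts)
print(P_smooth[1])  # [0.75, 0.25]
```

The unobserved transition now receives probability 1/4 instead of zero, and each row still sums to 1. The same adjustment applies to the initial-state counts.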