Count Based Models

Overview


A count based model estimates the conditional probabilities of word sequences from the number of times each sequence appears in the training documents. The model fixes a sequence length and then counts the occurrences of every sequence of that length.
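The counting step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `count_sequences`, the whitespace tokenization, and the toy sentence are all assumptions made for the example.

```python
from collections import Counter

def count_sequences(words, n):
    """Count every run of n consecutive words in a list of tokens."""
    # Slide a window of width n over the tokens; each window is one sequence.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

# Toy training text (an assumption for illustration), split on whitespace.
words = "the cat sat on the mat and the cat sat down".split()
counts = count_sequences(words, 3)

for sequence, count in counts.most_common(3):
    print(" ".join(sequence), count)
```

Here the 3 word sequence "the cat sat" is counted twice and every other sequence once. Real training data would also need decisions about punctuation, casing, and sentence boundaries, which this sketch ignores.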

As a simple example, consider a count based model trained on 3 word sequences in a given document. Let's say that the model is asked to predict the next word after seeing "maximum likelihood". The algorithm finds every 3 word sequence in the training documents that starts with the words "maximum likelihood" and counts them. Next, it records the third word of each of those sequences and counts how often each word occurs. For each candidate next word, the number of times it appears divided by the total number of sequences starting with "maximum likelihood" is its estimated probability.
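The "maximum likelihood" example can be sketched as follows. The function name `next_word_probabilities` and the three training sentences are hypothetical, chosen only to make the arithmetic easy to check; tokens are produced by splitting on whitespace.

```python
from collections import Counter

def next_word_probabilities(words, context):
    """Estimate P(next word | context) from raw counts."""
    context = tuple(context)
    k = len(context)
    # Collect the word that follows each occurrence of the context.
    followers = Counter(
        words[i + k]
        for i in range(len(words) - k)
        if tuple(words[i:i + k]) == context
    )
    total = sum(followers.values())
    if total == 0:
        return {}  # context never seen in training, so no estimate exists
    # Each follower's count divided by the number of context occurrences.
    return {word: count / total for word, count in followers.items()}

# Hypothetical training text for illustration only.
text = (
    "maximum likelihood estimation is common . "
    "maximum likelihood estimation is simple . "
    "maximum likelihood methods are useful ."
)
probs = next_word_probabilities(text.split(), ["maximum", "likelihood"])
print(probs)
```

In this toy text "maximum likelihood" occurs three times, followed twice by "estimation" and once by "methods", so the estimates are 2/3 and 1/3. A context that never appears in training yields an empty result, which is one reason practical count based models usually add some form of smoothing.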

Topics