To specify a unigram language model, we need a big dictionary that
maps words to probabilities (what is a good data structure for this?).
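
As a minimal sketch, assuming a plain hash table (a Python dict) as the
data structure, such a model can be built by normalizing word counts
from a corpus:

    from collections import Counter

    def build_unigram_model(corpus_words):
        """A unigram model: a dict mapping each word to P(word)."""
        counts = Counter(corpus_words)
        total = sum(counts.values())
        return {word: count / total for word, count in counts.items()}

    # Hypothetical toy corpus:
    model = build_unigram_model(["the", "cat", "sat", "on", "the", "mat"])
    # model["the"] == 1/3; every other word has probability 1/6
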
Given a unigram language model, what might we want to do?
- Generate random sentences respecting the probabilities
in the model (interesting to try to solve efficiently;
see the code sketch after this list).
- Given a sentence, what is its probability?
- Given a sequence of letters, what is the most likely sentence
consistent with the sequence?
- Given a sequence of letters, what is the total probability of all
sentences consistent with the sequence?
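
The first two questions can be sketched directly against the dict
representation above. This is only a sketch: it assumes a fixed sentence
length, since a pure unigram model has no stop symbol, and it does no
smoothing for unseen words.

    import math
    import random

    def generate_sentence(model, length=6):
        """Sample each word independently according to its unigram probability."""
        words = list(model)
        weights = [model[w] for w in words]
        return random.choices(words, weights=weights, k=length)

    def sentence_log_probability(model, sentence):
        """Log P(sentence) = sum of log P(word); -inf if any word is unseen."""
        return sum(math.log(model[w]) if w in model else float("-inf")
                   for w in sentence)

The last two questions, about sequences of letters, are exactly the
segmentation problems taken up next.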