To specify a unigram language model, we need a big dictionary that
maps words to probabilities (what is a good data structure for this?).
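
As a minimal sketch, assuming a plain hash table (a Python dict) as the
data structure, such a model can be built by normalizing word counts
from a corpus:

    from collections import Counter

    def build_unigram_model(corpus_words):
        """A unigram model: a dict mapping each word to P(word)."""
        counts = Counter(corpus_words)
        total = sum(counts.values())
        return {word: count / total for word, count in counts.items()}

    # Hypothetical toy corpus:
    model = build_unigram_model(["the", "cat", "sat", "on", "the", "mat"])
    # model["the"] == 1/3; every other word has probability 1/6
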
Given a unigram language model, what might we want to do?
- Generate random sentences respecting the probabilities
in the model (interesting to try to solve efficiently;
see the code sketch after this list).
- Given a sentence, what is its probability?
- Given a sequence of letters, what is the most likely sentence
consistent with the sequence?
- Given a sequence of letters, what is the total probability of all
sentences consistent with the sequence?
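
The first two questions can be sketched directly against the dict
representation above. This is only a sketch: it assumes a fixed sentence
length, since a pure unigram model has no stop symbol, and it does no
smoothing for unseen words.

    import math
    import random

    def generate_sentence(model, length=6):
        """Sample each word independently according to its unigram probability."""
        words = list(model)
        weights = [model[w] for w in words]
        return random.choices(words, weights=weights, k=length)

    def sentence_log_probability(model, sentence):
        """Log P(sentence) = sum of log P(word); -inf if any word is unseen."""
        return sum(math.log(model[w]) if w in model else float("-inf")
                   for w in sentence)

The last two questions, about sequences of letters, are exactly the
segmentation problems taken up next.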