Part I
These questions refer to the file short.txt, which contains:
The big dog ate food. The big dog died.
- In an order-2 Markov process, each two-letter sequence is a key in a
dictionary. The value associated with the key is a list of the characters
that follow that two-letter sequence in the file. For example, the
two-letter sequence 'at' is followed by an 'e' once. The two-letter
sequence ' d' (space-'d') occurs three times, so the corresponding value
is ['o', 'o', 'i']. Explain why there are two o's and one i in the list
associated with key ' d'.
- In the same file there are three occurrences of 'e ' ('e'-space).
What is the three-letter list associated with this key?
- Given a string text, describe the contents of the list generated by
the list comprehension below. Use short.txt if that helps explain the
comprehension.
    [ text[i:i+3] for i in range(0,len(text)) ]
- Given the same string text, what are the contents of the list produced
by the comprehension shown below?
    words = text.split()
    [words[i:i+3] for i in range(0,len(words))]
- For an order-2 Markov process we need to take the three-letter
substrings and turn them into keys and values. If we have a list of all
three-letter substrings:
    ['The', 'he ', 'e b', ' bi', 'big', 'ig ', 'g d', ' do' ... ]
we create a dictionary using 'Th', 'he', 'e ', ' b', 'bi', etc. as keys.
The letter that follows each key, which is the last letter of the
three-letter sequence, goes in the list that's the value associated with
that key, e.g., as described before we have ' d' : ['o', 'o', 'i'].
Complete the code below to populate the dictionary.
def make_dictionary(triples):
    """
    triples is a list of three-letter substrings,
    return a dictionary of two-letter keys with
    corresponding value a list of following-letters
    """
    d = {}
    for trip in triples:
        key = # fill in this line
        if key not in d:
            # fill in this line
        d[key].append(trip[-1]) # explain this line
    return d
- What part of the code you wrote above depends on the number 3? Can you
avoid any such dependencies (to work with order-4 or order-6 Markov
processes)? How?
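
One possible way to think about the last two questions is sketched below. It is only an illustration, not the lab's reference solution: the name make_dictionary_sketch is hypothetical, and the text of short.txt is assumed to be exactly the single line quoted at the top of Part I. The key is taken as "everything except the last character", so nothing in the function depends on the number 3.

```python
def make_dictionary_sketch(grams):
    """Map each key (a substring minus its last character) to the list
    of characters that follow it. Hypothetical helper for illustration."""
    followers = {}
    for gram in grams:
        key = gram[:-1]  # all but the last character, whatever the length
        # setdefault creates an empty list the first time a key is seen
        followers.setdefault(key, []).append(gram[-1])
    return followers

# Assumed contents of short.txt (the line quoted in Part I)
short = "The big dog ate food. The big dog died."

# Stop 2 characters early so every slice has exactly 3 characters
triples = [short[i:i + 3] for i in range(len(short) - 2)]
order2 = make_dictionary_sketch(triples)
print(order2[" d"])  # ['o', 'o', 'i']
print(order2["at"])  # ['e']

# The same function handles order-4 when given five-letter substrings
fives = [short[i:i + 5] for i in range(len(short) - 4)]
order4 = make_dictionary_sketch(fives)
print(order4["big "])  # ['d', 'd']
```

Note that the sketch builds its substrings with range(len(short) - 2), not range(0, len(text)) as in the comprehension above, so it never produces the shorter slices at the end of the string.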
Part II
For this part you'll be working on the function generate_text that's
documented in the module Markov.py and described in the lab handout.
- What's the code to choose a random key from the dictionary and assign
it to the variable seed?
- Create a local string variable text in generate_text to add characters
to one at a time. Concatenate size characters chosen at random from the
alphabet to text, e.g., use
    next = random.choice("abcdefghijklmnopqrstuvwxyz")
and verify that the program works by running it to see that all the
functions call each other properly. This is an order-0 Markov process
since there is no prediction, just random letters chosen.
- Modify the program to choose a random letter from the list associated
with key seed in the dictionary and add that to text instead of using a
random letter from the alphabet. What's the line you wrote:
    next =
- Explain in words why the code below will work to create a new seed as
described in the lab handout, and why this will work even for an order-8
(or any order) Markov process.
    seed = seed[1:] + next
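
The steps in this part can be put together roughly as in the sketch below. The real signature of generate_text in Markov.py is not shown here, so the function name generate_text_sketch, its parameters, and the tiny followers dictionary are all assumptions made for illustration. The sketch also uses the name nxt instead of next, only to avoid shadowing Python's built-in next().

```python
import random

def generate_text_sketch(followers, size):
    """Generate size characters by walking a follower dictionary.
    Hypothetical stand-in for generate_text, for illustration only."""
    seed = random.choice(list(followers))  # random starting key
    text = ""
    for _ in range(size):
        if seed not in followers:
            # Dead end (a key that only occurs at the very end of the
            # file has no followers): start again from a random key.
            seed = random.choice(list(followers))
        nxt = random.choice(followers[seed])  # pick a following letter
        text += nxt
        # Drop the oldest character and append the newest, so the seed
        # keeps its length whatever the order is.
        seed = seed[1:] + nxt
    return text

# Toy order-2 dictionary (made up): 'ab' -> 'c', 'bc' -> 'a', 'ca' -> 'b'
toy = {"ab": ["c"], "bc": ["a"], "ca": ["b"]}
sample = generate_text_sketch(toy, 6)
print(sample)  # six characters cycling through a, b, c
```

Because seed[1:] removes exactly one character and nxt adds exactly one, the length of the seed never changes, which is why the same update line works for any order.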
Part III
- Make the program work for an order-K Markov process by creating one new
parameter to make_substrings. Describe what this parameter is and how to
modify the code to work for order-3 or order-5 or order-K Markov
processes.
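
As a hedged illustration of what such a parameter might look like: the actual make_substrings in Markov.py is not reproduced in this worksheet, so the function below is a hypothetical version with an assumed parameter named order. An order-K process needs substrings of length K + 1 (K characters for the key plus the one character that follows).

```python
def make_substrings_sketch(text, order):
    """Return every substring of length order + 1, in file order.
    Hypothetical version of make_substrings, for illustration only."""
    size = order + 1  # `order` key characters plus one following character
    # The last valid start index is len(text) - size, so no short slices
    return [text[i:i + size] for i in range(len(text) - size + 1)]

# Assumed contents of short.txt (the line quoted in Part I)
short = "The big dog ate food. The big dog died."
print(make_substrings_sketch(short, 2)[:4])  # ['The', 'he ', 'e b', ' bi']
print(make_substrings_sketch(short, 4)[:3])  # ['The b', 'he bi', 'e big']
```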