Compsci 06/101, Spring 2011, Lab 13

By entering your name/net-id below you indicate you are present for Lab 13 to answer these questions and that you were part of the process that resulted in answers being turned in.
Name: ______________    Net id: _____________ || Name: ______________    Net id: _____________

Name: ______________    Net id: _____________ || Name: ______________    Net id: _____________

    Part I

    These questions refer to the file short.txt, which contains:

    The big dog ate food. The big dog died. 
    
  1. In an order-2 Markov process, each two-letter sequence is the key in a dictionary. The value associated with the key is a list of characters that follow that two-letter sequence in the file. For example, the two-letter sequence 'at' is followed by an 'e' once. The two-letter sequence ' d' (space-'d') occurs three times, so the corresponding value is ['o', 'o', 'i']. Explain why there are two o's and one i in the list associated with key ' d'.
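
    The two examples in question 1 can be checked by scanning the text directly. The sketch below is only an illustration: the helper name followers is hypothetical and is not part of Markov.py, and the string is short.txt as quoted above with trailing whitespace omitted.

```python
# Contents of short.txt as quoted in this lab (trailing whitespace omitted).
text = "The big dog ate food. The big dog died."

def followers(text, key):
    """Return the characters that follow each occurrence of key in text.

    Hypothetical helper for illustration only; not part of Markov.py.
    """
    n = len(key)
    # Stop n characters before the end so text[i + n] always exists.
    return [text[i + n] for i in range(len(text) - n) if text[i:i + n] == key]

print(followers(text, " d"))   # ['o', 'o', 'i']
print(followers(text, "at"))   # ['e']
```

    This scans the text once per key, which is fine for checking an example by hand but is not how the dictionary in question 5 is meant to be built.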
    
    
  2. In the same file there are three occurrences of 'e ' ('e'-space). What is the three-letter list associated with this key?
    
    
  3. Given a string text, describe the contents of the list generated by the list comprehension below. Use short.txt if that helps explain the comprehension.
       [ text[i:i+3] for i in range(0,len(text)) ]
    
    
    
  4. Given the same string text, what are the contents of the list generated by the list comprehension shown?
       words = text.split()
       [words[i:i+3] for i in range(0,len(words))]
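
    When reasoning about questions 3 and 4 it may help to recall how Python slices behave near the end of a sequence. The sketch below uses made-up sample values (letters and pieces are hypothetical names, not from the lab files) rather than short.txt.

```python
# Made-up sample data, not taken from short.txt.
letters = "abcd"
pieces = ["w", "x", "y", "z"]

# A slice that runs past the end is cut short instead of raising an error.
tail_of_string = letters[3:6]
tail_of_list = pieces[2:5]

print(repr(tail_of_string))  # 'd'
print(tail_of_list)          # ['y', 'z']

# Slicing a string gives a string; slicing a list gives a list.
print(type(letters[0:3]).__name__)  # str
print(type(pieces[0:3]).__name__)   # list
```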
    
    
    
  5. For an order-2 Markov process we need to take the three-letter substrings and turn them into keys and values. If we have a list of all three-letter substrings:
       ['The', 'he ', 'e b', ' bi', 'big', 'ig ', 'g d', ' do' ... ]
    
    We create a dictionary using 'Th', 'he', 'e ', ' b', 'bi', etc. as keys. The letter that follows these keys, which is the last letter of the three letter sequence, is in the list that's the value associated with each key, e.g., as described before we have " d" : ['o','o','i']. Complete the code below to populate the dictionary.

       def make_dictionary(triples):
           """
           triples is a list of three-letter substrings,
           return a dictionary of two-letter keys with
           corresponding value a list of following-letters
           """
           d = {}
           for trip in triples:
               key =                        # fill in this line
               if key not in d:
                                            # fill in this line
               d[key].append(trip[-1])      # explain this line

  6. What part of the code you wrote above depends on the number 3? Can you avoid any such dependencies (to work with order-4 or order-6 Markov processes)? How?
    
    
    

    Part II

    For this part you'll be working on the function generate_text that's documented in the module Markov.py and described in the lab handout.

  7. What's the code to choose a random key from the dictionary and assign this to variable seed?
    
    
    

  8. Create a local string variable text in generate_text to add characters to one at a time. Concatenate size characters chosen at random from the alphabet to text, e.g., use
      next = random.choice("abcdefghijklmnopqrstuvwxyz")
    
    and verify that the program works by running it to see that all the functions call each other properly. This is an order-0 Markov process since there is no prediction, just random letters chosen.
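
    As a rough sketch of the order-0 step in question 8, the loop below builds a string of random letters on its own, outside of generate_text. The function name random_letters and the constant ALPHABET are assumptions made for this illustration, and next_char stands in for the lab's variable next so that the Python built-in of that name is not shadowed.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def random_letters(size):
    """Return a string of size letters, each chosen uniformly at random.

    Hypothetical stand-alone helper; the lab asks for this logic
    inside generate_text in Markov.py.
    """
    text = ""
    for _ in range(size):
        next_char = random.choice(ALPHABET)  # no dependence on earlier letters
        text += next_char
    return text

print(random_letters(20))  # output differs from run to run
```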
    
    
    
  9. Modify the program to choose a random letter from the list associated with key seed in the dictionary and add that to text instead of using a random letter from the alphabet. What's the line you wrote?
      next = 
    
    
  10. Explain in words why the code below will work to create a new seed as described in the lab handout and why this will work even for an order-8 (or any order) Markov process.
       seed = seed[1:] + next
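
    To see what the seed update does before explaining it in words, the sketch below traces it on hand-picked characters. The function name advance and the sample seeds are assumptions for illustration; next_char plays the role of the lab's variable next.

```python
def advance(seed, next_char):
    """Drop the first character of seed and append next_char.

    Hypothetical wrapper around the lab's line: seed = seed[1:] + next
    """
    return seed[1:] + next_char

# Order-2 trace with hand-picked characters (not randomly generated).
seed = "th"
history = [seed]
for next_char in "e d":
    seed = advance(seed, next_char)
    history.append(seed)
print(history)  # ['th', 'he', 'e ', ' d']

# The same line applied to an eight-character seed.
long_seed = advance("the big ", "d")
print(repr(long_seed), len(long_seed))  # 'he big d' 8
```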
    
    
    

    Part III

  11. Make the program work for an order-K Markov process by adding one new parameter to make_substrings. Describe what this parameter is and how to modify the code to work for order-3, order-5, or order-K Markov processes.