Compsci 101, Fall 2012, Regex FUN, November 19

Name____________________   net-id _________     Name____________________   net-id _________       

Name____________________   net-id _________     Name____________________   net-id _________       
This downloadable tool will help as you experiment with regular expressions.
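
If the tool is not at hand, the same kind of experiment can be sketched in a few lines of Python. This is only a rough stand-in, not the tool itself: the function name matching_words and the tiny word list are made up for illustration, and it assumes a word counts as a match when the regex matches anywhere inside it (so ^ and $ are needed to pin the ends).

```python
import re

def matching_words(pattern, words):
    """Return the words in which pattern matches somewhere, in list order."""
    regex = re.compile(pattern)
    return [w for w in words if regex.search(w)]

# Hypothetical mini word list; a real experiment would use a full dictionary file.
sample = ["sting", "string", "stone", "ring", "mama", "banana", "later"]

print(matching_words(r"ing$", sample))    # words that end in "ing"
print(matching_words(r"^st", sample))     # words that start with "st"
print(matching_words(r"(..)\1", sample))  # words with a two-letter chunk repeated back to back
```

To count matches instead of listing them, wrap the call in len().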

  1. The expression ate$ matches 540 words and the expression ^s.*ate$ matches 30 words. What are the differences and similarities (written description) between the matches of these two expressions?
    
    
    
    
  2. The expression ^p.[o|a].e$ matches 14 words. What features do these words have in common?
    
    
    
    
    
  3. The regular expression (....)\1 has one match, the word beriberi. A small change to the regex, (....).\1, generates two matches: bandstands and hodgepodge. Explain why beriberi doesn't match the second expression, and why the two words that match the second expression don't match the first.
    
    
    
    
    
  4. The regex (....).*\1 has 13 matches:
    atherosclerosis 
    bandstands 
    beriberi 
    hodgepodge 
    kinnickinnic 
    knickerbocker 
    knickerbockers 
    lightweight 
    misunderstander 
    misunderstanders 
    nationalization 
    rationalization 
    rationalizations 
    
    
    Explain why atherosclerosis matches. Circle the five of these that also match (.....).*\1 (there's one more dot).

  5. Find all the words that contain all the vowels a, e, i, o, u, in that order.
    
    
    
    
    
  6. Find the number of words that contain either "spis" or "spas" in them. Ideally you'll do this using one regex. What is the regex?
    
    
    
    
    
  7. Find all seven letter palindromes using one regex.

The following questions are about the code in SimpleGrammar.py; some of that code is reproduced below.

  8. Which is the best explanation of the body of code in the statement if w.startswith("<") in the function expand below?

        def expand(sentence, rules):
            sent = ""
            for w in sentence.split():
                if w.startswith("<"):
                    chosen = random.choice(rules[w])
                    sent += expand(chosen, rules) + " "
                else:
                    sent += w + " "
            return sent.strip()

    1. the rule chosen as a replacement for w may require expansion because it has tags in it, so the rule is passed to expand in case it's more than a simple word.

    2. because the word w starts with a < symbol we know a choice should be made to replace it, but the line assigning to sent could be replaced with: sent += chosen + " "

    3. the parameter rules is a dictionary, accessing the dictionary generates a random replacement for the key w and that replacement also starts with a < symbol.

  9. Which is the best explanation of why sent.strip() is returned rather than simply sent?

    1. all strings must be stripped in Python to ensure they can be printed.

    2. The string has an extra space at the end because a space is always the last thing concatenated to sent in the for-loop.

    3. All white-space should be removed from sent, not just leading and trailing white-space.

  10. If we want a combined color like yellow-green or blue-red to be a possible color, which string should be added to the list colors in the function create_content?

    1. "<color> - <color>"

    2. "yellow-green"

    3. "yellow - <color>"
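
To try the function out while thinking about the questions above, here is a small sketch that runs expand on a made-up grammar. The expand function is the one from the worksheet; the rules dictionary and its tags (<greeting>, <name>, <adjective>) are hypothetical and are not taken from SimpleGrammar.py.

```python
import random

def expand(sentence, rules):
    """Replace each <tag> word in sentence with a randomly chosen rule, recursively."""
    sent = ""
    for w in sentence.split():
        if w.startswith("<"):
            chosen = random.choice(rules[w])      # pick one replacement for the tag
            sent += expand(chosen, rules) + " "   # the replacement may contain tags too
        else:
            sent += w + " "
    return sent.strip()

# Hypothetical grammar: each tag maps to a list of possible replacements.
rules = {
    "<greeting>": ["hello <name>", "goodbye <name>"],
    "<name>": ["world", "<adjective> friend"],
    "<adjective>": ["old", "dear"],
}

for _ in range(3):
    print(expand("<greeting> !", rules))
```

Running it several times, or editing the lists in rules, shows how the choices made for one tag can lead to further expansion.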