Compsci 101, Fall 2012, Regex FUN, November 19
Name____________________ net-id _________ Name____________________ net-id _________
Name____________________ net-id _________ Name____________________ net-id _________
This downloadable tool will help as you experiment with regular expressions.
- The expression ate$ matches 540 words and the expression ^s.*ate$
matches 30 words. What are the differences and similarities (written
description) between the matches of these two expressions?
- The expression ^p.[o|a].e$ matches 14 words. What features do these
words have in common?
- The regular expression (....)\1 has one match, the word beriberi.
A small change to the regex, (....).\1, generates two matches:
bandstands and hodgepodge. Explain why beriberi doesn't match the
second expression and why the two words that match the second
expression match it and not the first regex.
- The regex (....).*\1 has 13 matches:
atherosclerosis
bandstands
beriberi
hodgepodge
kinnickinnic
knickerbocker
knickerbockers
lightweight
misunderstander
misunderstanders
nationalization
rationalization
rationalizations
Explain why atherosclerosis matches. Circle the five of these
that also match (.....).*\1 (there's one more dot).
- Find all the words that contain all the vowels a, e, i, o, u, in
that order.
- Find the number of words that contain either "spis" or "spas".
Ideally you'll do this using one regex. What is the regex?
- Find all seven-letter palindromes using one regex.
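The expressions above can also be tried in Python. The following is a minimal sketch, not the downloadable tool itself: the helper name matching_words and the short word list are made up for illustration, and it assumes the tool reports a word when the pattern matches anywhere in it (which is how re.search behaves).

```python
import re

def matching_words(pattern, words):
    """Return the words in which pattern matches somewhere (re.search)."""
    regex = re.compile(pattern)
    return [w for w in words if regex.search(w)]

# A few words quoted in the questions above, NOT the tool's full word list.
words = ["beriberi", "bandstands", "hodgepodge", "lightweight"]

# \1 must repeat exactly the four characters captured by (....)
print(matching_words(r"(....)\1", words))    # ['beriberi']
print(matching_words(r"(....).\1", words))   # ['bandstands', 'hodgepodge']
```

Replacing words with a full word list would let you check the counts quoted in the questions. The counts depend on the word list the tool uses, so they may differ with another list.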
These questions are about the code in SimpleGrammar.py, some of which
is reproduced below.
- Which is the best explanation of the code in the body of the
statement if w.startswith("<") in the function expand below?
def expand(sentence,rules):
    sent = ""
    for w in sentence.split():
        if w.startswith("<"):
            chosen = random.choice(rules[w])
            sent += expand(chosen,rules) +" "
        else:
            sent += w + " "
    return sent.strip()
  - The rule chosen as a replacement for w may require expansion
    because it has tags in it, so the rule is passed to expand in
    case it's more than a simple word.
  - Because the word w starts with a < symbol, we know a choice
    should be made to replace it, but the line assigning to sent
    could be replaced with:
    sent += chosen + " "
  - The parameter rules is a dictionary; accessing the dictionary
    generates a random replacement for the key w, and that
    replacement also starts with a < symbol.
- Which is the best explanation of why sent.strip() is returned
rather than simply sent?
  - All strings must be stripped in Python to ensure they can be
    printed.
  - The string has an extra space at the end because a space is
    always the last thing concatenated to sent in the for-loop.
  - All white-space should be removed from sent, not just leading
    and trailing white-space.
- If we want a combined color like yellow-green or blue-red to be a
possible color, which string should be added to the list colors in
the function create_content?
  - "<color> - <color>"
  - "yellow-green"
  - "yellow - <color>"
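To experiment with the questions above, expand can be run on a small grammar. This is a minimal, self-contained sketch: the function body follows the code reproduced above, but the rules dictionary and its tags (<start>, <animal>, <size>) are invented for illustration and are not taken from SimpleGrammar.py.

```python
import random

def expand(sentence, rules):
    """Replace each word that starts with "<" by a randomly chosen rule."""
    sent = ""
    for w in sentence.split():
        if w.startswith("<"):
            chosen = random.choice(rules[w])      # one replacement string
            sent += expand(chosen, rules) + " "
        else:
            sent += w + " "
    return sent.strip()

# Hypothetical grammar: each tag maps to a list of possible replacements.
rules = {
    "<start>": ["the <animal> sleeps", "a <animal> runs"],
    "<animal>": ["<size> cat", "dog"],
    "<size>": ["small", "large"],
}

if __name__ == "__main__":
    for _ in range(3):
        print(expand("<start>", rules))
```

Because random.choice picks a replacement each time, repeated runs can print different sentences. Editing the body of the if statement or the return line and rerunning is one way to test each explanation offered above.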