Compsci 101, Fall 2017, Lab 11

NOTE: THIS LAB CANNOT BE TURNED IN LATE. It must be turned in by Sunday night, Dec 10 so we can grade it.

The lab

In this lab you will:

There is code to snarf for lab 11, that code is also here.

NOTE: This file RegexDemo.py is slightly different than the one you snarfed for lecture.

Getting Credit for Lab 11

To get credit for this lab, you will need to do the following by Sunday night.



Part 1a: Recursion

First discuss the following mystery function (where lines are numbered) and calls to it. How many recursive calls are there for each call (the original call is not a recursive call)? What is the return value in each case?

1  def mystery(x, y):
2      if x <= y:
3          return 1
4      return 2 + mystery(x-1, y+1)

print mystery(5,5)
print mystery(7,4)
print mystery(10,3)

Answer these questions on the online form.

  1. Which numbered line has the recursive call?
  2. How many recursive calls are there for mystery(5,5) (do not count the original call mystery(5,5))?
  3. What is the output for mystery(5,5)?
  4. How many recursive calls are there for mystery(7,4) (do not count the original call mystery(7,4))?
  5. What is the output for mystery(7,4)?
  6. How many recursive calls are there for mystery(10,3) (do not count the original call mystery(10,3))?
  7. What is the output for mystery(10,3)?

Part 1b : Recursion

Now you will rework one older APT by rewriting it to solve it with simpler recursion - Acronym

If you have already done it, you could comment out the code you have and put in new code that uses recursion.

Just use the Test link to test it out since we will not submit this as an APT.

Answer these questions:

  1. What is the smaller problem to solve?
  2. What is the way out to stop the recursion, the smallest problem that does not use recursion
  3. What is the code for this function that is recursive?

Part 1c : Recursion - Getting all the numbers

Consider the problem of extracting all the numbers out of a list to create a list of numbers, where the original list might have lists inside it. This problem is similar to the problem of listing all the big files in a folder we looked at in lecture.

For example, consider the following list:

[5, [6, 7], 8, [[1], [2, [6]], 7]]

You will write a recursive function named getNums that has one parameter that is a list like this one (that might have lists inside of it). This function returns returns a simple list of the elements from this list in their same order. For example, for the list of lists above, getNums would return the list:

[5, 6, 7, 8, 1, 2, 6, 7]

This code has been started for you in the file GettingAllNumbers.py

Answer these questions:

  1. In processing each list item you will need to know if the item is a number or another list. Describe how you might determine this.
  2. Describe what the recursive part of the problem would be.
  3. What is the way out to stop the recursion, the smallest problem that does not use recursion
  4. Complete the function getNums using recursion and copy your final code here.

Part 2: Regular Expressions

Snarf the code for this part of the lab.

In creating regular expressions you'll need to understand these aspects of regular expressions. You can see more about regular expressions in Python via Python documentation, e.g., https://docs.python.org/2/library/re.html

Answer each question on the online form. You can run module RegexDemo.py which accepts regex expressions and prints those words that match the expression. It reads data from one of two files, one that has a lot of words, with one word per line, and one that is a book.

For example using the word file, entering the regex

^\w\w\wz$

matches 12 words as shown below. There have to be four characters, the first three are any letter and the fourth letter must be z.

benz
buzz
cruz
fuzz
jazz
katz
lutz
quiz
ritz
salz
suez
whiz

Whereas entering

\w\w\wz$

matches 65 words, the last four of which are shown here. In this case there are restrictions on the last four characters but none at the beginning of the string. There must be at least four characters and the last character must be z.

vasquez
velasquez
waltz
whiz

You'll use expressions provided to you and develop expressions to answer questions about understanding regular expressions.

regex purpose
. any character
* zero or more of previous regex
\w any alphanumeric character (and _)
+ one or more of previous regex
\s any whitespace character
*? or +? non-greedy version of either * or +
\d any digit character
() tag/group a regular expression
[] character class, e.g., [A-Z] or [aeiou]
\1, \2, .. match numbered tagged/grouped regex
{n} n occurrences of preceding regex
^ beginning of line/string
[^...] not the characters in the class, e.g., [^aeiou]
$ end of line/string

Part 2a) Answer these questions on the online form.

For this part, use the words.txt file. When you run RegexDemo, select "w".

  1. How many words end in the letter 'a'?

    Answer: there are 898 words that end with 'a', the regex used is a$ and the last word printed is zomba

  2. The regex [aeiou]$ indicates there are 6988 words that end with a vowel. How many words start and end with a vowel?
  3. How many words start and end with the same vowel, e.g., extreme and aorta are two such words?
  4. The words obsequious and pharmacopoeia each contain four vowels in a row. How many words contain at least four consecutive vowels?
  5. 'radar' is a five-letter palindrome, it reads the same from left-to-right as from right-to-left. The regex
    ^(\w)(\w)\w\2\1$ 
    
    shows there are 9 five-letter palindromes. How many seven letter palindromes are there and what are they?
  6. How many seven letter words end with 'ing'?
  7. How many seven letter words start with 's' and end with 'ings'?
  8. There are 7955 words containing 'in', that match either in or (in) for example. The word 'maintaining' contains three occurrences of 'in'. How many words are there that contain three occurrences of 'in'?
  9. How many words contain any two letter sequence repeated three times, not just 'in' as in the previous question. For example, 'anticompetitive' has the sequence 'ti' repeated.
  10. The regular expression ((\w).\2).*\1 matches the 12 words below (and only these words). Explain why these words match, that is how the regular expression works. Note that the expression (\w) is the second tagged expression since the first left parenthesis begins the first tagged expression -- you count tagged expressions by the order in which the left-parenthesis appears in reading left-to-right.

    amalgamate
    amalgamated
    amalgamates
    amalgamatating
    amalgamatation
    assesses
    dereference
    monotonous
    monotonously
    monotonousness
    possesses
    sensitivities
    
  11. Make up your own "word puzzle" and enter it in the lab form. Include the question, the answer, and the regex you developed to answer it.

Part 2b)

For this part, use the book file, littlebrother.txt. When you run RegexDemo, select "b".

  1. Calculate how many unique emails are in the book littlebrother.txt. Enter the regular expression you used and the number. (you can just eyeball this).

  2. Calculate how many web sites start with http: are in littlebrother.txt. Enter the regular expression you used and the number.

  3. Calculate how many words in littlebrother.txt have a number in them (examples: 68, 6A, word2). Enter the regular expression you used and the number.
  4. Calculate how many numbers are in littlebrother.txt (Don't count words that have any symbols that are not digits. 456 is good, but 786-3452 and 6A would not count) Enter the regular expression you used and the number.
  5. Calculate how many words in littlebrother.txt have at least one digit and at least one letter (examples: 63A, word26). Enter the regular expression you used and the number.

Submit

After submitting the lab form, then submit the code you wrote with ambient/websubmit. Use lab11 as the submission folder.

Last Lab

This is the last lab of the semester. Celebrate!