Compsci 6, Dictionary FUN March 15

Name____________________   net-id _________       Name____________________   net-id _________       

Name____________________   net-id _________       Name____________________   net-id _________       

    fingerprint2.py

    In fingerprint2.py, two functions, slow_fingerp and fast_fingerp, do similar tasks: each counts the number of times every word in the parameter datasource occurs and returns a structure of pairs. The slow_fingerp function returns a list of lists; each inner list is like ["apple", 57], indicating that the word "apple" occurs 57 times in the datasource passed as a parameter. So the function might return the list below for a data source that contains three words occurring as many times as shown.
       [ ["ant", 15], ["bat", 3], ["dog", 8] ]
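
As an illustration only, here is a minimal sketch of one way a list of [word, count] pairs like the one above could be built. It is not the code in the course file; the name count_pairs and the sample words are assumptions made up for this example.

```python
def count_pairs(words):
    """Return a list of [word, count] lists (hypothetical helper)."""
    stats = []
    for word in words:
        # Look for an existing pair for this word.
        for pair in stats:
            if pair[0] == word:
                pair[1] += 1
                break
        else:
            # The inner loop finished without a break: no pair exists yet.
            stats.append([word, 1])
    return stats

sample = "ant bat ant dog ant bat".split()
result = count_pairs(sample)
print(result)
```

Note that every word requires a scan of the pairs collected so far, so the work per word grows with the number of unique words.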
    
    
    The function fast_fingerp returns a dictionary; we'll explain that after you've answered some questions about the code.

  1. What line is used to open a file rather than a URL in the function benchmark?

    1. source = urllib.open(name)

    2. source = open(name)

    3. source.close()

  2. What's the name of the module that facilitates finding the current time?

    1. urllib

    2. time

    3. std_time

  3. What is the purpose of the boolean variable that is initialized to False in the outer loop, the loop that sets word to each word found in the datasource?

    1. it indicates if the datasource is found online

    2. it indicates if the word currently being processed in the loop has been processed before

    3. it indicates if the word occurs as the first element of stats

  4. Which line in slow_fingerp is executed the first time a word in the datasource is found?

    1. pair[1] += 1

    2. stats.append([word,1])

    3. word = word.lower

  5. It takes 0.16 seconds on ola's laptop to process melville.txt, which has 4,103 unique words and about 14,000 total words. If the file's contents are duplicated so that the file is twice as big (available as melville2.txt), it takes 0.34 seconds to process: still 4,103 unique words, but now about 28,000 total words. For a file that's four copies of melville.txt concatenated together (available as melville4.txt), it takes 0.52 seconds: about 56,000 total words, still 4,103 unique words.

    Based on these numbers, about how long will it take to process melville8.txt?

    1. 1 second

    2. 2 seconds

    3. 3 seconds
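
Timings like the ones in this question can be collected with a few lines of code. The sketch below is an assumption about how that might look, not the benchmark function from the course file: time_call, count_unique, and the made-up word list are all hypothetical names and data for illustration.

```python
import time

def time_call(func, arg):
    """Return (result, elapsed seconds) for one call of func(arg)."""
    start = time.time()
    result = func(arg)
    return result, time.time() - start

def count_unique(words):
    # Stand-in workload: the number of distinct words.
    return len(set(words))

base = ["ant", "bat", "dog"] * 5000
timings = []
for copies in (1, 2, 4):
    # Repeating the list mimics concatenating a file with itself.
    unique, seconds = time_call(count_unique, base * copies)
    timings.append((copies, unique, seconds))
    print("%d copies: %d unique words, %.4f seconds" % (copies, unique, seconds))
```

The measured times will vary from run to run and from machine to machine, so only the rough trend across the rows is meaningful.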

  6. In fingerprint2.py, the function below returns the most frequently occurring word/count pair in data, a list of two-element lists, e.g., [['the',45],['cat',13],['dog',9]].

    Which is the best explanation for why this function works, i.e., why it returns the word/count pair that occurs most often?

    def max_list(data):
        return sorted([(elt[1],elt[0]) for elt in data])[-1]
    

    1. lists are sorted alphabetically by word, sorting puts the last word alphabetically at the end, and this word is returned.

    2. lists are sorted lexicographically, since elt[1] represents the count and is the first element of each tuple in the list being sorted the last tuple in the list is the largest number of occurrences, this last tuple is returned

    3. the function returns a tuple that's the most-frequently occurring word because the list passed to the function is already ordered by frequency of occurrence.
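
To check your reasoning, the function can be run on a small list. The sketch below copies max_list from the question; the sample data and the alternative using max with a key function are assumptions added for illustration.

```python
def max_list(data):
    # Same one-line body as in the question above.
    return sorted([(elt[1], elt[0]) for elt in data])[-1]

data = [['cat', 13], ['the', 45], ['dog', 9]]

# The intermediate list of (count, word) tuples, after sorting.
pairs = sorted([(elt[1], elt[0]) for elt in data])
print(pairs)

best = max_list(data)
print(best)

# A possible alternative (not from the course file): pick the inner
# list with the largest count directly, without sorting.
best_pair = max(data, key=lambda elt: elt[1])
print(best_pair)
```

Note that max_list returns a (count, word) tuple rather than a [word, count] list, and that the two versions may choose different words when several words share the largest count.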

Dictionary Basics

The following Python console session illustrates some of the methods that work with dictionaries; in particular, there's a variable d that stores a dictionary as shown. What the user types follows the >>> prompt.
>>> d
{'duke': 50, 'columbia': 30, 'stanford': 20}
>>> d.keys()
['duke', 'columbia', 'stanford']
>>> d.values()
[50, 30, 20]
>>> d.items()
[('duke', 50), ('columbia', 30), ('stanford', 20)]
>>> [x[1] for x in d.items()]
[50, 30, 20]
  1. If the x[1] in the last line is replaced by x[0], what is printed?

    1. [20, 30, 50]

    2. ['duke', 'duke', 'duke']

    3. ['duke', 'columbia', 'stanford']

  2. After the user types d['duke'] = 80, what is printed by the expression d.values()?

    1. [50, 30, 20]

    2. [80, 30, 20]

    3. [50, 80, 30]

  3. The code below is executed next (after the value associated with 'duke' is changed to 80). What is printed?
    for name in d:
        d[name] += 10
    print d
    

    1. {'duke': 90, 'columbia': 40, 'stanford': 30}

    2. {'duke': 90, 'columbia': 30, 'stanford': 20}

    3. {'duke': 90, 'columbia': 40, 'stanford': 20}
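
The dictionary operations above are enough to count words. The code of fast_fingerp is not shown on this sheet, so the sketch below is only an assumption about how a dictionary could be used for the task; count_words and the sample words are hypothetical.

```python
def count_words(words):
    """Return a dict mapping each word to its count (hypothetical helper)."""
    counts = {}
    for word in words:
        # get returns 0 for a word that is not yet a key.
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_words("ant bat ant dog ant bat".split())

# Sorting the items gives a predictable order for printing.
pairs = sorted(counts.items())
print(pairs)
```

Each word is looked up by key instead of by scanning a list of pairs. In Python 3, keys(), values(), and items() return view objects rather than lists, so wrap them in list(...) or sorted(...) when a list is needed.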