Week 9, 3/25 & 3/28

Setup

You may work in pairs on this lab. Snarf the lab/boggle project or browse the code directory.

Overview

The game Boggle® involves finding all words on a board. entry explains the game in detail including popular culture references to Boggle. You can play games based on words that are similar to Boggle online such as Yahoo! Word Racer.In Boggle®, legal words are those found in the lexicon associated with a game. A lexicon is simply a list of words. It's not a dictionary because it doesn't have associated definitions for each word.

The next set of labs considers some of the trade-offs involved with building Boggle® players and game. In particular, this lab examines alternate implementations of a lexicon or word source.

You will write implement and analyze classes that implement the ILexicon interface. Each lexicon implementation facilitates looking up a word or a possible prefix of a word. The classes differ in how efficiently they support these operations and in how much memory they use. You are given a JUnit testing class, TestLexicon, to test your implementations. In the load methods you write you can assume no duplicate words will be inserted via either the Scanner or ArrayList parameters to the load methods. You are also given a class LexiconBenchmark you can use to analyze the empirical performance of your implementations.

You're given an implementation SimpleLexicon with an O(n) implementation of the method wordStatus since the implementation simply does a linear search over the list of words it stores. Read the documentation for details of the interface. You will modify SimpleLexicon so that it's wordStatus method executes more efficiently, create a BinarySearchLexicon class, and complete the TrieLexicon class.

  1. SimpleLexicon

    Rewrite the wordStatus methods of the SimpleLexicon class so that it uses the TreeSet.subSet method that returns a subset/subtree of the tree in which all the words are stored --- the subset's first word should be the String whose status is being determined in wordStatus and the subset's last word should be myUpperBound which is a value larger than any element in the lexicon's set of words (see the code). Make sure you check to see if the parameter s is larger than the upper bound since that will cause the TreeSet.subset call to fail. See the SortedSet.subset API for details.

  2. BinarySearchLexicon

    You must create a class named BinarySearchLexicon implementing the ILexicon interface. Store words in a sorted ArrayList (sort the ArrayList after adding all the words to it) and use Collections.binarySearch to search the list.

    Read the documentation for binarySearch. Note that when the index value returned is less than zero the value can be used to determine where a word should be in a sorted list. For example, when looking up "ela" the value returned might be -137. This means that "ela" is not in the lexicon, but if it were to be inserted it would be at index 136. This also means that if the word at index 136 does not begin with "ela" then no word in the lexicon has a prefix of "ela". So, any non-negative value returned by binarySearch means the status of a word is LexStatus.WORD. If the value returned is negative, one call of the appropriate String.startsWith() method can determine if LexStatus.PREFIX should be returned (make sure you don't go off the end of the array o[5~f words in the lexicon when calling startsWith).

  3. TrieLexicon

    A trie stores words and supports add/search in O(w) time, where w is the length of the word. The number of total words stored in the trie doesn't matter, so looking up the word "apple" requires basically 5 operations regardless of whether the trie stores 100 words or 1,000,000 words. The picture below shows a trie containing the words that follow.
      do, dog, dogma, ,dot, dote,
      ha, we, wed, wept
    
    The black dot in each node indicates a isWord field having value true in some node; if there is no black dot the isWord field is false.

    trie

    The TrieLexicon class uses a trie data structure to implement the ILexicon interface.

    1. If the word "hopefully" is inserted into an empty trie then nine nodes are created. If after this insertion the words "hop", "hopeful", and "hope" are inserted, in that order, how many new nodes are created? Why?
      
      
      
    2. If after inserting the words in the previous problem the words "hopes" and "hoping" are inserted, how many new nodes are created? Why?
      
      
      
    3. Complete the wordStatus method for TrieLexicon.
  4. TestLexicon

    We provide a JUnit testing class TestLexicon to use as you develop your ILexicon implementations. To test different implementations simply change the code in the method makeLexicon to return the implementation you want to test and run the JUnit tests.

    We also provide a benchmarking class LexiconBenchmark that facilitates evaluating the efficiency of different implementations as well as correctness. Confidence in an implementation's correctness is increased if it returns the same results as other implementations.


Last modified: Thu Mar 24 23:45:59 EDT 2011