CompSci 100E Spring 2011: Boggle

Prof. Astrachan fondly recalls neighborhood Boggle^® games from his childhood - long before the existence of personal computers or electric lights - and was re-introduced to it on Sun Unix workstations as a grad student in the mid 80's. We used Boggle^® for a Compsci 108 assignment at Duke in 1996. Boggle is in the Nifty Assignments Archive and the current instantiation of the assignment is a combination of efforts from Compsci educators at Duke, Stanford, and UCSD. This version emphasizes empirical analyses of tradeoffs in different implementations.

The Wikipedia Boggle entry explains the game in detail including popular culture references to Boggle. Reviewing the rules on the Wikipedia site will help in understanding the play-of-the-game, but you'll be looking at implementation trade-offs that are independent of game-playing. You can play games based on words that are similar to Boggle online, e.g., Facebook's Scramble and Yahoo! Word Racer both of which are definitely related to Boggle.

Theoretical and practical approaches to word games have formed the basis for The World's Fastest Scrabble Program (Appel and Jacobson) for recognizing Genomic Signatures in DeBruijn Chains (Heath and Pati) and for An efficient interface between continuous speech recognition and language understanding (Oerder and Ney).

Analyzing alternative implementations and algorithms will help inform a better understanding of both work and play, what a great way to build a foundation in understanding how to turn data into information.

In particular you'll be examining alternate implementations of a lexicon or word source. You'll work on a Trie implementation and a compressed Trie that's similar to a Radix Trie/Patricia Trie, but simpler to implement. Tries are the data structure behind predictive text used in cell-phone texting, e.g., in T9 Text Input and have formed the basis for IP-routing, e.g., see Keith Sklower's Tree-Based Packet Routing Table for Berkeley Unix although changes to IPv6 may change the utility of tries as described here: Scalable high speed IP routing lookups (Waldvogel, Varghese, Turner, Plattner).

You'll use different word-sources or lexicons. Apparently there's a controversy as to what the "best" lexicon is for playing word games like Scrabble® and Boggle®

What You Will Do

Implementing Lexicons

Details about the classes you write for this part of the assignment and help in writing them are provided in the howto pages for implementing lexicons.

You'll design and code three classes for this part of the assignment and you'll analyze their performance empirically. You're given a JUnit testing class, TestLexicon, to test your implementations --- see the howto pages for help/reminders on using JUnit. In the load methods you write you can assume no duplicate words will be inserted via either the Scanner or ArrayList parameters to the load methods. You're also given a class LexiconBenchmark you can use to analyze the empirical performance of your implementations.

In Boggle®, legal words are those found in the lexicon associated with a game. A lexicon is simply a list of words, it's not a dictionary because it doesn't have associated definitions for each word.

The ILexicon interface specifies the methods exported by the interface which implementations must provide. You're given an implementation SimpleLexicon with an O(n) implementation of the method wordStatus since the implementation simply does a linear search over the list of words it stores. Read the documentation for details of the interface. You must write two new, more efficient implementations and modify SimpleLexicon so that it's wordStatus method executes more efficiently.

Find Words on the Board

You'll implement the method cellsForWord in a class GoodWordOnBoardFinder using a standard, backtracking search to find a word on a Boggle board.

Details and more specifics for this part of the assignment are provided in the howto pages in the section on the IWordOnBoardFinder interface. Those pages also include hints on how to write the class described here.

Implementing AutoPlayers

You'll implement two classes that find all the words on a Boggle board. Each class uses a different technique and you'll analyze the runtime tradeoffs in these techniques. You'll also reason empirically about the performance of these two classes when they're configured with different implementations of ILexicon lexicons.

One class, BoardFirstAutoPlayer looks at the board and tries to form words by trying all the paths on the board. The code will be very similar to the backtracking code you wrote for the GoodWordOnBoardFinder class, but you'll prune searches early based on prefixes as described in the howto pages. Another class LexiconFirstAutoPlayer will be relatively simple to implement since you'll have working lexicons and a working WordOnBoardFinder from earlier in this assignment. In implementing the LexiconFirstAutoPlayer you'll write code to look up every word in the lexicon to see if it's on the board. This method is surprisingly fast enough for a game of Boggle , but it's probably not fast enough to run 10,000 times without waiting for a while.

Statistical and Empirical Analyses of Boggle

You need to report in your README file information about which Lexicon implementation is fastest. You should compare all the lexicons and report on their relative times --- you shouldn't simply say "this one is fastest". You should have at least three lexicons to test. You should explain the methods you used to determine this and report on experiments you run, giving times.

You must write code to play lots of auto-games, not games in which humans play, but games in which all words are found on thousands of boards --- see BoggleStats for a starting point. You must provide a board that scores the highest of all the 4x4 and 5x5 boards you test in running at least 10,000 auto-games. and preferably 50,000 games. Report on how many seconds it takes your code to run for 1,000 games; for 10,000 games (or predict that if you can't run that many); and predict/justify on how much time it would take your computer and implementation to simulate both 100,000 games and one million games. When doing the experiments be sure to set the random-number generator's seed, currently done already in BoggleStats and described in the howto pages. If you can't run 10,000 auto-games, indicate the maximum number you can run.

Grading

Your README file should list your testing results, all the people with whom you collaborated, and the TAs/UTAs you consulted with. You may want to include your testing and analysis results in a separate analysis document.

You should include an estimate of how long you spent on the program and what your thoughts are about the assignment.

Submit your README and all of your source code using Eclipse with assignment name boggle.

Boggle®