Compsci 06/101, Spring 2011, Lab 12

You should snarf the lab12 files for this lab or browse the code directory for code files provided.

Complete the handin questions for the lab.

[Image: double spiral (links to image source and its grammar)]
The Random Sentence Generator was one of the original (1999) SIGCSE/Nifty Assignments. This version uses regular expressions for parsing and so is more straightforward than the original.

There are many examples of randomly-generated text, graphics, art, and so on. The ones referenced here all use context free grammars like those you'll be using in this assignment. The AD Generator combines slogans with Flickr images to create random ads built on real slogans. The famous SCIgen project generates random computer science papers including those that were actually accepted for publication, albeit in shady conference venues. The site has videos and a complete description of the history of SCIgen.

Context Free Art includes information on generating different images using grammars and computerized drawing. Grammars known as L-systems have also successfully modeled plant formation.

For example, here are some Duke Compsci excuses generated by this grammar.

I can't believe I haven't started working on this week's APT assignment. The problems were unbelievably hard and I couldn't find my computer .

I finished working on this week's APT assignment. The problems were like trivial and Eclipse crashed .

I gave up working on this week's APT assignment. The problems were really , really , so impossible and I got unbelievably sick .

I gave up working on this week's APT assignment. The problems were so , like easy and I had a midterm .

I finished working on this week's APT assignment. The problems were like trivial .


You'll do three things for this lab:


1. Answer the questions on the handin pages. You might want to read about grammars below first to understand the terminology used.

2. Create a grammar and upload it to the course website. Verify that your grammar is there by loading it via the URLreader.py module. We'll vote on grammars next week.

3. Modify the RSG program as part of the next assignment.
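The course-provided URLreader.py module handles the fetch for you; as a rough sketch of what such a fetch looks like using only Python's standard library (the URL below is hypothetical, a placeholder for wherever your grammar ends up on the course site):

```python
from urllib.request import urlopen

# Hypothetical URL -- replace with the address of your uploaded grammar.
GRAMMAR_URL = "http://www.cs.duke.edu/courses/spring11/cps006/grammars/mine.g"

def read_grammar(url):
    """Fetch a grammar file from a URL and return its text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

If the call succeeds and you can print the returned text, your grammar is reachable from the website.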


Grammar Background

In computer science a grammar is basically a set of rules. Grammars can be used to recognize and validate text, e.g., a grammar for a programming language is used to parse, compile, or interpret programs written in the language. Grammars can also be used in a generative mode, to generate texts or images or sounds that conform to the rules of the grammar.

The format of the grammar used in this assignment is described briefly here, but you can also get a sense of it by studying the apt-issues.g file or by browsing submitted grammars.

A grammar processed by your program consists of a collection of definitions and rules for each definition. Each definition must be enclosed by curly braces. The definition consists of the non-terminal being defined followed by the rules for that non-terminal. Random text is always generated beginning with the non-terminal <start>, as can be seen in the examples shown above generated by this grammar. In the grammar we treat each non-terminal between curly braces as a definition (that's the word used). Each expansion for the non-terminal is separated from the others by a semi-colon. These expansions are called productions.

{ <start> I <status> working on this week's APT assignment. The problems <description> . ; }
{ <status> gave up ; finished ; am still ; can't believe I haven't started ; }
{ <description> were <adjective> <difficult> ; were <adjective> <difficult> and <excuse> ; }
{ <adjective> really ; unbelievably ; so ; like ; <adjective> , <adjective> ; }
{ <difficult> hard ; impossible ; easy ; trivial ; }
{ <excuse> I had a midterm ; I got <adjective> sick ; I couldn't find my computer ; Eclipse crashed ; <excuse> and <excuse> ; }
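This lab's version of the generator parses the grammar with regular expressions. One way that parse might be sketched, assuming the brace-delimited format above (the function name and sample text here are illustrative, not the assignment's actual code):

```python
import re

SAMPLE = """
{ <start> I <status> working on this week's APT assignment. ; }
{ <status> gave up ; finished ; am still ; }
"""

def parse_grammar(text):
    """Parse curly-brace definitions into a dict mapping each
    non-terminal to its list of productions."""
    grammar = {}
    # Each { ... } block is one definition: a non-terminal followed
    # by its semicolon-separated productions.
    for block in re.findall(r"\{(.*?)\}", text, re.DOTALL):
        tokens = block.split()
        nonterminal = tokens[0]            # e.g. '<start>'
        rest = " ".join(tokens[1:])
        # Split on ';' to get productions; drop empty strings left
        # by the trailing semicolon.
        productions = [p.strip() for p in rest.split(";") if p.strip()]
        grammar[nonterminal] = productions
    return grammar
```

Calling parse_grammar(SAMPLE) would map "<status>" to its three productions and "<start>" to its single production.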

By examining the randomly generated examples you can see how sometimes a string of adjectives is generated, e.g., like, really, really, so, unbelievably. In theory the length of this sequence of adjectives, generated by repeatedly choosing the last of the rules for the non-terminal <adjective>, could be arbitrarily long, but in practice choosing this rule happens with probability 0.2 (1/5) so choosing it repeatedly isn't too likely.

Some non-terminals, like <difficult> and <status>, don't result in more rules/definitions being chosen. But the others do generate more choices and text, since the rules associated with their definitions also contain non-terminals.
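The recursive expansion described above, picking a random production for each non-terminal and leaving plain words alone, can be sketched like this (assuming a grammar stored as a dict from non-terminals to lists of productions; the small grammar here is a cut-down example, not the full apt-issues grammar):

```python
import random

def expand(symbol, grammar):
    """Recursively expand a symbol: a non-terminal picks one of its
    productions at random; a plain word is returned unchanged."""
    if symbol not in grammar:
        return symbol                      # terminal word
    production = random.choice(grammar[symbol])
    return " ".join(expand(tok, grammar) for tok in production.split())

# Cut-down example grammar in the dict form described above.
grammar = {
    "<start>": ["The problems were <adjective> hard ."],
    "<adjective>": ["really", "unbelievably", "so", "like",
                    "<adjective> , <adjective>"],
}
print(expand("<start>", grammar))
```

Note how the recursive production "<adjective> , <adjective>" can chain adjectives, but since it is only one of five choices, long chains are unlikely.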