Compsci 101, Fall 2012, Lab 12:RSG

You should snarf the lab12 files for this lab or browse the code directory for code files provided.

Complete the handin Questions for lab.

double spiral
(image source and it's grammar)
Grammars are a set of rules used in recognizing and generating text (or music or art or ...). In this lab you'll use context free grammars to generate (hopefully amusing) random text.

There are many examples of randomly-generated text, graphics, art, and so on. The ones referenced here all use context free grammars like those you'll be using in this assignment. The AD Generator combines slogans with Flickr images to create random ads built on real slogans. The famous SCIgen project generates random computer science papers including those that were actually accepted for publication, albeit in shady conference venues. The site has videos and a complete description of the history of SCIgen.

Context Free Art includes information on generating different images using grammars and computerized drawing. Grammars known as L-systems have also successfully modeled plant formation.

The Random Sentence Generator was one of the original (1999) SIGCSE/Nifty Assignments. This current version uses regular expressions for parsing and so is more straightforward than the original.

For example, here are some Duke Compsci excuses generated by this grammar.

I can't believe I haven't started working on this week's APT assignment. The problems were unbelievably hard and I couldn't find my computer .

I finished working on this week's APT assignment. The problems were like trivial and Eclipse crashed .

I gave up working on this week's APT assignment. The problems were really , really , so impossible and I got unbelievably sick .

I gave up working on this week's APT assignment. The problems were so , like easy and I had a midterm .

I finished working on this week's APT assignment. The problems were like trivial .


You'll do three things for this lab --- but please read the section on grammars that comes after these three things before you do these three things.


First answer the questions on the handin pages, then create and upload a grammar. We'll vote on grammars next week and you'll modify the RSG program as part of the next assignment. You might want to read about grammars below first to understand the terminology used.

Then create a grammar and upload it to the course website. Verify that your grammar is there by loading it via the URLreader.py module.


Grammar Background

The format of the grammar used in this assignment is described briefly here, but you can reason by example from the apt-issues.g file or by browsing submitted grammars what the grammar looks like.

A grammar processed by your program consists of a collection of definitions and rules for each definition.

Random text is always generated beginning with the non-terminal <start> as can be seen in the examples shown above generated by this grammar.

{ <start> I <status> working on this week's APT assignment. The problems <description> .; } { <status> gave up ; finished ; am still ; can't believe I haven't started ; } { <description> were <adjective> <difficult> ; were <adjective> <difficult> and <excuse>; } { <adjective> really ; unbelievably ; so ; like ; <adjective> , <adjective> ; } { <difficult> hard ; impossible ; easy ; trivial ; } { <excuse> I had a midterm ; I got <adjective> sick ; I couldn't find my computer ; Eclipse crashed ; <excuse> and <excuse> ; }

Some non-terminals, like <difficult> and <status> don't result in more rules/definitions being chosen. But the others do generate more choices and texts since the rules associated with the definitions also have non-terminals in them.

By examining the randomly generated examples you can see how sometimes a string of adjectives is generated, e.g., like, really, really, so, unbelievably. In theory the length of this sequence of adjectives, generated by repeatedly choosing the last of the rules for the non-terminal <adjective>, could be arbitrarily long, but in practice choosing this rule happens with probability 0.2 (1/5) so choosing it repeatedly isn't too likely.

Consider this example, we'll walk through how's it's generated.

 I finished working on this week's APT assignment. The problems were
 like trivial and Eclipse crashed .

Grammar Data Structure

One of the most important things in Computer Science is being able to transform specially formatted files, from simple ones like CSV files to complex ones like grammars or web pages, into structured collections within your program, from simple lists to complex ones like lists of lists of values or dictionaries. The code given to you for this project does just that: it converts a formatted grammar file into a dictionary where the key is a string, the non-terminal, and the value associated with the key is a list of the rules that can be used when the non-terminal is expanded.

When run by itself, the module rsgModel, loads the example grammar file, apt-issues.g, and generates two different "stories" using the grammar. Questions are in the handin about this.

Internally a dictionary is used in which the key is a non-terminal and the value corresponding to the non-terminal is a list of rules, each rule is a list. For example, loading the grammar file generates this dictionary entry:

<status> ['gave', 'up'] ['finished'] ['am', 'still'] ["can't", 'believe', 'I', "haven't", 'started']

You can see that the non-terminal <status> has four rules with a varying number of words in each rule. Because there can be several rules, the dictionary uses a list of lists of strings to make it easy to process each word (in case one might be a non-terminal).