Introduction to Computer Science
CompSci 101 : Fall 2013
Grammars are a set of rules used in recognizing and generating text (or music or art or ...). In this lab you will use context free grammars to generate (hopefully amusing) random text.

There are many examples of randomly-generated text, graphics, art, and so on. The ones referenced here all use context free grammars. The AD Generator combines slogans with Flickr images to create random ads built on real slogans. The famous SCIgen project generates random computer science papers including those that were actually accepted for publication, albeit in shady conference venues. The site has videos and a complete description of the history of SCIgen.

Context Free Art includes information on generating different images using grammars and computerized drawing.

The Random Sentence Generator was one of the original (1999) SIGCSE/Nifty Assignments. This current version uses regular expressions for parsing and so is more straightforward than the original.

For example, here are some Duke Compsci excuses generated by this grammar.

I can't believe I haven't started working on this week's APT assignment. The problems were unbelievably hard and I couldn't find my computer .

I finished working on this week's APT assignment. The problems were like trivial and Eclipse crashed .

I gave up working on this week's APT assignment. The problems were really , really , so impossible and I got unbelievably sick .

I gave up working on this week's APT assignment. The problems were so , like easy and I had a midterm .

I finished working on this week's APT assignment. The problems were like trivial .

Specifications

You will do three things for this lab — but please read the section below on grammars before starting.

  1. Create your own grammar to get an understanding of how the grammar is organized and formatted.

  2. Answer questions about regular expressions in the context of parsing the grammars. These questions are designed to help you understand how the grammar is parsed and stored as a data structure within Python.

  3. Update the given module, rsgModel, to generate random sentences/texts based on a given grammar by completing the function generate.

You can snarf the starting files for this assignment or view them here.

Grammar Background

The format of the grammar used in this assignment is described briefly here, but you can reason by example from the apt-issues.g file or by browsing submitted grammars what the grammar looks like.

A grammar processed by your program consists of a collection of definitions and rules for each definition.

Random text is always generated beginning with the non-terminal <start> as can be seen in the examples shown above generated by this grammar.

{ <start> I <status> working on this week's APT assignment. The problems <description> .; } { <status> gave up ; finished ; am still ; can't believe I haven't started ; } { <description> were <adjective> <difficult> ; were <adjective> <difficult> and <excuse>; } { <adjective> really ; unbelievably ; so ; like ; <adjective> , <adjective> ; } { <difficult> hard ; impossible ; easy ; trivial ; } { <excuse> I had a midterm ; I got <adjective> sick ; I couldn't find my computer ; Eclipse crashed ; <excuse> and <excuse> ; }

Some non-terminals, like <difficult> and <status> do not result in more rules/definitions being chosen. But the others do generate more choices and texts since the rules associated with the definitions also have non-terminals in them.

By examining the randomly generated examples you can see how sometimes a string of adjectives is generated, e.g., like, really, really, so, unbelievably. In theory the length of this sequence of adjectives, generated by repeatedly choosing the last of the rules for the non-terminal <adjective>, could be arbitrarily long, but in practice choosing this rule happens with probability 0.2 (1/5) so choosing it repeatedly is not too likely.

Consider this example, we will walk through how it is generated.

I finished working on this week's APT assignment. The problems were like trivial and Eclipse crashed .

Grammar Data Structure

One of the most important things in Computer Science is being able to transform specially formatted files, from simple ones like CSV files to complex ones like grammars or web pages, into structured collections within your program, from simple lists to complex ones like lists of lists of values or dictionaries. The code given to you for this project does just that: it converts a formatted grammar file into a dictionary where the key is a string, the non-terminal, and the value associated with the key is a list of the rules that can be used when the non-terminal is expanded.

When run by itself, the module rsgModel, loads the example grammar file, apt-issues.g, and prints out its keys and values for your inspection. You should study the output of that module when it is loaded to make sure you understand how it is organized.

Internally a dictionary is used in which the key is a non-terminal and the value corresponding to the non-terminal is a list of rules, each rule is a list. For example, loading the grammar file generates this dictionary entry:

<status> ['gave', 'up'] ['finished'] ['am', 'still'] ["can't", 'believe', 'I', "haven't", 'started']

You can see that the non-terminal <status> has four rules with a varying number of words in each rule. Because there can be several rules, the dictionary uses a list of lists of strings to make it easy to process each word (in case one might be a non-terminal).