CompSci 101 : Spring 2014

Generating Random Sentences

Grammars are a set of rules used in recognizing and generating text (or music or art or ...). There are many examples of randomly-generated artifacts. The ones referenced here all use context free grammars like those you will be using in this lab. The AD Generator combines slogans with Flickr images to create random ads built on real slogans. The famous SCIgen project generates random computer science papers including those that were actually accepted for publication, albeit in shady conference venues. Context Free Art includes information on generating different images using grammars and computerized drawing. Grammars known as L-systems have also successfully modeled plant formation.

One of the most important things in Computer Science is being able to transform specially formatted files, from simple ones like CSV files to complex ones like grammars, web pages, or programs, into structured information. In this lab you will use context free grammars to generate (hopefully amusing) random text. This exercise is intended to give you practice writing specially formatted files since that is primarily what you will be doing the rest of the semester (either a program or data file).

For example, here are some Duke CompSci excuses generated by this grammar.

I can't believe I haven't started working on this week's APT assignment. The problems were unbelievably hard and I couldn't find my computer .
I finished working on this week's APT assignment. The problems were like trivial and Eclipse crashed .
I gave up working on this week's APT assignment. The problems were really , really , so impossible and I got unbelievably sick .
I gave up working on this week's APT assignment. The problems were so , like easy and I had a midterm .
I finished working on this week's APT assignment. The problems were like trivial .

The code given to you for this project does just that: it converts a formatted grammar file into a dictionary where the key is a string, the name, and the value associated with the key is a list of the rules that can be used when the definition is expanded.

Specification

Create a grammar in the format described below to be used to generate random sentences. Submit your grammar online using the course's grammar website where it can be seen by everyone in the class. You can upload your grammar as many times as you like, but only the last one will be saved. The class will vote on the best grammar in several categories and extra points given for top-placing grammars. You can browse the (unsorted) Fall 12 grammars as well.

Your grammar should be a plain text file, either using your system's text editor (i.e., NotePad or TextPad) or within Eclipse by choosing File -> New -> Untitled Text File.

A grammar consists of a collection of definitions and rules for each definition. The format of the grammar used in this lab is described here and you can reason by example from the file apt-issues.g.

Each definition is enclosed by curly-braces.
A definition consists of the name being defined, on a line by itself, followed by the rules for that definition.
Each rule in a definition is separated from other rules by a semi-colon.
A rule contains one or more words. If a word is enclosed by angle-brackets, it refers to the name of a definition and is not printed directly. Instead it is replaced by one of the rules in its definition.

In our example grammar, some definitions, like <difficult> and <status> do not result in more definitions being expanded. But others do generate more choices and texts since the rules associated with the definitions refer to other definitions within them.

Random text is always generated beginning with the definition <start>.

Let's walk through an example to see how this sentence is generated by showing the steps as each definition is expanded:

<start>
(<start> is the starting point of the grammar)

I <status> working on this week's APT assignment. The problems <description> .
(by replacing <start> with its only rule)

I finished working on this week's APT assignment. The problems <description> .
(by replacing <status> with the rule: finished)

I finished working on this week's APT assignment. The problems were <adjective> <difficult> and <excuse> .
(by replacing <description> with its second rule: were <adjective> <difficult> and <excuse>)

I finished working on this week's APT assignment. The problems were like <difficult> and <excuse> .
(by replacing <adjective> with the rule: like)

I finished working on this week's APT assignment. The problems were like trivial and <excuse> .
(by replacing <difficult> with the rule: trivial)

I finished working on this week's APT assignment. The problems were like trivial and Eclipse crashed .
(by replacing <excuse> with the rule: Eclipse crashed)

By examining the randomly generated examples you can see how sometimes a string of adjectives is generated, e.g., like, really, really, so, unbelievably. In theory the length of this sequence of adjectives, generated by repeatedly choosing the last of the rules for the definition <adjective>, could be arbitrarily long, but in practice choosing this rule happens with probability 0.2 (1/5) so choosing it repeatedly is not too likely.

While you do not need to look at the code to generate random sentences, and you certainly are not expected to understand it at this point in the semester, it is available here in case you are interested.

The Random Sentence Generator was one of the original (1999) SIGCSE Nifty Assignments.