Complete the handin Questions for lab.
There are many examples of randomly generated text, graphics, art, and so on. The ones referenced here all use context-free grammars like those you'll be using in this assignment. The AD Generator combines slogans with Flickr images to create random ads built on real slogans. The famous SCIgen project generates random computer science papers, including some that were actually accepted for publication, albeit in shady conference venues. The SCIgen site has videos and a complete description of the project's history.
Context Free Art includes information on generating different images using grammars and computerized drawing. Grammars known as L-systems have also successfully modeled plant formation.
The Random Sentence Generator was one of the original (1999) SIGCSE/Nifty Assignments. The current version uses regular expressions for parsing and so is more straightforward than the original.
For example, here are some Duke Compsci excuses generated by this grammar.
I can't believe I haven't started working on this week's APT assignment. The problems were unbelievably hard and I couldn't find my computer .
I finished working on this week's APT assignment. The problems were like trivial and Eclipse crashed .
I gave up working on this week's APT assignment. The problems were really , really , so impossible and I got unbelievably sick .
I gave up working on this week's APT assignment. The problems were so , like easy and I had a midterm .
I finished working on this week's APT assignment. The problems were like trivial .
First answer the questions on the handin pages, then create and upload a grammar. We'll vote on grammars next week and you'll modify the RSG program as part of the next assignment. You might want to read about grammars below first to understand the terminology used.
Then create a grammar and upload it to the course website. Verify that your grammar is there by loading it via the URLreader.py module.
The format of the grammar used in this assignment is described briefly here, but you can also work out what a grammar looks like by example, either from the apt-issues.g file or by browsing submitted grammars.
A grammar processed by your program consists of a collection of definitions and rules for each definition.
Random text is always generated beginning with the non-terminal <start> as can be seen in the examples shown above generated by this grammar.
Some non-terminals, like <difficult> and <status>, don't result in more rules/definitions being chosen. The others do generate more choices and text, since the rules associated with their definitions themselves contain non-terminals.
By examining the randomly generated examples you can see how sometimes a string of adjectives is generated, e.g., like, really, really, so, unbelievably. In theory the length of this sequence of adjectives, generated by repeatedly choosing the last of the rules for the non-terminal <adjective>, could be arbitrarily long, but in practice choosing this rule happens with probability 0.2 (1/5) so choosing it repeatedly isn't too likely.
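To make "isn't too likely" concrete, here is a minimal sketch of the arithmetic. It assumes, as stated above, that the recursive rule is picked with probability 1/5 each time <adjective> is expanded, and that each choice is independent of the earlier ones. The function name prob_repeated is hypothetical and not part of the assignment code.

```python
from fractions import Fraction


def prob_repeated(k, p=Fraction(1, 5)):
    """Probability that the recursive <adjective> rule is chosen k times in a row.

    Assumes every choice is independent and selects the recursive rule
    with probability p (1/5 in the text above).
    """
    return p ** k


# Print the chance of 1, 2, 3, and 4 consecutive recursive choices.
for k in range(1, 5):
    chance = prob_repeated(k)
    print(k, chance, float(chance))
```

Under these assumptions the chance shrinks by a factor of five with every extra repetition, so long adjective chains are possible but rare.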
Consider this example; we'll walk through how it's generated.
I finished working on this week's APT assignment. The problems were like trivial and Eclipse crashed .
Generation begins with <start>, so it is chosen for expansion. Expanding a rule means looping over each "word" in the rule and expanding that word.
"<status>" is a non-terminal, and is generated in the same manner that <start> is currently being expanded: a rule for <status> is chosen randomly, in this case, the second one.
"<description>" is a non-terminal, and is generated in the same manner that <start> is currently being expanded: a rule for <description> is chosen randomly, in this case, the second one (yes, that is possible :). The words of the chosen rule are then expanded in turn.
"<adjective>" is a non-terminal, and is generated in the same manner that <start> and <description> are currently being expanded.
"<difficult>" is a non-terminal, and is generated in the same manner that <start> and <description> are currently being expanded.
"<excuse>" is a non-terminal, and is generated in the same manner that <start> and <description> are currently being expanded.
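The walkthrough above is a recursive process, and it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: TOY_GRAMMAR is a made-up grammar, not the contents of apt-issues.g, and the function name expand is hypothetical rather than taken from the provided rsgModel code.

```python
import random

# Hypothetical toy grammar for illustration only (NOT apt-issues.g).
# Each key is a non-terminal; each value is a list of rules,
# and each rule is a list of words.
TOY_GRAMMAR = {
    "<start>": [["The", "lab", "was", "<description>", "."]],
    "<description>": [["<difficult>"], ["<adjective>", "<difficult>"]],
    "<adjective>": [["really"], ["so"], ["really", ",", "<adjective>"]],
    "<difficult>": [["hard"], ["easy"]],
}


def expand(symbol, grammar, rng=random):
    """Return the list of terminal words generated from symbol."""
    if symbol not in grammar:
        # A word with no definition is a terminal: emit it unchanged.
        return [symbol]
    # Choose one of the non-terminal's rules at random ...
    rule = rng.choice(grammar[symbol])
    words = []
    # ... then loop over each word in the rule and expand that word.
    for word in rule:
        words.extend(expand(word, grammar, rng))
    return words


print(" ".join(expand("<start>", TOY_GRAMMAR)))
```

Joining the words with single spaces is why punctuation in the generated examples appears with a space in front of it.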
One of the most important things in Computer Science is being able to transform specially formatted files, from simple ones like CSV files to complex ones like grammars or web pages, into structured collections within your program, from simple lists to complex ones like lists of lists of values or dictionaries. The code given to you for this project does just that: it converts a formatted grammar file into a dictionary where the key is a string, the non-terminal, and the value associated with the key is a list of the rules that can be used when the non-terminal is expanded.
When run by itself, the module rsgModel loads the example grammar file, apt-issues.g, and generates two different "stories" using the grammar. Questions about this are in the handin.
Internally a dictionary is used in which the key is a non-terminal and the value corresponding to the non-terminal is a list of rules, where each rule is itself a list of words. For example, loading the grammar file generates a dictionary entry for the non-terminal <status>, which has four rules with a varying number of words in each rule. Because there can be several rules, the dictionary uses a list of lists of strings to make it easy to process each word (in case one might be a non-terminal).
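As a rough illustration of turning a formatted file into this kind of dictionary, here is a minimal sketch that uses a regular expression. The file format it parses is an assumption made for this example (each definition in braces, the non-terminal first, every rule ended by a semicolon) and may not match apt-issues.g. The names SAMPLE, DEFINITION, and parse_grammar are hypothetical and do not come from rsgModel.

```python
import re

# Hypothetical grammar text in an ASSUMED format: each definition is
# wrapped in braces, starts with its non-terminal, and ends every rule
# with a semicolon. This is not the contents of apt-issues.g.
SAMPLE = """
{
<start>
The lab was <feeling> . ;
}
{
<feeling>
fun ;
long but fun ;
really <feeling> ;
}
"""

# Capture the non-terminal and the raw text of its rules.
DEFINITION = re.compile(r"\{\s*(<[^>]+>)(.*?)\}", re.DOTALL)


def parse_grammar(text):
    """Map each non-terminal to a list of rules; each rule is a list of words."""
    grammar = {}
    for name, body in DEFINITION.findall(text):
        # Split the body into rules, skip empty pieces, split rules into words.
        rules = [rule.split() for rule in body.split(";") if rule.strip()]
        grammar[name] = rules
    return grammar


grammar = parse_grammar(SAMPLE)
for nonterminal, rules in grammar.items():
    print(nonterminal, rules)
```

Storing each rule as a list of words, rather than as one string, means later code can test every word on its own to see whether it is a key in the dictionary, that is, a non-terminal.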