CompSci 108 : Fall 2009

Arithmetica

Expression Trees are trees in which the terminal nodes (leaves) represent variables or constants in an expression and the non-terminal (internal) nodes represent the operators. The inherent hierarchical nature of trees precisely capture operator precedence and provide a convenient mechanism for storing, traversing, and transforming expressions. Note, that unlike binary trees, each node of an Expression Trees can have any number of children, easily modeling operators with one, two, or even arbitrary numbers of arguments.

Expression Trees were first introduced in 1967 with the programming language LISP, in which the program itself is stored as an expression tree, called an s-expression, that can even be modifed while the program is running! Since then, expression trees have formed the heart of many applications: representing genes, regular expressions, language grammars, and queries. They are so widespread, that the World Wide Web Consortium (W3C) recently decided to use them as the core of the XML Document Object Model (DOM) specification, more than thirty years later!

Thus, the trickiest part about using expression trees is actually creating them from a given formatted input string. Parsing is the process of taking string organized by a grammatical structure (that defines key characters to act as separators between the characters that are important, e.g., whitespace or semi-colon characters in Java) and producing a list of tokens in an order that is easy to convert into an expression tree. During the parsing process, the important sequences of characters are converted into token objects, while the separators are typically thrown out. If the input string contains grammatical errors such that it cannot be parsed, then an error is reported.

Specification

Write an arithmetic interpreter that allows users to enter, change, combine, and evaluate integer arithmetic expressions interactively (such as that provided by Google). A user of your program should be able to enter expressions interactively while your program keeps track of and evaluates them. The user can enter expressions that conform to the types described below. Your program should save each expression entered as an Expression Tree so that it can be evaluated, combined, transformed, or printed whenever needed without requiring you to re-parse the user's input. The name arithmetica was suggested by former Duke Professor Michael Littman as a pun on the popular mathematics interpreter Mathematica.

Each time the user enters a complete expression by pressing return, it should be evaluated and the value displayed for the user, then the user should be prompted to enter another. This is the so-called read-eval-print loop used in Scheme, Lisp, Smalltalk and other interpreted programming languages. The user can signal the end of their session by inputting a only period, ".", on the line by itself. The interpreter should ignore spaces, lines of input that are empty, i.e., contain only zero or more spaces, and lines that begin with the comment character "#".

An example session might look like the following (the user's input is in bold):

Welcome to Arithmetica
1-> 3 + 7 * 2
17
2-> ( 3 + 7 ) * 2
20
3-> a = 2 * 4 * 2
16
4-> b = 16
16
5-> c = a - b
0
6-> b = 2 * 16
32
7-> c
-16
8-> a = ~ b
-32
9-> c
-64
10-> z
0
11-> .

Expressions

Your program should recognize the following expressions:

expression	*syntax*	semantics	*examples*
Constant	`<any integer>` `[0-9]+`	a non-negative integer constant	`87` `113` `0`
Variable	`<any string>` `[a-zA-z_]+`	an expression represented by a word that is not otherwise used as an operator or function	`a` `bugs` `total_squished`
Assignment	`var = expr`	assigns an expression to a variable	`a = 87` `bugs = 12 * a` `a = b`
Unary Operator	`op expr`	prefixes an expression ~ // negation	`~ 13` `~ ( a + b * c )` `~ bugs`
Binary Operator	`expr op expr`	combines two expressions into a single expression (in precedence order) % // remainder * // times / // divide + // plus - // minus	`a + b` `a % 2` `a + 10 * c`
Unary Functions	`fun(expr)`	a function that takes an expression as its single argument abs // absolute value rand // random value between [0 .. num)	`abs ( a * b )` `rand ( x ) - y * 2`
Multi-Argument Functions	`fun(expr,...)`	a function that takes one or more expressions as arguments (each separated by a comma) sum max min mean	`sum ( a , 1 , b + 1 ) max ( 1 , 2 , 3 , 4 , 5 )`
Parentheses	`( expr )`	raises an expression's precedence	`( a + b ) * 3` `abs( 3 * ( bugs + t ) )`

Operators have the following precedence from left to right (listed from highest to lowest):

`()`	parentheses
`~`	unary operators
`^`	arithmetic operators
`*`, `/`, `%`	arithmetic operators
`+`, `-`	arithmetic operators
`=`	assignment

Extra credit will be given for implementing functions (either unary or multi-argument). Extra credit will also be given for allowing users to access the history of the expressions that have been typed in. For example, the standard CSH UNIX shell history substitution syntax (i.e., !12 refers to the twelfth expression entered, or !-1 refers to the previous expression entered).

Design

An incomplete Java solution is provided that correctly implements binary operators between constant values. It also attempts to implement parentheses, but there is one bug in that part of the code. This solution uses a standard algorithm to convert an infix expression into a postfix expression, which is then easy to convert into an Expression Tree.

To find the bug in the current solution, I suggest you write a separate program to test it that will later become a full set of tests that verify your final version works correctly. Tests can be in the form of input data files (some examples are provided), unit tests, or a separate driver program that verify every line of code written works as intended.

After fixing the bug, I suggest you refactor the code such that the program's main structures (i.e., trees, parsing, evaluating) are a closed core that is open to extensions (i.e., different types, different operations, different tree actions) without being modified. In order to do this, you will need to find the programs key abstractions (i.e., single common interfaces that can be customized through parameterization or subclassing to combine several concrete parts of your code), and build your core only from these abstractions. To proceed, I strongly suggest you build in small steps, practice refactoring as you go, and add features one at a time focusing on how to simplify the code necessary to implement each instead of simply adding more and more special cases to the core to support each new idea.

Grading Criteria

Your grade will be determined primarily by how easy it is to add new concepts to the algebraic language you can evaluate (i.e., that it is possible to add new tokens by adding O(1) line to your existing code to include the new functionality).

Note that the amount of extra credit will be in proportion to how clearly it follows the goals of your design. Contorting your main design to add in extra functionality or adding poor code that diminishes the rest of your refactoring efforts will not be rewarded. Thus, a well-tested, perfectly working program that has fewer features (but plenty of clear paths to easy expansion) is always worth more than the leaky kitchen sink. In short, to maximize your grade, you should implement enough variety in your program to clearly demonstrate that your design supports further such extensions.