CompSci 108
Fall 2010
The Software Studio

Expression Trees are trees in which the terminal nodes (leaves) represent variables or constants in an expression and the non-terminal (internal) nodes represent the operators or functions. The inherent hierarchical nature of trees precisely capture operator precedence and provide a convenient mechanism for storing, traversing, and transforming expressions. Note, that unlike binary trees, each node of an Expression Trees can have any number of children, easily modeling operators with one, two, or even arbitrary numbers of arguments.

Expression Trees were first introduced in 1967 with the programming language LISP, in which the program itself is stored as an expression tree, called an s-expression, that can be modified even while the program is running! Since then, expression trees have formed the heart of many applications: representing genes, regular expressions, language grammars, and queries. They are so widespread, that the World Wide Web Consortium (W3C) decided to use them as the core of the XML Document Object Model (DOM) specification, more than forty years after their invention.

The trickiest part about using expression trees is actually creating them from a given formatted input string. Parsing is the process of taking string organized by a grammatical structure (that defines key characters to act as separators between the characters that are important, e.g., whitespace or semi-colon characters in Java) and producing a list of tokens in an order that is easy to convert into an expression tree. During the parsing process, the important sequences of characters are converted into token objects, while the separators are typically thrown out. Use Dijkstra's Shunting-Yard algorithm to convert an infix expression into a postfix expression, which is then easy to convert into an Expression Tree. If the input string contains grammatical errors such that it cannot be parsed, then a syntax error is reported.

Specification

Write an interpreter that allows users to interactively enter any number of arithmetic expressions and see the result for each one (this is the so-called read-eval-print-loop, or REPL, used in Lisp, Smalltalk, Python and other interpreted programming languages). Each expression entered should be transformed into an Expression Tree so that it can be evaluated, combined, transformed, or printed within the program without requiring the original input string to be re-parsed. However, instead of simply building and evaluating expressions of numeric values such as that provided by Google, you should allow the user to create expressions that evaluate to colors and then eventually to images. This may sound a little strange, but the results can be quite spectacular. Here is the general idea: an image is really a expression that maps points (x,y) to colors (r,g,b). This program simply evaluates that expression for each pixel (x-, y-point) in the image and stores resulting color.

For this program, a color represents three real numbers, one each for the red, green and blue component, where the component values range from -1 to 1. For example, [-1,-1,-1] is black, [1,-1,-1] is red, and [1,1,-1] is yellow. During computation of an expression, the value of each component should not be restricted to this range, but the final result of the expression should be clamped to within the range -1 to 1. By default, like the colors, the domain of the image over X and Y is from -1 to 1. The upper left corner of the image will be (-1,-1) and the lower right corner will be (1, 1). Thus, the size of the image will need to be mapped to the domain (in this case, [-1, 1]). Before the expression is evaluated for a specific pixel, the variables x and y should be set to the current point to be evaluated.

For example, here are several images generated from basic expressions.

Your program should allow a user to input expressions interactively, or from a file, and display the resulting image. The syntax of the allowed expressions is given below:

expression syntax semantics examples
Constant
<any real number>
[-]?[0-1]+(.[0-9]+)?
a real valued number represents the color grey
note, to avoid potential ambiguity in parsing there should not be a space between the negative sign and the value
.87
1
-0.4
Color
[ constant,
constant,
constant ]
an RGB color where each value can be any constant
note, in this case, each constant only represents one specific color component
[1, -1, 0.25]
[0.5, 1, 1]

Variable
<any alpha-numeric string>
[a-zA-z]+
an expression represented by a word
note, two variables x and y should always be defined to be the current coordinate in the image domain
bugs
q45
Unary Operator
op expr
prefixes an expression 
!   // negate (i.e., invert) a color
!bugs
!(t + [0, 1, -1 ] * 0.1)
Binary Operator
expr op expr
combines two expressions into a single expression (in precedence order)
*   // times
/   // divide
+   // plus
-   // minus
% // mod ^ // exponentiate
a + b
a / 2
a + 10 * c
Functions
fun(expr, expr)
a function that takes zero, one, or two expressions as its argument(s)
note, not all of functions are defined continuously, so take appropriate action (i.e., divide-by-zero might silently return zero)
if function is scalar, i.e., typically operates on a single value like sin(x), it should be applied to each color component in turn
random()                // random color
floor(expr) // round down ceil(expr) // round up abs(expr) // absolute value clamp(expr) // clamp results to [-1, 1] wrap(expr) // wrap results around [-1, 1] (i.e., 1.5 -> -0.5) sin(expr) // sine cos(expr) // cosine tan(expr) // tangent atan(expr) // arc tangent log(expr) // log rgbToYCrCb(expr) // convert color to luminance / chrominance space yCrCbtoRGB(expr) // convert color to RGB from luminance / chrominance space perlinColor(expr, expr) // create random color based on two given values perlinBW(expr, expr) // create grey scale color based on two given values
random()
sin(a * b)
abs(x) - y / 2
perlinColor(x, y)
perlinBW(y, x+x)
Parentheses
(expr)
raises an expression's precedence
(a + b) * 3

Refactoring

A Java solution is provided as a starting point that correctly implements binary operators between constant values.

Refactor the code such that the program's main structures (i.e., trees, parsing, evaluating) are a closed core that is open to extensions (i.e., different syntax and semantics) without being modified. In order to do this, find the program's key abstractions and build your core only from these abstractions. To proceed, I strongly suggest you build in small steps, practice testing and refactoring as you go, and add features one at a time focusing on how to simplify the current code when implementing new features instead of simply adding more special cases to support each new idea.

Testing

The given code also attempts to implement parentheses, but there are bugs in that part of the code. Additionally, you should attempt to bullet-proof the code from bad input from the user. Several cases are already handled, but not all cases. For each case where the user can provide bad input, you should give back specific and constructive feedback to help the user understand and fix the problem.

To find the bug in the current solution and to ensure your final solution works correctly, I suggest you write a separate program to thoroughly test it then update those tests to verify that both new functionality and refactored code work. Tests should be in the form of unit tests, with input data files to make it easy to add, change, and organize tests. Your "test driver" program should attempt to verify every line of code written works as intended.

Resources

Deliverables