Expression Trees are trees in which the terminal nodes (leaves) represent variables or constants in an expression and the non-terminal (internal) nodes represent the operators. The inherent hierarchical nature of trees precisely capture operator precedence and provide a convenient mechanism for storing, traversing, and transforming expressions. Note, that unlike binary trees, each node of an Expression Trees can have any number of children, easily modeling operators with one, two, or even arbitrary numbers of arguments.
Expression Trees were first introduced in 1967 with the programming language LISP, in which the program itself is stored as an expression tree, called an s-expression, that can even be modifed while the program is running! Since then, expression trees have formed the heart of many applications: representing genes, regular expressions, language grammars, and queries. They are so widespread, that the World Wide Web Consortium (W3C) used them as the core of the XML Document Object Model (DOM) specification. This structure is so natural that it was clearly the key to define a standard as (the W3C said):
a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. |
Thus, the tricky part about using expression trees is creating them by parsing a given formatted input string. Parsing is the process of taking string organized by a grammatical structure (that defines key characters to act as separators between the characters that are important, e.g., whitespace or semi-colon characters in C++) and producing a list of tokens in an order that is easy to convert into an expression tree. During the parsing process, the important sequences of characters are converted into token objects, while the separators are typically thrown out. If the input string contains grammatical errors such that it cannot be parsed, then an error is reported.
Write an arithmetic interpreter that allows users to enter, change, combine, and evaluate arithmetic expressions interactively. A user of your program should be able to enter arithmetic expressions interactively while your program keeps track and evaluates them. The user can enter expressions that conform to the types described below. Your program should save each expression entered as an Expression Tree so that it can be evaluated, combined, transformed, or printed whenever needed without requiring you to re-parse the user's input.
A user can explicitly save an expression to a named variable. Such variables serve the program's memory and can be used to refer to the saved expression later in other expressions. If the user refers to a variable name that has not previously been given a value, you should create the variable and give it the default value of the type represented by the expression.
Unless the user explicitly saves an expression to such a variable or simply evaluates a variable, the typed expression is saved as the current expression. If the entered expression is incomplete, it is combined with the current expression and the result is stored again as the current expression. Otherwise, it replaces the current expression. Additionally, the user can refer to the current expression explicitly using the special name "ans
".
In these ways (using variables and the saved current expression), the user can build complex expressions interactively. The user should be able to save their work (either a specific variable of the entire session). This action should create two files, each starting with the name given by the user but with different extensions. The first should end with the extension ".exp" and contain the printed expression the user is saving. The second should end with an extension appropriate to the type of the value created that records the value of the expression. When saving the entire session, the first file should contain all the variables created during the session and the second should contain the value of the current expression. Users should be able to load these saved .exp files back into your interpreter. When reading expressions from a file, your should ignore spaces, lines of input that are empty, i.e., only zero or more spaces, and lines that begin with the comment character "#".
Each time the user enters a complete expression by pressing return, it should be evaluated and the value displayed for the user, then the user should be prompted to enter another. This is the so-called read-eval-print loop used in Scheme, Lisp, Smalltalk and other interpreted programming languages. The user can signal the end of their session by inputting a only period, ".", on the line by itself.
Your team should implement at least three of the applications listed below (the first two plus one additional one); however, your design should allow for any number of such applications to be created using a single common base Expression Tree type.
Each application can be a separate program, with its own main function, but extra credit will be given if you can combine them all into a single program that allows the user to choose which kind of interpreter to run dynamically. Note, each application operates on a different type of data, may support different operators, and different access to the Expression Tree.
This project has the potential to get very complex very quickly. If you are not careful how you build it, you will have a very frustrating three weeks. The key is to view the program's main structures (i.e., trees, parsing, evaluating) as a closed core that is open to extensions (i.e., different types, different operations, different tree actions) without modification. In order to do this, you will need to find the programs key abstractions (i.e., single common interfaces that can be customized through parameterization, subclassing, or templating to combine several concrete parts of your code), and build your core only from these abstractions. To proceed, I strongly suggest you build in small steps, practice refactoring as you go, and add (at least the first two) applications one at a time focusing on how to simplify the code necessary to implement each new application instead of simply adding more and more special cases to the core to support each new idea.
These applications vary in the type of data they manipulate (e.g., booleans or images instead of integers), the operations they perform on that data (e.g., logical or matrix based arithmetic instead of built-in C++ functions), and finally, they require more sophisticated access to the expression tree than simply print in-order and evaluate. These applications are intended to emphasize specific design issues and are meant to help you think about what it means to "hard-code" things within your program and how to build a generic data structure that can be effectively reused. So your grade will be based on both the functionality of the applications and how effectively you share code between them (i.e., the more new code one must write to make a new application (especially cut-n-paste code!) the lower your final score will be).
The first application is restricted to integer values and so is the most similar to the code we have discussed in class. So when writing that you can focus primarily on creating the parser, memory system, and interpreter --- simply figuring out they will work together. Even so, it is suggested you start small and build up the functionality of you program. In particular, you should not implement every variety of operator before you have tested even one of them.
The second application works with image values and should stretch your notion of what type of value your expression tree stores, how its value is displayed and saved, and what kind of operators you can support. If your expression tree is reasonably constructed, the it should be easy to template it (after you have verified it works for integers).
The fourth application forces you to make a choice because you will need to make a differentiation function that is specific to each different kind of Expression --- so, no matter what, you will end up writing a number of small, simple functions; one for each type of expression. At first, it might seem easy to add a differentiate function to the Expression class that is specialized by every subclass associated with Expression. This has the advantage that everything associated with expressions is all in one class and would probably be fine if this were the only application you were building based on an Expression Tree.
Finally, the last application, you need to be able to satisfy an Expression in addition to evaluating it. This requires you to add at least one method for each kind of expression. Rather than simply adding a new method for each function, you should start to find an way to generalize this action so that you can close off the tree to this kind of future modification. You could make an iterator that traverses the expression for you and performs some generic action, but then you need to have a way to determine what type of object you are currently pointing at and what type of action you want to perform. You could have a separate subclass hierarchy that mirrors the main Expression subclass hierarchy but knows specifically how to differentiate of satisfy; but then you need to maintain two very similar inheritance hierarchies. You may not get it exactly correct the first time, but after several different applications you will have a much better handle on it. Either solution has good and bad consequences, you must figure out how to maximize the good and minimize the bad and justify your decisions.
You may use as little or as much of the exptree example code and imgtree example code discussed in class. Unlike these examples, your program requires you to convert the user's entered text from a string into an expression tree. Since the user enters each expression in infix notation (convenient for humans), you will need to convert it to postfix notation (convenient for computers) to build your tree. To convert the infix expressions entered by the user into an expression tree, we suggest you use Dijkstra's infix to postfix algorithm discussed in Weiss, Algorithms, Data Structures, and Problem Solving, page 362 or Main and Savitch, Data Structures and Other Objects, page 332. Additionally, there is an online resource here in the form of a past CompSci lab. Keep in mind that this part of the program also has to know about kinds of expressions (to recognize and create them). It should also be designed to be open to adding new kinds of expressions.
During parsing, you should use the standard functions for checking the types of characters (i.e., isspace, isdigit, etc.) where possible. Your code will be viewed very suspiciously if you resort to comparing ASCII codes or even checking ranges of character values directly.
You should not need to invent any of the data structures for this program. The C++ STL contains a variety of "containers" such as stack, vector, list, and map. For example, you may want to represent the user's variables using an STL map of strings to expression trees.
There are many extensions to the basic specifications possible; some are listed below (note, you can add more operators to the set listed above, but that will not be worth as much extra credit). From the stand point of your grade, the most important thing is that your program is designed well (i.e., that it is possible to add new kinds of expressions simply by creating new subclasses and adding O(1) line to your existing code to include the new classes). This means your design should be open to adding new kinds of expressions while closed to changing the evaluation and parsing code. The requirements above and suggestions below are intended to help you to realize such a design.
Next in importance to your grade, your project should be thoroughly tested to prove to the course staff that your confidence in it is justified. You should include whatever data files, driver programs, or shell scripts (as well as documentation on how to use them) you have used in your submission. If you do all of the above well, the maximum grade you can receive is an A-.
Finally, the extensions given below are intended to stretch your design further and to differentiate your program from others in order to capture the global OOLALA market, you must do at least two such extensions if you want to be considered for a grade in the A range. These extensions must further the good design of your program and not simply be hacks of code added at the last minute. If you do not have time to implement an extension, partial extra credit may be given for excellent justification of how your design either supports adding such a feature already or how it would need to changed sufficiently to support such a feature.
In addition to those described in each application description, some suggested global extensions follow (listed roughly in order from easiest to hardest):
We will check this page frequently to check on your progress, so you will need to update this web site as you develop your project