Design Solution: Filtered File Iterators
Mike Hewner
A Small Warning
I have written this example document to give you an idea of what a good design document looks like. But I know there is a tendency among students to attempt to copy designs from instructor samples ... so in this case I have deliberately chosen a solution that is not very common. So in case you are thinking that iterator might be the perfect thing for your design ... probably not.
Interpreting UML Diagrams
When interpreting the diagrams within this document, the one thing that might not be immediately obvious is that abstract classes have their names written in italics, and abstract methods within those classes are also written in italics. When a concrete class derives from an abstract class, it is assumed to implement those abstract methods, but they are not usually shown on the diagram.
Other than that, remember that a line with an empty triangle means inheritance (the triangle points at the superclass), and a line with an empty diamond means "contains," usually as an instance variable.

So looking at the diagram above, you can infer that the class InternalTreeNode looks something like:

    class InternalTreeNode extends AbstractBinaryTreeNode {
        public boolean isLeaf() { return false; }

        public int computeHeight() { /* do height computation */ }

        // the 0..2 tells us there are either 0, 1, or 2
        // AbstractBinaryTreeNodes in each InternalTreeNode
        private List<AbstractBinaryTreeNode> children;
    }
Before you make your own diagrams, look up a UML tutorial. We are not going to be sticklers, but if we misinterpret your diagram we might think your design is wrong.
Context of the Design
So as part of our file analysis project[1], one of the things that often needs to be done is iterating over all the files/directories in a particular location. For example, in order to determine the size of all the subdirectories within a particular structure, you might (naively) write Java code that looks like this:
    public static long getDirSize(File startDir) {
        long sum = 0;
        System.out.println("Checking " + startDir);
        for (File curr : startDir.listFiles()) {
            if (curr.isDirectory()) {
                sum += getDirSize(curr);
            } else {
                sum += curr.length();
            }
        }
        return sum;
    }
Similarly, if you want to construct a map with all directories and subdirectories mapping to the number of files within each, you would write code like this:
    public static void countNonHiddenFiles(File startDir, Map<File, Integer> dirToNumFiles)
            throws IOException {
        for (File curr : startDir.listFiles()) {
            if (curr.isDirectory()) {
                dirToNumFiles.put(curr.getCanonicalFile(), 0);
                countNonHiddenFiles(curr, dirToNumFiles);
            } else if (!curr.isHidden()) {
                File parent = curr.getCanonicalFile().getParentFile();
                dirToNumFiles.put(parent, dirToNumFiles.getOrDefault(parent, 0) + 1);
            }
        }
    }
Looking at these code snippets, we can see that there is a lot of similarity between them. As part of a general framework for dealing with files, we want a general way to compute aggregate data on File structures, without having to rewrite the traversal code each time.
An additional complication is the need for filtering. You can see in the countNonHiddenFiles example above that we want to ignore hidden files. Similarly, we might want our search to follow symlinks or not, or to only involve files whose names match a specific parameter. We want to have different kinds of filters, and we want the decision about which filters to apply to be separate from the particular metric we are trying to compute.
Solution: Filtered File Iterators

The solution I propose is to separate out the act of iterating over the file structure into an Iterator object. Clients can instantiate the kind of iteration they want (i.e. depth-first, breadth-first) and then simply call the hasNext and next operations to move through the directories and subdirectories in the way they selected. This is an implementation of the Iterator design pattern.
For filtering, clients can call methods like ignoreDirectories or filterHidden that determine what types of files are returned by next. For custom filters, clients can implement the existing Java interface FileFilter and then add it to the FileIterator instance with addCustomFilter. Internally, "built-in" filters like ignoreFiles and custom filters are both implemented by adding filters to the AggregateFileFilter.
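To make the mechanics concrete, here is a minimal sketch of what a DepthFirstIterator could look like. The names DepthFirstIterator, ignoreDirectories, and addCustomFilter come from the design above; the stack-based internals (and collapsing the FileIterator superclass into one class for brevity) are my own assumptions, not the final implementation.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of the proposed iterator. Filters decide which
// entries are *returned*; the traversal still descends into every directory.
public class DepthFirstIterator {
    private final Deque<File> stack = new ArrayDeque<>();
    private final List<FileFilter> filters = new ArrayList<>();
    private File next; // next entry that passed all filters, or null

    public DepthFirstIterator(File startDir) {
        stack.push(startDir);
        advance();
    }

    // "Built-in" filters are just pre-packaged FileFilters.
    public void ignoreDirectories() {
        addCustomFilter(f -> !f.isDirectory());
    }

    public void addCustomFilter(FileFilter filter) {
        filters.add(filter);
    }

    public boolean hasNext() { return next != null; }

    public File next() {
        if (next == null) throw new NoSuchElementException();
        File result = next;
        advance();
        return result;
    }

    // Pop entries off the stack (pushing children of directories)
    // until we find one that every filter accepts.
    private void advance() {
        while (!stack.isEmpty()) {
            File curr = stack.pop();
            if (curr.isDirectory()) {
                File[] children = curr.listFiles();
                if (children != null)
                    for (File child : children) stack.push(child);
            }
            if (filters.stream().allMatch(flt -> flt.accept(curr))) {
                next = curr;
                return;
            }
        }
        next = null;
    }
}
```

With this shape, a client simply constructs the iterator, installs filters, and loops with hasNext/next, exactly as in the getDirSize rewrite later in this document.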
Internally, the FileIterator contains an AggregateFileFilter instance. AggregateFileFilter simply maintains a list of FileFilters and accepts a file only if all of its internal FileFilters accept it (AND behavior, essentially). Although it was not necessary for our use case, it is easy to see how this could be extended with OR and NOT behavior, making the filters capable of representing any boolean combination of conditions. Such an implementation would be a use of the Composite pattern.
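The AND behavior described above can be sketched in a few lines. AggregateFileFilter and java.io.FileFilter come from the design; the method name addFilter is an assumption for illustration.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

// Sketch of the AggregateFileFilter: accepts a file only when every
// contained FileFilter accepts it (logical AND).
public class AggregateFileFilter implements FileFilter {
    private final List<FileFilter> filters = new ArrayList<>();

    public void addFilter(FileFilter filter) {
        filters.add(filter);
    }

    @Override
    public boolean accept(File pathname) {
        for (FileFilter f : filters) {
            if (!f.accept(pathname)) return false;
        }
        return true; // vacuously true when no filters are installed
    }
}
```

Because FileFilter has a single abstract method, custom filters can be supplied as lambdas, e.g. `agg.addFilter(f -> f.getName().endsWith(".java"))`. The OR/NOT extension mentioned above would add sibling classes with the same FileFilter interface, which is what makes it a Composite.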
With this new design, implementing the two example file operations from the previous section becomes easier and removes the explicit recursion.

    public static long getDirSize(File startDir) {
        DepthFirstIterator i = new DepthFirstIterator(startDir);
        i.ignoreDirectories();
        long sum = 0;
        while (i.hasNext())
            sum += i.next().length();
        return sum;
    }

    public static void countFiles(FileIterator i, Map<File, Integer> dirToNumFiles)
            throws IOException {
        while (i.hasNext()) {
            File curr = i.next().getCanonicalFile();
            if (curr.isDirectory()) {
                dirToNumFiles.put(curr, 0);
            } else {
                File parent = curr.getParentFile();
                dirToNumFiles.put(parent, dirToNumFiles.getOrDefault(parent, 0) + 1);
            }
        }
    }
In the case of countFiles, we were able to make the method more general. Rather than write a specific version for hidden files, we have changed the parameter to a FileIterator. Now the caller can filter out hidden files (or not) on the iterator, and the same function counts files of every sort.
Alternatives and Justifications
Alternative 1: FileComposite

The idea with this approach is to implement the various file operations we want within two different classes: FileLeaf and Directory. This allows us to eliminate the isDirectory() check within our functions. So for example, the implementation of Directory.getDirSize() would look like:
    public long getDirSize() {
        long sum = 0;
        for (FileComposite f : myChildren)
            sum += f.getDirSize();
        return sum;
    }
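For context, the class hierarchy this alternative implies might be sketched as follows. FileComposite, FileLeaf, and Directory are named in the text; the fields and constructors here are assumptions for illustration (a real version would wrap java.io.File).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the FileComposite alternative: leaves and directories share an
// abstract supertype, so no isDirectory() check is needed in the operations.
abstract class FileComposite {
    public abstract long getDirSize();
}

class FileLeaf extends FileComposite {
    private final long myLength;

    public FileLeaf(long length) { myLength = length; }

    @Override
    public long getDirSize() { return myLength; } // a file's "size" is its length
}

public class Directory extends FileComposite {
    private final List<FileComposite> myChildren = new ArrayList<>();

    public void add(FileComposite child) { myChildren.add(child); }

    // Each subclass knows its own behavior; the directory just sums children.
    @Override
    public long getDirSize() {
        long sum = 0;
        for (FileComposite f : myChildren)
            sum += f.getDirSize();
        return sum;
    }
}
```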
Pros
- Simpler than proposed solution
- If we had already implemented File/Directory objects, this is a trivial extension
Cons
- Every time we add a new kind of computation, we have to modify three classes
- We either have to reimplement the tree traversal code in every method of Directory, or use reflection or other fanciness. Plus, it does not lend itself to traversals like breadth-first.
- No easy way to implement filtering
Alternative 2: FileVisitor
So in this solution, each algorithm we want to run on the tree gets its own class. An instance of this class is passed to the FileTraverser class, which then "walks" it along the directory structure, calling visitFile or visitDirectory at each node as appropriate. So the implementation of DirSizeComputer would look like:
    public class DirSizeComputer {
        private long mySum = 0;

        public void visitDirectory(File f) { /* do nothing */ }

        public void visitFile(File f) { mySum += f.length(); }

        public long getResult() { return mySum; }
    }

    // elsewhere, where we want to actually compute the size
    FileTraverser f = new FileTraverser(startingDir);
    DirSizeComputer computer = new DirSizeComputer();
    f.traverseDepthFirst(computer);
    System.out.println("Dir size is " + computer.getResult());
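The "walking" side of this alternative can be sketched as below. FileTraverser, traverseDepthFirst, visitFile, and visitDirectory come from the text; the FileTreeVisitor interface name and the recursive internals are assumptions for illustration.

```java
import java.io.File;

// Hypothetical visitor interface: one method per node kind.
interface FileTreeVisitor {
    void visitDirectory(File f);
    void visitFile(File f);
}

// Sketch of FileTraverser: the recursion lives here exactly once,
// instead of in every computation class.
public class FileTraverser {
    private final File myStart;

    public FileTraverser(File start) { myStart = start; }

    public void traverseDepthFirst(FileTreeVisitor visitor) {
        traverse(myStart, visitor);
    }

    private void traverse(File curr, FileTreeVisitor visitor) {
        if (curr.isDirectory()) {
            visitor.visitDirectory(curr);
            File[] children = curr.listFiles();
            if (children != null)
                for (File child : children) traverse(child, visitor);
        } else {
            visitor.visitFile(curr);
        }
    }
}
```

A breadth-first variant would just be another method (traverseBreadthFirst) using a queue, which is what makes new traversals cheap in this design.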
Pros
- Easy to add new kinds of traversals (just a new method in FileTraverser or elsewhere)
- Obvious what methods need to be added every time you implement a new directory computation
- Same flexible filtering possible as with FilteredIterators
Cons
- Have to make a new subclass for every kind of computation
- A lot of classes
Justification
So of the alternatives, FileComposite just does not have a good way to implement either the complex filtering or the different kinds of traversals. Given that we want to be able to perform a wide variety of computations with a variety of different filters (perhaps even filters selected at runtime by the user), we need that flexibility. It is simple, but it is too simple for this use case.
The main difference between FileIterator and FileVisitor is that with FileIterator, new kinds of traversals require a subclass, but new kinds of computations only require a function (or even just part of another function if we wanted). With FileVisitor, new kinds of traversals are easy but new kinds of computations require a subclass. Because I expect to want to write many more computations than traversals, I think that FileIterator makes the most sense.
[1] Made up. This is a sample, remember? But your version should not be made up; you should address a real problem with your project.