Design Solution: Filtered File Iterators
Mike Hewner
A Small Warning
I have written this example document to give you an idea of what a good design document looks like. But I know there is a tendency among students to attempt to copy designs from instructor samples ... so in this case I have deliberately chosen a solution that is not very common. So in case you are thinking that iterator might be the perfect thing for your design ... probably not.
Interpreting UML Diagrams
When interpreting the diagrams within this document, the one thing that might not be immediately obvious is that abstract classes have their names written in italics, and abstract methods within those classes are also written in italics. When a concrete class derives from an abstract class, it is assumed to implement those abstract methods, but they are not usually shown on the diagram.
Other than that, remember that a line with an empty triangle means inheritance (the triangle points at the superclass), and a line with an empty diamond means "contains," usually as an instance variable.

So looking at the diagram above, you can infer that the class InternalTreeNode looks something like:

    class InternalTreeNode extends AbstractBinaryTreeNode {
        public boolean isLeaf() { return false; }

        public int computeHeight() { /* do height computation */ }

        // the 0..2 tells us there are either 0, 1, or 2
        // AbstractBinaryTreeNodes in each InternalTreeNode
        private List<AbstractBinaryTreeNode> children;
    }
Before you make your own diagrams, look up a UML tutorial. We are not going to be sticklers, but if we misinterpret your diagram we might think your design is wrong.
Context of the Design
So as part of our file analysis project[1], one of the things that often needs to be done is iterating over all the files/directories in a particular location. For example, in order to determine the size of all the subdirectories within a particular structure, you might (naively) write Java code that looks like this:
    public static long getDirSize(File startDir) {
        long sum = 0;
        System.out.println("Checking " + startDir);
        for (File curr : startDir.listFiles()) {
            if (curr.isDirectory()) {
                sum += getDirSize(curr);
            } else {
                sum += curr.length();
            }
        }
        return sum;
    }
Similarly, if you want to construct a map with all directories and subdirectories mapping to the number of files within each, you would write code like this:
    public static void countNonHiddenFiles(File startDir, Map<File, Integer> dirToNumFiles)
            throws IOException {
        for (File curr : startDir.listFiles()) {
            if (curr.isDirectory()) {
                dirToNumFiles.put(curr.getCanonicalFile(), 0);
                countNonHiddenFiles(curr, dirToNumFiles);
            } else if (!curr.isHidden()) {
                File parent = curr.getCanonicalFile().getParentFile();
                dirToNumFiles.put(parent, dirToNumFiles.getOrDefault(parent, 0) + 1);
            }
        }
    }
Looking at these code snippets, we can see that there is a lot of similarity between them. As part of a general framework for dealing with files, we want a general way to compute aggregate data on File structures, without having to rewrite the traversal code each time.
An additional complication is the need for filtering. You can see in the countNonHiddenFiles example above that we want to ignore hidden files. Similarly, we might want our search to follow symlinks or not, or to only involve files whose names match a specific parameter. We want to have different kinds of filters, and we want the decision about which filters to apply to be separate from the particular metric we are trying to compute.
Solution: Filtered File Iterators

The solution I propose is to separate out the act of iterating over the file structure into an Iterator object. Clients can instantiate the kind of iteration they want (i.e. depth-first, breadth-first) and then simply call the hasNext and next operations to move through the directories and subdirectories in the way they selected. This is an implementation of the Iterator design pattern.
For filtering, clients can call methods like ignoreDirectories or filterHidden that determine what types of files are returned by next. For custom filters, clients can implement the existing Java interface FileFilter and then add it to the FileIterator instance with addCustomFilter. Internally, "built-in" filters like ignoreFiles and custom filters are both implemented by adding filters to the AggregateFileFilter.
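To make the mechanics concrete, here is a minimal sketch of what a DepthFirstIterator could look like. The names DepthFirstIterator, ignoreDirectories, and addCustomFilter come from the design above; the stack-based internals (and collapsing the FileIterator superclass into one class for brevity) are my own assumptions, not the final implementation.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of the proposed iterator. Filters decide which
// entries are *returned*; the traversal still descends into every directory.
public class DepthFirstIterator {
    private final Deque<File> stack = new ArrayDeque<>();
    private final List<FileFilter> filters = new ArrayList<>();
    private File next; // next entry that passed all filters, or null

    public DepthFirstIterator(File startDir) {
        stack.push(startDir);
        advance();
    }

    // "Built-in" filters are just pre-packaged FileFilters.
    public void ignoreDirectories() {
        addCustomFilter(f -> !f.isDirectory());
    }

    public void addCustomFilter(FileFilter filter) {
        filters.add(filter);
    }

    public boolean hasNext() { return next != null; }

    public File next() {
        if (next == null) throw new NoSuchElementException();
        File result = next;
        advance();
        return result;
    }

    // Pop entries off the stack (pushing children of directories)
    // until we find one that every filter accepts.
    private void advance() {
        while (!stack.isEmpty()) {
            File curr = stack.pop();
            if (curr.isDirectory()) {
                File[] children = curr.listFiles();
                if (children != null)
                    for (File child : children) stack.push(child);
            }
            if (filters.stream().allMatch(flt -> flt.accept(curr))) {
                next = curr;
                return;
            }
        }
        next = null;
    }
}
```

With this shape, a client simply constructs the iterator, installs filters, and loops with hasNext/next, exactly as in the getDirSize rewrite later in this document.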
Internally, the FileIterator contains an AggregateFileFilter instance. AggregateFileFilter simply maintains a list of FileFilters and accepts a file only if all of its internal FileFilters accept it (AND behavior, essentially). Although it was not necessary for our use case, it is easy to see how this could be extended with OR and NOT behavior, making the filters capable of representing any boolean combination of conditions. Such an implementation would be a use of the Composite pattern.
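The AND behavior described above can be sketched in a few lines. AggregateFileFilter and java.io.FileFilter come from the design; the method name addFilter is an assumption for illustration.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

// Sketch of the AggregateFileFilter: accepts a file only when every
// contained FileFilter accepts it (logical AND).
public class AggregateFileFilter implements FileFilter {
    private final List<FileFilter> filters = new ArrayList<>();

    public void addFilter(FileFilter filter) {
        filters.add(filter);
    }

    @Override
    public boolean accept(File pathname) {
        for (FileFilter f : filters) {
            if (!f.accept(pathname)) return false;
        }
        return true; // vacuously true when no filters are installed
    }
}
```

Because FileFilter has a single abstract method, custom filters can be supplied as lambdas, e.g. `agg.addFilter(f -> f.getName().endsWith(".java"))`. The OR/NOT extension mentioned above would add sibling classes with the same FileFilter interface, which is what makes it a Composite.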
With this new design, implementing the two example file operations from the previous section becomes easier and removes the explicit recursion.

    public static long getDirSize(File startDir) {
        DepthFirstIterator i = new DepthFirstIterator(startDir);
        i.ignoreDirectories();
        long sum = 0;
        while (i.hasNext())
            sum += i.next().length();
        return sum;
    }

    public static void countFiles(FileIterator i, Map<File, Integer> dirToNumFiles)
            throws IOException {
        while (i.hasNext()) {
            File curr = i.next().getCanonicalFile();
            if (curr.isDirectory()) {
                dirToNumFiles.put(curr, 0);
            } else {
                File parent = curr.getParentFile();
                dirToNumFiles.put(parent, dirToNumFiles.getOrDefault(parent, 0) + 1);
            }
        }
    }
In the case of countFiles, we were able to make the method more general. Rather than write a specific version for hidden files, we have changed the parameter to a FileIterator. Now the caller can filter out hidden files (or not) on the iterator, and the same function counts files of every sort.
Alternatives and Justifications
Alternative 1: FileComposite

The idea with this approach is to implement the various file operations we want within two different classes: FileLeaf and Directory. This allows us to eliminate the isDirectory() check within our functions. So for example, the implementation of Directory.getDirSize() would look like:
    public long getDirSize() {
        long sum = 0;
        for (FileComposite f : myChildren)
            sum += f.getDirSize();
        return sum;
    }
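For context, the class hierarchy this alternative implies might be sketched as follows. FileComposite, FileLeaf, and Directory are named in the text; the fields and constructors here are assumptions for illustration (a real version would wrap java.io.File).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the FileComposite alternative: leaves and directories share an
// abstract supertype, so no isDirectory() check is needed in the operations.
abstract class FileComposite {
    public abstract long getDirSize();
}

class FileLeaf extends FileComposite {
    private final long myLength;

    public FileLeaf(long length) { myLength = length; }

    @Override
    public long getDirSize() { return myLength; } // a file's "size" is its length
}

public class Directory extends FileComposite {
    private final List<FileComposite> myChildren = new ArrayList<>();

    public void add(FileComposite child) { myChildren.add(child); }

    // Each subclass knows its own behavior; the directory just sums children.
    @Override
    public long getDirSize() {
        long sum = 0;
        for (FileComposite f : myChildren)
            sum += f.getDirSize();
        return sum;
    }
}
```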
Pros
- Simpler than proposed solution
- If we had already implemented File/Directory objects, this is a trivial extension
Cons
- Every time we add a new kind of computation, we have to modify three classes
- We either have to reimplement the tree traversal code in every method of Directory, or use reflection or other fanciness. Plus, it does not lend itself to traversals like breadth-first.
- No easy way to implement filtering
Alternative 2: FileVisitor
So in this solution, each algorithm we want to run on the tree gets its own class. An instance of this class is passed to the FileTraverser class, which then "walks" it along the directory structure, calling visitFile or visitDirectory at each node as appropriate. So the implementation of DirSizeComputer would look like:
    public class DirSizeComputer {
        private long mySum = 0;

        public void visitDirectory(File f) { /* do nothing */ }

        public void visitFile(File f) { mySum += f.length(); }

        public long getResult() { return mySum; }
    }

    // elsewhere, where we want to actually compute the size
    FileTraverser f = new FileTraverser(startingDir);
    DirSizeComputer computer = new DirSizeComputer();
    f.traverseDepthFirst(computer);
    System.out.println("Dir size is " + computer.getResult());
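The "walking" side of this alternative can be sketched as below. FileTraverser, traverseDepthFirst, visitFile, and visitDirectory come from the text; the FileTreeVisitor interface name and the recursive internals are assumptions for illustration.

```java
import java.io.File;

// Hypothetical visitor interface: one method per node kind.
interface FileTreeVisitor {
    void visitDirectory(File f);
    void visitFile(File f);
}

// Sketch of FileTraverser: the recursion lives here exactly once,
// instead of in every computation class.
public class FileTraverser {
    private final File myStart;

    public FileTraverser(File start) { myStart = start; }

    public void traverseDepthFirst(FileTreeVisitor visitor) {
        traverse(myStart, visitor);
    }

    private void traverse(File curr, FileTreeVisitor visitor) {
        if (curr.isDirectory()) {
            visitor.visitDirectory(curr);
            File[] children = curr.listFiles();
            if (children != null)
                for (File child : children) traverse(child, visitor);
        } else {
            visitor.visitFile(curr);
        }
    }
}
```

A breadth-first variant would just be another method (traverseBreadthFirst) using a queue, which is what makes new traversals cheap in this design.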
Pros
- Easy to add new kinds of traversals (just a new method in FileTraverser or elsewhere)
- Obvious what methods need to be added every time you implement a new directory computation
- Same flexible filtering possible as with FilteredIterators
Cons
- Have to make a new subclass for every kind of computation
- A lot of classes
Justification
So of the alternatives, FileComposite just does not have a good way to implement either the complex filtering or the different kinds of traversals. Given that we want to be able to perform a wide variety of computations with a variety of different filters (perhaps even filters selected at runtime by the user), we need that flexibility. It is simple, but it is too simple for this use case.
The main difference between FileIterator and FileVisitor is that with FileIterator, new kinds of traversals require a subclass, but new kinds of computations only require a function (or even just part of another function if we wanted). With FileVisitor, new kinds of traversals are easy but new kinds of computations require a subclass. Because I expect to want to write many more computations than traversals, I think that FileIterator makes the most sense.
[1] Made up. This is a sample, remember? But your version should not be made up; you should address a real problem with your project.