CompSci 108 : Fall 2011

Design Guidelines

Good design is not simply following a formula. Although people will often make lists of "rules" for good design, the reality is that good designers will occasionally break the rules. In this class, we will be asking you to design code that has three basic characteristics:

Your code should be easy to understand
Your code should have objects with simple useful interfaces
Your code should avoid duplication

Within this basic structure, we will have some "rules". But these rules won't cover every possible instance, and sometimes you might have good reason to break a rule to accomplish these goals. But we hope that these rules will help you get started making good designs. If you feel you must violate one of the conventions, you must document the reason within your code.

Your code should be easy to understand

Your code should ideally express what it is doing. Let me give you an example. Imagine you are digging into some code that handles adding new products to a company's website, trying to fix a bug where products are not getting added. You run across a function that looks like this:

public void addProducts(List<Product> productsToAdd)
{
    for(Product product : productsToAdd)
    {
         if(isProductADuplicate(product))
             continue;
         registerInPriceDB(product);
         copyImageToWebserverCache(product, NEW_PRODUCTS_DIRECTORY);
    }
}

Without a single comment, you can infer from this code that there are 3 steps before a particular product is added. This is the ideal we're looking for: your code should make obvious what is going on at every step.

Here are some general guidelines for making your code easy to understand:

code should follow an internally consistent set of coding conventions with regard to
- indentation, using Spaces Only, NOT Tabs
- brace placement
- capitalization
- instance variable and constructor placement (e.g., at the top of the class)
- naming conventions (e.g., all instance variables start with the prefix "my")

the conventions here

carefully choose the names within your program since they are critcial to understanding your code

use meaningful, non-abbreviated, names for classes, methods, and variables that reflect their purpose rather than their type
For example registerInPriceDB(Product p) is a much better name for a method than regPDBForProductObj(Product p).

use methods with meaningful names rather than inline code that needs to be commented
This is a common mistake. Imagine instead of the code above we had started something like this:

                    public void addProducts(List<Product> productsToAdd)
                    {                      
                      for(Product product : productsToAdd)
                      {
                           //first check for a duplicate product
                           ProductDataParser dp = new ProductDataParser(product.parserFormat());
                           try {
                              dp.parse(product);
                           } catch (ParserException e)
                           {
                              Logger.logParseException(DUPLICATE_CHECK_STEP, e.toString);
                              return;
                           }

                           int productID = dp.productID;
                           String dataFile = readProductDataFileAsString();
                           Pattern p = generateDataFileSearchRegex(productID, datafile);
                           if (p.find())
                                continue;
                           // ok now register the product in the price DB

See how much harder that is to read? Split your code into small methods and it is much clearer what is going on. This point is covered in more (and better detail) in "Long Method" from Code Smells.

use constants rather than literal values for all values used multiple times or in program logic. For example use NEW_PRODUCTS_DIRECTORY rather than "../image_files/np/"

variables should be declared as close to where they are used as possible
- local variables should be declared when they can be initialized
- private methods should use parameters and return values to communicate information rather than instance variables
- super-classes should not contain instance variables specific to only some sub-classes
- use global variables sparingly
your code should contain no warnings from the compiler

Your code should have objects with simple useful interfaces

The idea behind objects is that they encapsulate data and behavior together. The whole idea is that by keeping the data with the code that knows how to use that data, we gain flexibility. Without objects, knowledge about your data tends to get spread throughout the code. So for example, in C without objects, it's usual to see documentation like this.

/* * This function multiplies two matrixes. * Be sure that each matrix has been initialized with matrix_init * (or matrix_init_sparse for spare matrixes). If caching is enabled, * a.cache should be set to 1. Note that if a out of memory error * occurs, the function may return an erroneous result. In that case, * you can note the problem by checking the matrix_result_error global * for the code OUT_OF_MEMORY. */ struct matrix do_matrix_multiply(struct matrix a, struct matrix b);

The problem with this code is that the user of the matrix library has to know and use all sorts of bizarre details about the library. Like for example, the user has to know to call matrix_init before multiplying. The user also needs to know that each matrix has variable called cache that needs to be set appropriately.

Our goal is to have objects that hide unnecessary details. So we build a Matrix class. We don't require the user to know how the matrix class determines if caching is enabled or not. We want everything the user does with the matrix to be safe and as simple as we can manage. Because the code for (say) doing the multiplication is bound up with the details of the matrix structure, we can ensure that our code does the right thing without the programmer needing to know the details.

Having a lot of stuff in the public interface to a class makes hiding the details impossible. So a Java class like this has problems:


class Matrix {

    //some normal stuff

    public boolean cache;
    public int doInternalOnlyInitialization(int sparsenessKeyValue);

}

This stuff is probably something that a user of the Matrix object doesn't want to have to care about. Not only is it annoying, it makes the code less flexible. Now, if you decide you want the matrixes to have 3 different kinds of caching, you've got this extra boolean hanging around that is unnecessary.

What this means is that designing an object system involves carefully thinking about what needs to care about what, and defining simple interfaces between those parts. Usually, if you try to keep your code easy to understand, avoid duplication, and keep a few simple rules in mind you should be all right.

Avoid having just one giant class in a large system. Classes should work together - there shouldn't be a "god class" that manages everything. Check out "Large Class" in Code Smells for a discussion of some of the evils of large classes and ideas for fixes.
Encapsulate implementation decisions by focusing on the behavior of your class first
- instance variables should be only private or protected (with private being much preferred)
  - protected instance variables should only be used when you have subclasses that need them, and where protected methods won't work for some reason
  - no instance variables should be public (get/set methods should also include comments that justify their existence)
Classes should be responsible for their own data, and general tell rather than ask other objects. Be suspicious of methods in one object that call a lot of methods in a different object ("feature envy" or "inappropiate intimacy" Code Smells). For an expansion of this check out the "tell the other guy" section of OO in One Sentence.
Avoid message chains (Code Smells) or train wreck coding (OO in One Sentence)
Avoid temporary fields (Code Smells) -- instance variables that are set under only certian circumstances
Classes should minimize their number of get/set methods
- get methods should be aware that data they return from getters could be modified, and so they might want to duplicate rather than return internal values
- set methods should try to validate the data they receive (e.g., protect against null)
Classes need to provide some behavior within the program (i.e., they cannot simply consist of get/set methods). Usually just having getter and setter methods is a sign of a problematic design (see "Data class" in Code Smells).
Declared types should be as general as possible (i.e., prefer Iterator to Collection to List to ArrayList)

Your code should avoid duplication

Duplicated code is bad news. If hear more details about why, check out the DRY priciple in OO in One Sentence or the "Duplicated Code" code in Code Smells. Ideally, you only want logic in one place in the entire system. If you've got logic duplicated, it means a potential problem later on when you change one place but not the other.

Sometimes avoiding duplication is a simple as writing a method. Say you find yourself doing this a bunch:

FileWriter fstream = new FileWriter("logfile.txt"); BufferedWriter out = new BufferedWriter(fstream); out.write("LOG: Some message, but its different every time"); out.close();

Obviously, pull that out into a method.

Sometimes it's a little trickier. Imagine you have code in a couple of places that look like this:

public void doPrint(Node node) { if(node.type == "if_node") //do something for if nodes else if(type.type == "constant_node") //do something for constant nodes else //it must be an error node so doing something for that }

This is called a conditional chain. Sometimes it's also implemented as a case statement. Often, this particular kind of duplication can be removed by creating subclasses for each kind of type (node, in this example) and using polymorphism to do the right thing.

There are many other ways code can be duplicated. Often, there are ways to avoid this duplication either through some fancy object-oriented programming, or using a language feature like reflection, or something else. We'll be talking about specific techniques in class, but if you see some duplication in your code that's difficult to remove it is definitely something to talk to a TA or instructor about.

Do not repeat yourself by copying and pasting code, either exactly or with changes
Conditional chains or switch statements that distinguish between a few different kinds of behavior should be replaced by polymorphism (see "Switch Statements" Code Smells)
Use Java API methods and classes rather than write your own implementation whenever possible

Everything Summarized

If you want a reference sheet to look at as you go through the code, check out this summary. This is also what your TA will look at as he or she look through your code for signoff.