Clifton Kerr
CPS108
OOWS 2.0

From OOWS 1.0

OOWS 1.0 should have all of the desired features fully implemented, though they are not currently as optimized or well-packaged as they could be.  This includes command-line argument support.  Files can be entered using -f filename or --file filename.  Likewise, the number of output columns and the number of elements to output can be set with -c (--columns) and -n (--number) respectively.  The program itself works by creating and managing a map of strings to ints, in which each value is incremented to keep track of the number of times a given word appears in the input.  Rather than dumping all of the (string, int) pairs into a vector and sorting them once they are all loaded, however, I used a priority queue set up to keep only as many elements as are needed for output.  The queue is ordered opposite to the usual convention; the smallest object is kept on the top of the queue rather than the largest, so that it is easy to remove extras and keep the queue as small as possible.  The contents of the queue are then placed into a vector matrix for printing.
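
The counting and selection steps described above can be sketched roughly as follows.  This is a minimal illustration and not the actual OOWS source: the names Entry, countWords and topN are made up for the example, and it assumes words are simply whitespace-separated tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <map>
#include <queue>
#include <sstream>   // std::istringstream, handy for trying this out
#include <string>
#include <utility>
#include <vector>

// (count, word) pairs, so the default pair ordering compares counts first.
// Entry is a hypothetical name used only in this sketch.
using Entry = std::pair<int, std::string>;

// Count how often each whitespace-separated word appears in the stream.
std::map<std::string, int> countWords(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word) {
        ++counts[word];  // a missing key starts at 0, then is incremented
    }
    return counts;
}

// Return the n most frequent words, most frequent first.
std::vector<Entry> topN(const std::map<std::string, int>& counts, std::size_t n) {
    // std::greater turns the queue into a min-heap: the smallest count sits
    // on top, so the current weakest candidate is the one that gets popped.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : counts) {
        heap.push(Entry(kv.second, kv.first));
        if (heap.size() > n) {
            heap.pop();  // drop the extra so the queue never exceeds n
        }
    }
    assert(heap.size() <= n);

    // The heap yields smallest first, so fill the result from the back.
    std::vector<Entry> result(heap.size());
    for (std::size_t i = result.size(); i > 0; --i) {
        result[i - 1] = heap.top();
        heap.pop();
    }
    return result;
}
```

Feeding countWords a std::istringstream holding "the cat the dog the cat" and then calling topN with n = 2 should give "the" (3) followed by "cat" (2).  Because the queue never holds more than n entries, each push or pop costs on the order of log n rather than log of the whole vocabulary.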

OOWS 1.0 is currently not as fast as it could be, taking a full 30+ seconds to parse and process the King James Bible on my computer.  I'm going to have to look into optimizations, particularly in the data mapping, in order to shave significant time off the run time.  I'll also be doing a good bit of refactoring to make better use of the STL.

Known bugs:
Most abnormal input states (bad filename, nonpositive column or return count) have been accounted for and will generate error output, but the possibility exists that a bizarre combination of garbage inputs could get past them.  I am assuming that no one would want more than INT_MAX words printed, or want them in more than INT_MAX columns.  I am relying on the atoi call to cap input there if they do, though the C standard leaves atoi's behavior undefined for out-of-range values, so that cap is not guaranteed.  I also don't know if there are anywhere close to 2 billion words in the English language, so I don't think it's a problem.
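
One way to make the INT_MAX cap explicit, instead of depending on how atoi treats oversized input (the C standard does not define that case), is to parse with strtol and clamp by hand.  The helper below is only a sketch; parsePositiveInt is a hypothetical name and is not part of OOWS.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parse a strictly positive decimal int from text.
// Returns false for garbage, zero, or negative input; values too large
// for an int are clamped to INT_MAX instead of being rejected.
bool parsePositiveInt(const char* text, int& out) {
    assert(text != nullptr);
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);

    if (end == text || *end != '\0') {
        return false;  // no digits at all, or trailing junk such as "12x"
    }
    if (value <= 0) {
        return false;  // nonpositive counts make no sense here
    }
    if (errno == ERANGE || value > INT_MAX) {
        value = INT_MAX;  // explicit cap for oversized input
    }
    out = static_cast<int>(value);
    return true;
}
```

With this, an argument such as "10" parses to 10, "abc" and "-3" are rejected, and an absurdly long digit string comes back as INT_MAX.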

As for runtime, storage is O(n log n) for all n words, the sort is O(n log m) (m is the number of items to be printed), and printing is O(m).  A major bottleneck comes in with the input, which will be the first thing I look at optimizing.

Changes in OOWS 2.0

The algorithm for the word count remains virtually unchanged, though it seems to run about 10% faster than version 1.0.  Classes have been expanded and templated to better interface with STL classes, and all of my global constants and comparator structs have been consolidated into a single include file.

I didn't template the sorted data structure for the simple reason that the priority queue intentionally does not provide iterators.  I still could have templated it, but the only valid container type that the template would have worked on would be a priority queue.
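
To illustrate the point about iterators, here is a small sketch (toVector and drain are hypothetical names, not OOWS code).  A template written against begin() and end() accepts most STL containers, but std::priority_queue only exposes top(), push() and pop(), so it needs its own routine.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Works for any container that exposes begin()/end() and value_type
// (vector, list, set, ...).  It will not compile for std::priority_queue.
template <typename Container>
std::vector<typename Container::value_type> toVector(const Container& c) {
    return std::vector<typename Container::value_type>(c.begin(), c.end());
}

// A priority queue can only be read through top(), so the one way to walk
// it is to pop from a copy.  Elements come out in priority order.
template <typename T, typename Seq, typename Cmp>
std::vector<T> drain(std::priority_queue<T, Seq, Cmp> pq) {  // by value: works on a copy
    const std::size_t expected = pq.size();
    std::vector<T> out;
    out.reserve(expected);
    while (!pq.empty()) {
        out.push_back(pq.top());
        pq.pop();
    }
    assert(out.size() == expected);
    return out;
}
```

Draining a std::priority_queue<int> that holds 3, 1 and 2 gives 3, 2, 1, and the caller's queue is left untouched because drain takes its argument by value.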