The Unix utility grep supports fast look-up of
regular expressions. Your project is an implementation of
Goofi, a Grep-like Object-oriented File Indexing system. Unlike grep, Goofi
will support matching in a large set of files and will cache results
between runs. In this way it will function as a cross between
the Unix utilities find and fgrep (see the man page
entries for these). It is similar to, but much smaller than,
the utility glimpse.
(If you check the web page you'll see that glimpse is available on many
kinds of systems. It is installed on CS machines, but not on acpub machines.)
General usage of goofi is described below. Flags in brackets [] are optional.
goofi source [-update] [-exclude subdir] [-source dir] [-ignore file] [-suffixes file] [-output dir] [-min int] [-depth int]
All dash options, e.g., -update, -ignore, can come in any order and should be abbreviated by a single letter (shown in bold below). You can add other options for extra credit.
-update | Only files modified since the last run of goofi with the same root directory are read and indexed. Without this option, all files in the directory hierarchy are read and indexed. |
-exclude subdir | The subdirectory named as an argument is not searched in this run of goofi. Optionally, subdir can be a regular expression, in which case any directory matching it is ignored. |
-source dir | Instead of reading the .goofi file/directory in the user's home directory, the file/directory named as an argument to source is read. |
-ignore file | The named file is assumed to contain white-space delimited words. These words are ignored and not indexed by the goofi run. |
-suffixes file | The named file consists of white-space delimited words/regular expressions. Any file encountered by goofi whose suffix matches any of those in the file is not indexed during the goofi run. |
-output dir | Store the index in the named file or directory in the user's home directory (default = ".goofi"). |
-min int | Only words whose lengths are greater than or equal to the integer argument int are indexed by the goofi run (default = 4). |
-depth int | Only index to the given depth. A default depth of 2 applies only when indexing web pages. |
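As a rough illustration of how the update, min, and ignore options might narrow what gets indexed, here is a minimal Python sketch. The function names, the idea of passing the previous run's time as a number of seconds, and the use of file modification times are all assumptions made for this example; the assignment does not prescribe them.

```python
import os


def files_to_index(root, last_run, update=False):
    """Yield the paths under root that this run should read.

    last_run is the time of the previous run on the same root, in seconds
    since the epoch, or None if there was no previous run (hypothetical
    bookkeeping; the handout does not say how this is stored).
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if update and last_run is not None:
                if os.path.getmtime(path) <= last_run:
                    continue  # not modified since the last run
            yield path


def should_index_word(word, min_length=4, ignored=frozenset()):
    """Apply the min rule (default 4) and the ignore-file word list."""
    return len(word) >= min_length and word not in ignored
```

Without update, or when no earlier run is recorded, every file is yielded, which matches the behavior described for a run without the option.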
The source is specified by either an absolute or relative directory path, or a URL for a web page, that should be recursively searched to create an index for subsequent use by the goof program. This directory is called the goofi root directory. The index can be one file or multiple files stored in a directory. By default, the index should be stored in either a file or a directory named .goofi in the user's home directory. By default, goofi should ignore all executable files and those with the following suffixes:
.Z, .z, .zip, .tgz, .dvi, .ps, .tar, .o
In addition, the user should have the option of excluding certain files and/or directories from being indexed by storing their names in a .goofi-excludes file, which will be read, if it exists, by goofi.
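The default ignore rule (executables plus the listed suffixes) could be checked along these lines. This is only a sketch in Python: the function name is hypothetical, the suffix comparison is assumed to be case-sensitive because both .Z and .z appear in the list, and "executable" is assumed to mean that the current user has execute permission on a regular file.

```python
import os

# Suffixes the handout says are ignored by default.
DEFAULT_SKIP_SUFFIXES = (".Z", ".z", ".zip", ".tgz", ".dvi", ".ps", ".tar", ".o")


def skipped_by_default(path, suffixes=DEFAULT_SKIP_SUFFIXES):
    """Return True if path should be ignored under the default rules."""
    if path.endswith(tuple(suffixes)):
        return True
    # Assumption: an "executable file" is a regular file the current
    # user is allowed to execute.
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

Names read from a .goofi-excludes file could be checked in the same place, before a file is opened for indexing.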
Your indexing program should run in two modes: one that emphasizes small index files, and one that emphasizes the speed of goof queries. Most likely it will be difficult to have both a small index and fast queries. For some ideas see the glimpse paper.
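One possible way to picture the two modes is sketched below in Python. It is an assumption, not the required design: the compact mode records only which files contain a word and rescans those files at query time, while the fast mode also stores line numbers so that a query never rereads a file. The function names and the in-memory dictionary of file contents are hypothetical.

```python
def build_index(files, compact=False):
    """Build an index from files, a dict of file name -> list of lines.

    compact=True:  word -> set of file names (smaller index)
    compact=False: word -> {file name: [line numbers]} (faster queries)
    """
    index = {}
    for name, lines in files.items():
        for lineno, line in enumerate(lines, start=1):
            for word in line.split():
                if compact:
                    index.setdefault(word, set()).add(name)
                else:
                    hits = index.setdefault(word, {}).setdefault(name, [])
                    if not hits or hits[-1] != lineno:
                        hits.append(lineno)  # record each line once
    return index


def lookup(index, word, files, compact=False):
    """Return {file name: [line numbers]} for word in either mode."""
    if not compact:
        return index.get(word, {})
    # Compact mode: the index only narrows the search to candidate
    # files, so matching lines are found by rescanning them.
    result = {}
    for name in index.get(word, ()):
        result[name] = [n for n, line in enumerate(files[name], start=1)
                        if word in line.split()]
    return result
```

Both modes return the same answer; the difference is whether the cost is paid in index size or in query time.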
General usage of goof is described below. Flags in brackets [] are optional.
goof word [-source dir] [-nolines] [-file regexp] [-reg] [-context]
All dash options, e.g., -source, -context, can come in any order and should be abbreviated by a single letter (shown in bold below). You can add other options for extra credit.
-source dir | Instead of reading the .goofi file/directory in the user's home directory, the file/directory named as an argument to source is read. |
-nolines | No line numbers are printed, only matching file names. |
-file regexp | Only files matching the regular expression regexp are searched/indexed for matches. For example, typing goof -file "\.cc$" string would find all occurrences of the word string in indexed files with a .cc suffix. |
-reg | The word argument to goof is treated as a regular expression instead of a word. Using goof -reg "^r...t$" would search for all five-letter words starting with r and ending with t, e.g., robot. |
-context | In addition to printing line numbers, the matched lines are printed as well. |
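The -reg and -file options might combine at query time roughly as follows. This Python sketch assumes an index shaped as word -> {file name: [line numbers]} and uses Python's re module, whose syntax may differ in places from the regular expression dialect the project is expected to support; the function and parameter names are hypothetical.

```python
import re


def query(index, word, use_regex=False, file_pattern=None):
    """Return {file name: sorted line numbers} for a goof-style query.

    index maps word -> {file name: [line numbers]} (an assumed layout).
    """
    if use_regex:
        pattern = re.compile(word)
        matching = [w for w in index if pattern.search(w)]
    else:
        matching = [word] if word in index else []
    file_re = re.compile(file_pattern) if file_pattern else None
    result = {}
    for w in matching:
        for name, lines in index[w].items():
            if file_re and not file_re.search(name):
                continue  # file name does not match the -file pattern
            result.setdefault(name, set()).update(lines)
    return {name: sorted(lines) for name, lines in result.items()}
```

With this layout, the -nolines output would be just the keys of the result, and -context would additionally need the text of each reported line.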
By default, goof reads a .goofi file/directory in the user's home directory, searches for all occurrences of the word in the index, and prints a list of files and, for each file in which the word occurs, the matching line numbers. Words are delimited by whitespace and punctuation characters. However, punctuation (see ispunct in ctype.h) is not considered part of a word when it comes before the first alphabetic character or after the last alphabetic character.
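The word rule can be read in more than one way, so the following Python sketch fixes one interpretation as an assumption: split each line on whitespace, then strip punctuation from both ends of each token, leaving interior punctuation (as in don't) in place. Python's string.punctuation holds the same 32 ASCII characters that ispunct accepts in the default C locale; the function name is hypothetical.

```python
import string


def words_in_line(line):
    """Split a line into words, trimming leading and trailing punctuation."""
    words = []
    for token in line.split():
        word = token.strip(string.punctuation)
        if word:  # tokens made only of punctuation are dropped
            words.append(word)
    return words
```

For example, words_in_line('He said, "don\'t stop!" (really)') returns ['He', 'said', "don't", 'stop', 'really'].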