Write a program that will read a file and determine the words that occur most frequently in the file. Words are delimited by white space and should not have leading or trailing punctuation. All characters should be converted to lowercase equivalents. The program should print the n words that occur most frequently, where n defaults to 20, but is otherwise a parameter to the program. If no filename is specified, the program should read from standard input (cin). The program should also be able to output the words in multiple columns, where the default is one column, but otherwise could be specified as a parameter to the program. You should not use any tapestry collections in this program, but instead use C++'s STL.
The examples below show how the program can be used.
For example, the following are all valid uses.

    wordcount
    wordcount -f filename -n numwords -c columns
    wordcount --file[=filename] --numwords[=number] --columns[=number]

    wordcount < data/poe.txt
    wordcount -n 30 < data/poe.txt
    wordcount --file=data/poe.txt --numwords=30
    wordcount -f data/poe.txt -n 30
Each line of output should contain a count, followed by two spaces, followed by a word. The count is the number of times the word occurs in the input being processed. The most frequently occurring word should be printed first and the least frequently occurring word last (among the at most n lines printed, where n is a parameter to the program that defaults to 20). The counts should be right-justified so that the words are aligned by their first letter.
    100  blueberry
     12  apple
     11  berry
     11  cherry
      7  watermelon
      6  orange
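One way to produce single-column output of this shape is sketched below. The names Entry, topWords and printOneColumn are hypothetical. The assignment does not say how to order words with equal counts, so breaking ties alphabetically is an assumption made here.

```cpp
#include <algorithm>
#include <iomanip>
#include <map>
#include <ostream>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // (word, count)

// Return at most n entries, largest count first.
// Assumption: ties are broken alphabetically by word.
std::vector<Entry> topWords(const std::map<std::string, int>& counts, std::size_t n) {
    std::vector<Entry> entries(counts.begin(), counts.end());
    std::sort(entries.begin(), entries.end(), [](const Entry& a, const Entry& b) {
        if (a.second != b.second) {
            return a.second > b.second;
        }
        return a.first < b.first;
    });
    if (entries.size() > n) {
        entries.resize(n);
    }
    return entries;
}

// Print one "count, two spaces, word" pair per line, counts right-justified.
// Expects entries sorted by descending count, so the first count is the widest.
void printOneColumn(const std::vector<Entry>& entries, std::ostream& out) {
    if (entries.empty()) {
        return;
    }
    const int width = static_cast<int>(std::to_string(entries.front().second).size());
    for (const Entry& e : entries) {
        out << std::setw(width) << e.second << "  " << e.first << '\n';
    }
}
```

Copying the map into a vector is needed because a std::map is ordered by key (the word), not by count. Sorting the whole vector is the simplest choice; std::partial_sort would also work when n is much smaller than the number of distinct words.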
The command-line option --columns[=number] or -c number should output number count/word pairs on each line (except the last line, which may not be full). Each count is followed by two spaces, and counts are right-justified within each column (as above). Each column is separated from the next column by four spaces (measured between the longest entry in one column and the largest number in the next column).
    wordlines --file=foo.txt --columns=3
    wordlines -f foo.txt -c 3

These could generate output as follows. Note that there are four spaces between the 'n' in watermelon and the first '1' in the count of 11 for cherry in the next column. There are four spaces between the 'y' in blueberry and the '1' in the 11 count for berry.
    100  blueberry    11  berry     7  watermelon
     12  apple        11  cherry    6  orange
The following programs demonstrate how to use the getopt function to parse command-line arguments.
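As a further hedged illustration, here is a minimal sketch of parsing this assignment's options. It uses getopt_long, the GNU extension declared in <getopt.h>, rather than plain getopt, because plain getopt handles only the short forms. The Options struct and the parseOptions function are hypothetical names. The long options are declared with optional_argument to mirror the bracketed [=value] syntax shown earlier, and a long option given without a value is assumed to leave the default in place.

```cpp
#include <getopt.h>

#include <cstdlib>
#include <string>

// Hypothetical container for the parsed settings.
struct Options {
    std::string filename;  // empty means "read from standard input"
    int numWords = 20;     // default from the assignment
    int columns = 1;       // default from the assignment
    bool ok = true;        // false if an unrecognized option was seen
};

Options parseOptions(int argc, char* argv[]) {
    // Each long option maps to the same code as its short form.
    static const struct option longOptions[] = {
        {"file",     optional_argument, nullptr, 'f'},
        {"numwords", optional_argument, nullptr, 'n'},
        {"columns",  optional_argument, nullptr, 'c'},
        {nullptr,    0,                 nullptr, 0}
    };

    Options opts;
    optind = 1;  // restart scanning at argv[1] (works with glibc; other libcs may differ)
    int ch;
    // "f:n:c:" means the short options -f, -n and -c each require an argument.
    while ((ch = getopt_long(argc, argv, "f:n:c:", longOptions, nullptr)) != -1) {
        switch (ch) {
            case 'f':
                if (optarg) opts.filename = optarg;
                break;
            case 'n':
                if (optarg) opts.numWords = std::atoi(optarg);
                break;
            case 'c':
                if (optarg) opts.columns = std::atoi(optarg);
                break;
            default:  // getopt_long has already printed a diagnostic to stderr
                opts.ok = false;
                break;
        }
    }
    return opts;
}
```

std::atoi is used only to keep the sketch short; it returns 0 for non-numeric text, so a real program would probably validate the values (for example with std::strtol) before using them.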