CompSci 307 : Spring 2019

Data : Complete Implementation

The goal of this part is to complete the project while meeting the general design goals in this design checklist as well as the specific ones described in the assignment. Additionally, your project GitLab repository should show many purposeful commits rather than just one or two large "kitchen sink" commits.

Specification

In addition to the questions from the basic part, your final version should answer the following questions:

Given a name, a gender, and a year, report the name/gender pair that has the same rank in the most recent year in your data set as the given name/gender pair had in the given year. This is the algorithm used for this website.
Given a name, a gender, and a range of years, report the average rank of that name/gender pair so the data can be used to show how popular a name has been over that timespan.
Given a name and a range of years, report the average rank of that name (regardless of gender) so the data can be used to show how popular a name has been over that timespan.
Given a name and a number of years, report the average rank of that name (regardless of gender) over that many of the most recent years so the data can be used to show how popular a name has been over that timespan.
Given a range of years, report what single name was ranked the most often as the year’s most popular name (regardless of gender) within the range, along with how many years this name ranked as the most popular name.
Given a range of years, report what gender was ranked the most often as the year’s most popular gender within the range, along with how many years this gender ranked as the most popular gender.
Given a range of years, report the most popular letter that girls’ names have started with within the range, along with an alphabetized list of all names that start with this letter.
Given a range of years, report the most popular letter that any name has started with within the range, along with an alphabetized list of all names that start with this letter.
Extra Credit: Given a range of years, report the most common name prefix for boys’ names within the range, along with an alphabetized list of all names with that prefix, in order to elucidate the most popular boy’s name in recent history when taking derived name variations into account. For this requirement, one name (e.g., “Franklin” or “Glenn”) is considered a derived variant of another (e.g., “Frank” or “Glen”) if the name’s spelling contains all the letters of the other name at the start of the name.
Extra Credit: Given a range of years, report what single name was ranked the most often as the year’s most popular name (regardless of gender) within the range, along with the meaning of that name (using this additional data file).
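
As a starting point for the first two questions above, here is a minimal sketch of rank lookup and averaging. It is not a required design. All names in it (RankSketch, Entry, addYear, rank, sameRankInMostRecentYear, averageRank) are hypothetical, and it assumes the data has already been parsed into rows that are sorted most popular first within each gender. It also assumes that the matching name must have the same gender as the given one and that years in which the name does not appear are skipped when averaging; you may reasonably decide otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;
import java.util.TreeMap;

// Hypothetical sketch: every class and method name here is illustrative only.
class RankSketch {
    // One parsed row: a name, its gender ("M" or "F"), and its count for the year.
    static final class Entry {
        final String name;
        final String gender;
        final int count;

        Entry(String name, String gender, int count) {
            this.name = name;
            this.gender = gender;
            this.count = count;
        }
    }

    // Assumed layout: year -> rows, already sorted most popular first within each gender.
    private final TreeMap<Integer, List<Entry>> byYear = new TreeMap<>();

    void addYear(int year, List<Entry> rows) {
        byYear.put(year, new ArrayList<>(rows));
    }

    // 1-based rank of the name among names of the same gender, or -1 if absent.
    int rank(String name, String gender, int year) {
        int rank = 0;
        for (Entry e : byYear.getOrDefault(year, List.of())) {
            if (!e.gender.equals(gender)) {
                continue;
            }
            rank++;
            if (e.name.equals(name)) {
                return rank;
            }
        }
        return -1;
    }

    // Name of the same gender holding that rank in the latest year, or null if none.
    String sameRankInMostRecentYear(String name, String gender, int year) {
        int target = rank(name, gender, year);
        if (target < 0) {
            return null;
        }
        int rank = 0;
        for (Entry e : byYear.lastEntry().getValue()) {
            if (e.gender.equals(gender) && ++rank == target) {
                return e.name;
            }
        }
        return null;
    }

    // Average rank over the years in [startYear, endYear] where the name appears.
    OptionalDouble averageRank(String name, String gender, int startYear, int endYear) {
        if (startYear > endYear) {
            return OptionalDouble.empty(); // subMap would reject a reversed range
        }
        return byYear.subMap(startYear, true, endYear, true).keySet().stream()
                .mapToInt(y -> rank(name, gender, y))
                .filter(r -> r > 0)
                .average();
    }
}
```

Returning OptionalDouble rather than a sentinel number makes the "name never appears in this range" case explicit to the caller, which also gives you something concrete to test.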

Potential Errors

Your final version should be robust enough to handle the following cases reasonably without crashing (for each case, explain how you handled the potential error):

an invalid or empty data source (i.e., a non-existent file name or URL or one that exists but contains no data)
ranges of years that do not fit completely within the years in the given source of data
names that do not match the exact case of those in the various data files
genders that are neither M nor F (the only ones given in the data files)
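
One possible way to handle the last three cases is to normalize or validate input before it reaches your query code. The sketch below is only an illustration: InputSketch and its methods are hypothetical names, it assumes names in the data files are written with only the first letter capitalized, and it clamps an out-of-bounds year range to the available years, although rejecting such a range would be an equally defensible choice.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: names and policies here are assumptions, not requirements.
class InputSketch {
    // Converts input such as " eMiLy " to "Emily"; returns "" for null or blank input.
    // Assumes the data files capitalize only the first letter of each name.
    static String normalizeName(String raw) {
        if (raw == null || raw.isBlank()) {
            return "";
        }
        String t = raw.trim();
        return t.substring(0, 1).toUpperCase(Locale.ROOT)
                + t.substring(1).toLowerCase(Locale.ROOT);
    }

    // Accepts "m", "M", "f", "F" (with surrounding spaces); anything else is empty.
    static Optional<String> normalizeGender(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String g = raw.trim().toUpperCase(Locale.ROOT);
        return (g.equals("M") || g.equals("F")) ? Optional.of(g) : Optional.empty();
    }

    // Clamps [start, end] to [firstAvailable, lastAvailable].
    // Returns {lo, hi}, or an empty array when the ranges do not overlap at all.
    static int[] clampYears(int start, int end, int firstAvailable, int lastAvailable) {
        int lo = Math.max(start, firstAvailable);
        int hi = Math.min(end, lastAvailable);
        return lo <= hi ? new int[] {lo, hi} : new int[0];
    }
}
```

Whichever policy you choose, the caller should be able to tell "no valid input" apart from "valid input with no matching data", which is why these methods return an empty value instead of throwing.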

Testing

Your final version should include tests for the following:

at least three tests for each question with comments explaining something specific each one is testing (such as special or boundary cases)
at least one test showing how each error case is handled
accessing at least three different data sources (one of which must be from this URL).

Extra credit: support one of your data sources being a ZIP archive file using this Java class (note, the JAR file itself could be local or from the web!).
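
For the ZIP extra credit, the following is a rough sketch of reading every text entry from an archive. It uses java.util.zip.ZipInputStream from the standard library, which is an assumption on our part and may not be the class linked above. ZipSourceSketch and readAll are hypothetical names. Because it takes any InputStream, the same method should work whether the stream comes from a local file or from a URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch: reads a ZIP stream into entry name -> lines of text.
class ZipSourceSketch {
    static Map<String, List<String>> readAll(InputStream in) throws IOException {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // A fresh reader per entry; it is not closed here because closing
                // it would also close the underlying ZIP stream.
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zip, StandardCharsets.UTF_8));
                List<String> lines = new ArrayList<>();
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
                result.put(entry.getName(), lines);
            }
        }
        return result;
    }
}
```

An empty result map is a natural signal for the "exists but contains no data" error case, so check for it before answering any question.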

Design Notes

Your code should include as little duplicated code as possible, so use parameters effectively or make general utility methods/classes that support answering multiple questions. Remember, it should be easier to write code to answer later questions if you are building on a well-designed foundation.
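
To illustrate what a general utility method might look like, here is one hedged example: several of the questions ask for "the most frequent X", so a single counting helper parameterized by a key function could serve more than one of them. TallySketch and mostFrequent are hypothetical names, and breaking ties in favor of the smallest key is an arbitrary choice made for this sketch.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical utility: one counting routine reused by several questions.
class TallySketch {
    // Returns the key that occurs most often together with its count.
    // Ties go to the smallest key in natural order; empty input yields empty.
    static <T, K extends Comparable<K>> Optional<Map.Entry<K, Integer>> mostFrequent(
            Collection<T> items, Function<T, K> keyOf) {
        Map<K, Integer> counts = new TreeMap<>();
        for (T item : items) {
            counts.merge(keyOf.apply(item), 1, Integer::sum);
        }
        K bestKey = null;
        int bestCount = 0;
        for (Map.Entry<K, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                bestKey = e.getKey();
                bestCount = e.getValue();
            }
        }
        return bestKey == null ? Optional.empty() : Optional.of(Map.entry(bestKey, bestCount));
    }
}
```

For example, passing a list of each year's top name with the key function n -> n would answer the "most often ranked first" question, while passing a list of names with n -> n.charAt(0) would count starting letters. Note that this counts occurrences only; if you decide a letter's popularity should be weighted by the number of babies, you would need a variant that sums counts instead.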

Submission

Use Git to commit and push your code to your provided data_NETID repository as your final submission. Note that basic comments and the project README (especially how to change the source of the data) must also be completed with this submission.