Compsci 06/101, Fall 2011, Baby Names

Inspired by The Baby Name Wizard's NameVoyager visualization of the data from this government website, we have downloaded the top 500 most popular baby names in the United States for each of the last 30 years (1981-2010) and made the data available here and also in the file you will snarf. You will complete five programs that use these yearly rankings to print out information similar to what can be seen using the Wizard.

Complete the following five modules:

  1. BarChartPlot: Pick a name and gender and find all its ranks over the last 30 years. If the name was not one of the top 500 for a given year, use 501 for that year (one more than the maximum rank). As your program's output, plot its popularity as a bar chart, using the function FileUtilitiesProvided.createBarChart.
  2. AverageRank: Pick a name and compute its average rank over the last 30 years. If the name was not one of the top 500 for a given year, use 0 for that year; such years should not count toward the average, which covers only the years in which the name was one of the top 500. As your program's output, print the name and its average rank together on one line; then, for each of the last 30 years, print the year and the name's rank that year together on a line.
  3. PopularGirlNames: Find the letter with which the most girl names start over the last 30 years. As your program's output, print all the names that start with this letter in alphabetical order, each on a line by itself.
  4. PopularBoyNames: Find the boy name from which the most other boy names are derived over the last 30 years. For example, the names Frankie and Franklin both start with the name Frank; thus, for our purposes, they are derived from it. Note that this definition will also count some alternate spellings as derivations, such as Glen and Glenn. In the case of a tie, use just the alphabetically first name. As your program's output, print the prefix name and all derived names in alphabetical order, each on a line by itself.
  5. PopularName: Each year two names are ranked #1, one for each sex. Find the name, of either sex, that has most often been ranked #1 over the last 30 years. As your program's output, print the name and how often it has ranked #1 on a single line.
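
The "derived from" rule in PopularBoyNames can be sketched as follows. This is only a minimal illustration, assuming the boy names have already been collected into a plain list of strings; the function names derived_names and most_derived_prefix and the sample names are hypothetical and are not part of FileUtilitiesProvided.

```python
def derived_names(names):
    """Map each name to the sorted list of other names that start with it."""
    unique = sorted(set(names))
    table = {}
    for prefix in unique:
        derived = [n for n in unique if n != prefix and n.startswith(prefix)]
        if derived:
            table[prefix] = derived
    return table


def most_derived_prefix(names):
    """Return (prefix, derived) for the name with the most derived names.

    Ties go to the alphabetically first prefix, as the assignment asks.
    Raises ValueError if no name is a prefix of another name.
    """
    table = derived_names(names)
    # Sort key: more derived names first, then alphabetical order.
    best = min(table, key=lambda p: (-len(table[p]), p))
    return best, table[best]


# Made-up sample list, not the real data.
sample = ["Frank", "Frankie", "Franklin", "Glen", "Glenn",
          "Al", "Alan", "Albert"]
prefix, derived = most_derived_prefix(sample)
print(prefix)
for name in derived:
    print(name)
```

In this sample both Al and Frank have two derived names, so the tie rule selects Al.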

Note that only a name's rank matters, not the number of babies with that name; that is, use the first column and ignore the other numeric columns.
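
The two conventions for an unranked year (501 for BarChartPlot, 0 for AverageRank) can be illustrated with a short sketch. It assumes the ranks for one gender have already been read into a dictionary that maps each year to a dictionary of name to rank; that layout, the function names, and the sample data are assumptions made for illustration, not the format FileUtilitiesProvided returns.

```python
def ranks_over_years(name, rankings, years, missing):
    """Return the name's rank for each year, using `missing` when unranked."""
    return [rankings.get(year, {}).get(name, missing) for year in years]


def average_rank(ranks):
    """Average the ranks, skipping 0 entries (years outside the top 500)."""
    ranked = [rank for rank in ranks if rank != 0]
    if not ranked:
        return 0.0
    return float(sum(ranked)) / len(ranked)


# Made-up sample data covering three years, not the real rankings.
rankings = {
    2008: {"Emma": 1, "Ava": 5},
    2009: {"Emma": 2},
    2010: {"Emma": 3, "Ava": 4},
}
years = [2008, 2009, 2010]

# BarChartPlot convention: 501 for an unranked year.
chart_ranks = ranks_over_years("Ava", rankings, years, missing=501)

# AverageRank convention: 0 for an unranked year, excluded from the average.
average_ranks = ranks_over_years("Ava", rankings, years, missing=0)
print("Ava %.1f" % average_rank(average_ranks))
for year, rank in zip(years, average_ranks):
    print("%d %d" % (year, rank))
```

Passing the placeholder value as a parameter lets one function serve both programs, which is the kind of reuse the challenge below asks for.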

We have included the Python module FileUtilitiesProvided, which provides several functions you can use; do not modify or add code to this file. Although these functions are general enough to be used in multiple assignments, you should assume the yearly data files in the data folder are named in the following way: "../data/fileXXXX.txt" where XXXX is the year for those rankings. You can snarf the starting files for this assignment or view them here.
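
As a small sketch of the naming convention above, the 30 file names can be generated rather than typed out. The helper name data_file_names and its default arguments are hypothetical; only the "../data/fileXXXX.txt" pattern and the 1981-2010 range come from this assignment.

```python
def data_file_names(first_year=1981, last_year=2010, folder="../data"):
    """Return the yearly data file names, one per year, in year order."""
    return ["%s/file%d.txt" % (folder, year)
            for year in range(first_year, last_year + 1)]


names = data_file_names()
print(len(names))
print(names[0])
print(names[-1])
```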

A-credit/challenge: Again, try to develop as many general functions as you can for working with the baby name data files, so that they can be reused to simplify writing these programs. These general functions should be written in a separate module, UtilityFunctions, that is imported into each of the five programs you write. In other words, try to reduce the amount of new code you need to write to solve each problem as much as possible.
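
One way to think about a general function is to notice that two problems share the same shape. For instance, finding the most common first letter and finding the name most often ranked #1 are both "count by some key and take the largest" problems. The sketch below is one possible generalization, not a required design: the function name most_common_key is hypothetical, the sample lists are made up, and breaking ties by the smallest key is an assumption, since the assignment only specifies a tie rule for PopularBoyNames.

```python
from collections import Counter


def most_common_key(items, key):
    """Return (value, count) for the key value that occurs most often.

    Ties are broken by choosing the smallest key value (an assumption).
    """
    counts = Counter(key(item) for item in items)
    best = min(counts, key=lambda k: (-counts[k], k))
    return best, counts[best]


# Made-up sample data, not the real rankings.
girl_names = ["Emma", "Emily", "Ava", "Ella", "Mia"]
letter, count = most_common_key(girl_names, key=lambda name: name[0])
print("%s %d" % (letter, count))

number_ones = ["Jacob", "Emily", "Jacob", "Emma", "Jacob", "Isabella"]
name, times = most_common_key(number_ones, key=lambda n: n)
print("%s %d" % (name, times))
```

The caller supplies what to count (the first letter, or the whole name) while the counting logic is written once.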

Submit and Grading

Submit your source code (the five programs mentioned above and UtilityFunctions), as well as a README file and an ANALYSIS file described below. Use the submit name assign5-names.

In your ANALYSIS file, discuss the steps you took to generalize functions from your UtilityFunctions module so they could be used by other modules (i.e., how they differ from what you would have written for one module specifically). Additionally, document any bugs or problems in your program that you were not able to resolve (i.e., there may be certain kinds of input that you know are not handled properly). If you document bugs that you cannot fix, and how you tried to fix them, they will affect your grade far less than bugs we discover in running your program.

Your grade will be based on how well your programs function and whether you have included appropriate README and ANALYSIS files.