Compsci 101, Fall 2017, Fast Processing of Airport Data

Due: Thursday, Nov 2, by 11:59pm

10 points

See the HOWTO page for more details on which dictionaries to set up to solve some of these problems. It includes suggested steps if you don't know how to get started with this assignment. It also has a smaller datafile you can use for testing your program.

Answering Questions about Airports

There are many real datasets on the web, such as those from the CORGIS Dataset Project at Virginia Tech. I grabbed a dataset that has monthly information about flight delays at major U.S. airports since 2003.

We've processed the data and put it into a format that is easier for you to work with. You will read the data from our course website. Each line represents one month at an airport and contains the following fields: airport code (3 letters), month name, year, number of cancelled flights, number of carriers, number of delayed flights, total number of flights, number of on-time flights, and the airport name. You'll write a program to answer questions about these airports.
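
As a starting point, here is a minimal sketch of turning one line of the datafile into named fields. It assumes Python 3 and the "$" separator described under Requirements. The function name parse_line and the dictionary keys are made up for illustration; you are free to organize the fields differently.

```python
def parse_line(line, sep="$"):
    """Split one line of airport data into a dictionary of named fields.

    The key names below are illustrative choices, not required by the assignment.
    """
    # maxsplit=8 keeps the airport name (the 9th field) in one piece
    parts = line.strip().split(sep, 8)
    return {
        "code": parts[0],            # 3-letter airport code
        "month": parts[1],           # month name, e.g. "June"
        "year": int(parts[2]),
        "cancelled": int(parts[3]),  # number of cancelled flights
        "carriers": int(parts[4]),   # number of carriers
        "delayed": int(parts[5]),    # number of delayed flights
        "total": int(parts[6]),      # total number of flights
        "ontime": int(parts[7]),     # number of on-time flights
        "name": parts[8],            # full airport name
    }


record = parse_line(
    "ATL$June$2003$216$11$5843$30060$23974$"
    "Atlanta, GA: Hartsfield-Jackson Atlanta International"
)
print("Airport:", record["code"], "cancelled flights:", record["cancelled"])
```

Converting the numeric fields to int right away means later code can add and compare them without repeated conversions.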


Requirements

To get full credit you must:
  1. Put all your code in a Python module named AirportInfo.py.

  2. Run your program on the following large online data file. You should read it in from the URL:

    http://www.cs.duke.edu/courses/fall17/compsci101/data/airportDataOct2017.txt

    Each line of data represents one month at a specific airport. The fields are separated by "$".

    For example, here are a few lines from the datafile:

    ATL$June$2003$216$11$5843$30060$23974$Atlanta, GA: Hartsfield-Jackson Atlanta International
    BOS$June$2003$138$14$1623$9639$7875$Boston, MA: Logan International
    BWI$June$2003$29$11$1245$8287$6998$Baltimore, MD: Baltimore/Washington International Thurgood Marshall
    CLT$June$2003$73$11$1562$8670$7021$Charlotte, NC: Charlotte Douglas International
    DCA$June$2003$74$13$1100$6513$5321$Washington, DC: Ronald Reagan Washington National
    DEN$June$2003$34$13$1611$11691$10024$Denver, CO: Denver International
    

    The first line above represents the Atlanta airport (code ATL) in June 2003, which had 216 cancelled flights, 11 carriers, 5843 delayed flights, 30060 total flights, and 23974 on-time flights, and has the name: Atlanta, GA: Hartsfield-Jackson Atlanta International.

  3. Write a program that uses dictionaries to organize the data so you can answer the following three questions about this monthly airport data. If there is a tie for any of these, print any one of the tied answers.

    1. Question 1: Determine which airport has had the most months with 100 or more cancelled flights. Print the airport code and how many months it had 100 or more cancelled flights.

    2. Question 2: For each airport, determine which month is the busiest, where the busiest month is the month name (for example, June) with the highest average number of total flights over all the years in the data.

    3. Question 3: List the airports that have at least 80 percent of their flights on time. For each such airport, list the airport code and the percentage of flights on time on one line. These airports should be listed in alphabetical order by airport code.

  4. You must create at least three dictionaries. Each dictionary should help organize data to help answer one of the questions above.

  5. You should have at least four useful functions. For example, you could have a function that builds a dictionary, or a function that reads in the file from the URL and puts it in a different format.

  6. Be sure to clearly identify your output. Don't just print a number; state what the number represents.

  7. You must have a comment for each function you write describing what that function does, in addition to a comment at the top of the file with your name.
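
The requirements above can be sketched roughly as follows. This is only one possible organization, not the required one: it assumes Python 3, all function and variable names are hypothetical, and it runs on the six sample lines shown above rather than the full datafile. It also assumes that Question 3 means the on-time percentage over all of an airport's months combined. The read_lines function shows one way to fetch the file with urllib.request; it is defined but not called here.

```python
from urllib.request import urlopen

# The six sample lines from this handout, used so the sketch runs offline.
SAMPLE = [
    "ATL$June$2003$216$11$5843$30060$23974$Atlanta, GA: Hartsfield-Jackson Atlanta International",
    "BOS$June$2003$138$14$1623$9639$7875$Boston, MA: Logan International",
    "BWI$June$2003$29$11$1245$8287$6998$Baltimore, MD: Baltimore/Washington International Thurgood Marshall",
    "CLT$June$2003$73$11$1562$8670$7021$Charlotte, NC: Charlotte Douglas International",
    "DCA$June$2003$74$13$1100$6513$5321$Washington, DC: Ronald Reagan Washington National",
    "DEN$June$2003$34$13$1611$11691$10024$Denver, CO: Denver International",
]


def read_lines(url):
    """Download the text file at url and return its lines (not called below)."""
    with urlopen(url) as source:
        return source.read().decode("utf-8").splitlines()


def split_fields(lines):
    """Return a list of field lists, one per non-blank line."""
    return [line.strip().split("$", 8) for line in lines if line.strip()]


def count_high_cancel_months(rows, threshold=100):
    """Map airport code -> number of months with at least threshold cancellations."""
    counts = {}
    for fields in rows:
        if int(fields[3]) >= threshold:
            counts[fields[0]] = counts.get(fields[0], 0) + 1
    return counts


def average_flights_by_month(rows):
    """Map airport code -> {month name -> average total flights in that month}."""
    totals = {}
    for fields in rows:
        code, month, flights = fields[0], fields[1], int(fields[6])
        totals.setdefault(code, {}).setdefault(month, []).append(flights)
    return {code: {m: sum(v) / len(v) for m, v in months.items()}
            for code, months in totals.items()}


def ontime_percent(rows):
    """Map airport code -> percentage of all its flights that were on time."""
    sums = {}
    for fields in rows:
        ontime, total = sums.get(fields[0], (0, 0))
        sums[fields[0]] = (ontime + int(fields[7]), total + int(fields[6]))
    return {code: 100.0 * ontime / total
            for code, (ontime, total) in sums.items() if total > 0}


rows = split_fields(SAMPLE)

# Question 1: max() returns one of the tied airports if there is a tie.
cancel_counts = count_high_cancel_months(rows)
worst = max(cancel_counts, key=cancel_counts.get)
print("Most months with 100+ cancelled flights:", worst, cancel_counts[worst])

# Question 2: busiest month name per airport.
averages = average_flights_by_month(rows)
for code in sorted(averages):
    print("Busiest month at", code, "is", max(averages[code], key=averages[code].get))

# Question 3: airports at or above 80 percent on time, in alphabetical order.
percents = ontime_percent(rows)
for code in sorted(percents):
    if percents[code] >= 80:
        print(code, "on-time percentage:", round(percents[code], 1))
```

To run on the real data, you could replace SAMPLE with the result of read_lines called on the URL given in requirement 2. Each of the three dictionaries is keyed by airport code, which keeps the lookup for each question to a single pass over the data.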


Submission