Compsci 101, Fall 2016, Fast Processing of Earthquake Data

Due: Thursday, Nov 10, by 11:59pm + 121 minutes

10 points

See the HOWTO page for more details on which dictionaries to set up to solve some of these problems. It includes suggested steps if you don't know how to get started with this assignment, and a smaller data file you can use to test your program.

Earthquakes again, processing them faster

We will again provide real data on earthquakes around the world, this time covering the 30 days before November 1, 2016 (this is NEW data). We obtained this data from the U.S. Geological Survey (USGS) Earthquake Hazards Program and converted it into a simpler format that is easier to process. You will read the data from our course website. For each earthquake in this period, the file gives its magnitude and location. Your program will answer questions about these earthquakes.


Requirements

To get full credit you must
  1. Put all your code in a Python module named EarthquakesPart2.py.

  2. Use the following new data file (it is newer than the data file used in the previous earthquake assignment). The file is online and large, so read it directly from this URL:

    http://www.cs.duke.edu/courses/fall16/compsci101/data/earthquakeDataNov1-2016past30days.txt

    Each line of data represents the registering of an earthquake in the format "$magnitude - descriptionOfLocation". For example, here are a few lines from the datafile:

    $1.8 - 5km WSW of Volcano, Hawaii
    $4.9 - 62km E of Namie, Japan
    $5.2 - 84km SE of Haebaru, Japan
    $1.1 - 1km SE of The Geysers, California
    $1.0 - 1km ESE of Mammoth Lakes, California
    $1.6 - Explosion - 2km SW of Princeton, Canada
    $4.9 - North Atlantic Ocean
    $2.2 - 78km NNE of Road Town, British Virgin Islands
    $-0.1 - 6km SSE of Beatty, Nevada
    $0.9 - 10km NE of Indio, CA
    

    The first line in the data file represents an earthquake with 1.8 magnitude that occurred 5km WSW of Volcano, Hawaii.

  3. Write a program that uses dictionaries to organize the data so you can answer the following three questions about it. If there is a tie for any of them, print any one of the tied answers.

    1. We call the part of the location description after the last comma the "base location". The base location is usually a state (such as Hawaii and California in example lines 1 and 4 above), a state abbreviation (such as CA in line 10), a country (such as Japan in line 2), or some other notation. If there is no comma, the base location is the complete location description (such as "North Atlantic Ocean" in line 7).

      Calculate which base location occurs the most often in the file, and print how many times it occurs.

    2. Among the locations that appear five or more times in the file, find the one with the highest average magnitude. Use the whole location description here, not the base location. Print this average and the location.
    3. Find the magnitude that occurs most often. Print it, how many times it occurs, and in how many unique locations it occurs.

  4. You must create three dictionaries. Each dictionary should organize the data needed to answer one of the questions above.

  5. Be sure to clearly label your output. For example, don't just print a number for the highest average magnitude; state what the number represents.

  6. You must have a comment for each function you write describing what that function does, in addition to a comment at the top of the file with your name.
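
The requirements above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes every line has the form "$magnitude - descriptionOfLocation" and splits only on the first " - ", so an entry like "Explosion - 2km SW of Princeton, Canada" keeps its full location. All function names (read_lines, parse_line, base_location, build_dictionaries, and the rest) are hypothetical, and the required comment with your name at the top of the file is left out.

```python
# Hypothetical sketch of EarthquakesPart2.py. Assumes each data line looks like
# "$magnitude - descriptionOfLocation"; all function names are illustrative.
import urllib.request

DATA_URL = "http://www.cs.duke.edu/courses/fall16/compsci101/data/earthquakeDataNov1-2016past30days.txt"


def read_lines(url):
    """Return the non-blank, stripped lines of the text file at url."""
    with urllib.request.urlopen(url) as f:
        text = f.read().decode("utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_line(line):
    """Split one data line into (magnitude as a float, full location).

    Only the first " - " separates the fields, so a location that itself
    contains " - " (e.g. "Explosion - 2km SW of ...") stays intact.
    """
    mag_text, location = line.split(" - ", 1)
    return float(mag_text.lstrip("$")), location.strip()


def base_location(location):
    """Return the text after the last comma, or the whole location if none."""
    return location.split(",")[-1].strip()


def build_dictionaries(lines):
    """Build the three dictionaries, one for each question."""
    base_counts = {}       # Q1: base location -> number of earthquakes
    mags_by_location = {}  # Q2: full location -> list of magnitudes
    locations_by_mag = {}  # Q3: magnitude -> list of locations (with repeats)
    for line in lines:
        if " - " not in line:
            continue  # defensive: skip a line that is not in the expected format
        mag, location = parse_line(line)
        base = base_location(location)
        base_counts[base] = base_counts.get(base, 0) + 1
        mags_by_location.setdefault(location, []).append(mag)
        locations_by_mag.setdefault(mag, []).append(location)
    return base_counts, mags_by_location, locations_by_mag


def most_common_base(base_counts):
    """Return (base location, count) for the most frequent base location."""
    best = max(base_counts, key=base_counts.get)
    return best, base_counts[best]


def highest_average(mags_by_location, min_count=5):
    """Return (average, location) with the highest average magnitude among
    locations that appear at least min_count times, or None if none do."""
    averages = [(sum(mags) / len(mags), loc)
                for loc, mags in mags_by_location.items()
                if len(mags) >= min_count]
    return max(averages) if averages else None


def most_common_magnitude(locations_by_mag):
    """Return (magnitude, number of occurrences, number of unique locations)."""
    best = max(locations_by_mag, key=lambda m: len(locations_by_mag[m]))
    locations = locations_by_mag[best]
    return best, len(locations), len(set(locations))


def main():
    """Read the data from the course URL and print clearly labeled answers."""
    lines = read_lines(DATA_URL)
    base_counts, mags_by_location, locations_by_mag = build_dictionaries(lines)
    base, count = most_common_base(base_counts)
    print("Most common base location:", base, "occurs", count, "times")
    result = highest_average(mags_by_location)
    if result is not None:
        avg, loc = result
        print("Highest average magnitude (locations with 5+ quakes):",
              round(avg, 2), "at", loc)
    mag, times, unique = most_common_magnitude(locations_by_mag)
    print("Most common magnitude:", mag, "occurs", times,
          "times in", unique, "unique locations")
```

Calling main() fetches the file from the course URL and prints the answers. To test without the network, pass build_dictionaries a short list of lines, such as the example lines above. Storing a list of locations for each magnitude, rather than just a count, lets one dictionary answer both parts of question 3: the list's length is the number of occurrences, and the size of its set is the number of unique locations.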


Submission