CompSci 6 Spring 2010: Assignment 4

Due: Tuesday, Feb 16 - 11:59pm

15 points

Setup

Snarf the code from the course assignments folder.

APTs

Complete the following APT and test using the APT page. Create the class for the APT in the same project (assign4_cps006_spring10) as the rest of the code for this assignment.
  1. Birthday

Natural Prestidigitation

This nifty assignment was developed by Steve Wolfman, Pratt 1997, ECE/CS Dual Major. We've made a few minor edits to give it a bit of a local flavor.
The natural world is full of hidden and beautiful mathematics. The whorls of a conch shell hide the Fibonacci sequence and its Golden Ratio, plants grow in fractal patterns, and comets trace hyperbolic patterns through the solar system. All those beautiful patterns hide in the grungy data of human observation.

So, what are the populations of every town in North Carolina and the number of posts by various authors to a Duke sports bulletin board hiding from you?

Assignment Overview

In this assignment, you will write a program that determines the distribution of initial digits in a set of data (see the data directory for examples). In the end, we want a program that reads in a number n and a list of numbers nums and outputs a list of 10 values: the frequency with which each digit 0–9 appears as the nth digit of one of the input numbers.

We provide you with a suggested decomposition of the problem into functions that you can implement. However, you can ignore the framework below and create whatever methods you like as long as you produce the same output in the end.

Note: throughout this problem, you may assume that the numbers processed are non-negative or you can use the absolute value function Math.abs to help you handle negative numbers in a reasonable way.

Assignment Details

  1. Write a method nthDigit. nthDigit(n,num) finds the nth highest order digit of num, i.e., the nth digit from the left. We take the leftmost digit to be the 0th. nthDigit should evaluate to 0 for digits beyond the "end" of the number. For example:

    When computing

  2. Write a method nthDigitTally, using nthDigit. nthDigitTally(n, nums) returns a tally of frequencies of 0–9 as the nth digits of all the numbers in nums.

    Here's a sample test case. These are enrollments in Research Triangle Park colleges and universities in Fall 2000 (thanks to the "Research Triangle Regional Partnership" website: http://www.researchtriangle.org/data/enrollment.html).

    Institution                                   Enrollment
    Duke University                               12176
    North Carolina Central University             5476
    Louisburg College (Junior College)            543
    Campbell University                           3490
    University of North Carolina at Chapel Hill   24892
    North Carolina State University               28619
    Meredith College                              2595
    Peace College                                 603
    Shaw University                               2527
    St. Augustine's College                       1465
    Southeastern Baptist Theological Seminary     1858

    Assume the variable enrollments contains the enrollment numbers from that table. Then:

    nthDigitTally(0, enrollments) returns [0, 3, 4, 1, 0, 2, 1, 0, 0, 0]

  3. Write a method readNumbers that reads whitespace-separated integers from a Scanner and returns a list of the numbers suitable as input to nthDigitTally. Here's the university enrollment data from above:
    12176
    5476
    543
    3490
    24892
    28619
    2595
    603
    2527
    1465
    1858
    
    From this, readNumbers should produce the list [12176, 5476, 543, 3490, 24892, 28619, 2595, 603, 2527, 1465, 1858].

  4. Finally, compose your main method to prompt the user for the number n and to choose a file for the data set. The program should tally the nth digits of the numbers in the data set and print out a table of the results. For example, given that n=0 and the following file:
    12176
    5476
    543
    3490
    24892
    28619
    2595
    603
    2527
    1465
    1858

    Your program should print:
    0s: 0 (0%)
    1s: 3 (27%)
    2s: 4 (36%)
    3s: 1 (9%)
    4s: 0 (0%)
    5s: 2 (18%)
    6s: 1 (9%)
    7s: 0 (0%)
    8s: 0 (0%)
    9s: 0 (0%)

  5. In your README, you should describe the distribution of digits for each of the data files and how each one conforms to or differs from what you would expect from a random list of numbers.
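
To make the indexing convention in steps 1, 2, and 4 concrete, here is a minimal sketch of one possible decomposition. It is not the required design: the class name DigitTallySketch, the helper formatTally, the parameter types (int and List<Integer>), and the choice to round percentages to the nearest whole number are all assumptions made for illustration. The sketch also assumes n is non-negative.

```java
import java.util.List;

// Hypothetical class name used only for this sketch.
class DigitTallySketch {

    // Returns the nth digit from the left (the leftmost digit is the 0th),
    // or 0 when n is past the last digit. Assumes n >= 0.
    static int nthDigit(int n, int num) {
        // Widen to long first so Math.abs cannot overflow on Integer.MIN_VALUE.
        String digits = Long.toString(Math.abs((long) num));
        return n < digits.length() ? digits.charAt(n) - '0' : 0;
    }

    // Counts how often each digit 0-9 occurs as the nth digit of the inputs.
    static int[] nthDigitTally(int n, List<Integer> nums) {
        int[] tally = new int[10];
        for (int num : nums) {
            tally[nthDigit(n, num)]++;
        }
        return tally;
    }

    // Builds one row per digit, e.g. "1s: 3 (27%)".
    // Rounding to the nearest percent is an assumption of this sketch.
    static String formatTally(int[] tally) {
        int total = 0;
        for (int count : tally) {
            total += count;
        }
        StringBuilder sb = new StringBuilder();
        for (int d = 0; d < tally.length; d++) {
            long pct = total == 0 ? 0 : Math.round(100.0 * tally[d] / total);
            sb.append(d).append("s: ").append(tally[d])
              .append(" (").append(pct).append("%)\n");
        }
        return sb.toString();
    }
}
```

With the enrollment data above, nthDigit(0, 12176) is 1 and nthDigit(5, 12176) is 0, because 12176 has no digit at index 5. Going through a string keeps the "nth from the left" rule easy to check by eye; a purely arithmetic version that divides by powers of 10 would work as well.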

Extra Credit

  1. To be human-readable, the data files should also allow labels for the data. We'll accomplish this by allowing comments in the input file. Change readNumbers to ignore anything between (* and *). (You may assume that the (* and *) symbols will be surrounded by whitespace and that nested comments, i.e., comments inside other comments, are not allowed.) Now, the university data set can look like:
    (* Duke University *)                             12176
    (* North Carolina Central University *)           5476
    (* Louisburg College (Junior College) *)          543
    (* Campbell University *)                         3490
    (* University of North Carolina at Chapel Hill *) 24892
    (* North Carolina State University *)             28619
    (* Meredith College *)                            2595
    (* Peace College *)                               603
    (* Shaw University *)                             2527
    (* St. Augustine's College *)                     1465
    (* Southeastern Baptist Theological Seminary *)   1858
    

  2. If you want to find the patterns hidden in the numbers around you, try the following three-part bonus problem:
    1. Find a data source on the web that no one else has used (see next part) and transform it into a format suitable for input to readNumbers. The data must all be separate measurements of a single type of phenomenon. For example: measurements of university/college enrollments across different institutions (like above) or at the same institution across different years; measurements of the flow rates of all the major rivers in British Columbia; measurements of the height of 10000 randomly chosen Vancouver residents; measurements of the number of hits per day on the UBC computer science website over three years; measurements of the length in characters of each article in the Wikipedia; measurements of the population of the 1000 largest cities and townships in Canada; etc. Furthermore, there must be at least 250 measurements in the list (but more would be better!).

    2. Post all of the following items to the course discussion forum with a title describing your data: the URL for your data source, a description of the data source, one attachment with bare data suitable for readNumbers, and one attachment with labelled data (using the (* *) style above).

    3. Submit with your assignment the URL of your data, a description of the data source, and digit tallies for digit 1 and digit 2 of your data (using nthDigitTally). Are there any oddities in the tallies? What about in other students' data?
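
For the first extra credit item, the comment rule can be handled token by token, because (* and *) are guaranteed to be surrounded by whitespace. The following is a rough sketch of that idea, not a reference solution: the class name ReadNumbersSketch and the List<Integer> return type are assumptions, and the sketch simply lets Integer.parseInt throw a NumberFormatException if a token outside a comment is not an integer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name used only for this sketch.
class ReadNumbersSketch {

    // Reads whitespace-separated integers, skipping every token from a
    // "(*" token through the next "*)" token. Nested comments are not handled.
    static List<Integer> readNumbers(Scanner in) {
        List<Integer> nums = new ArrayList<>();
        boolean inComment = false;
        while (in.hasNext()) {
            String token = in.next();
            if (inComment) {
                // Ignore everything until the comment closes.
                if (token.equals("*)")) {
                    inComment = false;
                }
            } else if (token.equals("(*")) {
                inComment = true;
            } else {
                nums.add(Integer.parseInt(token));
            }
        }
        return nums;
    }
}
```

Because the check is on whole tokens, a label such as (* Louisburg College (Junior College) *) is skipped correctly: "(Junior" and "College)" are neither "(*" nor "*)". Files with no comments at all pass through unchanged, so the same method still covers step 3.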

Hints:

Submitting

Submit Birthday.java, Prestidigitation.java, README.txt, and any extra credit files under assignment name assign4. Provide comments for the methods you write and include any tests that you wrote for the individual methods.

Last updated Sun Jan 31 14:00:35 EST 2010