There is code to snarf for lab 7; that code is also here.
To get credit for this lab, you will need to do the following by Sunday night.
For this problem, you are given a data file of the top 1000 rock and roll songs, e.g., from this source among others: http://www.rocknrollamerica.net/Top1000.html. You can see the same data here as a Google spreadsheet.
The data file you're given is a file in CSV format (comma separated values). Each row of the CSV-table includes the rank, the song, and the artist. The first seven lines are reproduced below. As with many CSV files, there's a header as the first row of the file and also a header in the spreadsheet the file models.
Rank | Song | Artist |
---|---|---|
1 | Stairway to Heaven | Led Zeppelin |
2 | Hey Jude | Beatles |
3 | All Along the Watchtower | Hendrix, Jimi |
4 | Satisfaction | Rolling Stones |
5 | Like A Rolling Stone | Dylan, Bob |
6 | Another Brick In The Wall | Pink Floyd |
You'll answer a few questions about this data by modifying the Python code in module SongReader.py.
If you've had experience with spreadsheets, it's possible you could answer some of these questions using functions in the spreadsheet, but for many problems using Python will let you solve problems more easily.
To answer these questions you'll modify the module SongReader.py and use dictionaries. Additional problems with a different data set follow the questions below. These questions are meant to guide you toward a solution to the three questions above.
Documentation for the Python csv library is here: https://docs.python.org/2/library/csv.html, though you likely will not need to read this to complete the lab.
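As a rough illustration of what reading such a file with the csv library looks like, here is a minimal sketch. It is not the code in SongReader.py: the names `SAMPLE` and `read_rows` are made up for this example, and the sample text assumes that a field containing a comma (such as `Hendrix, Jimi`) is quoted in the file, as CSV normally requires.

```python
from __future__ import print_function
import csv

# Stand-in for the data file: the header plus the first three rows of the
# table above. The quoting of "Hendrix, Jimi" is an assumption about the file.
SAMPLE = """Rank,Song,Artist
1,Stairway to Heaven,Led Zeppelin
2,Hey Jude,Beatles
3,All Along the Watchtower,"Hendrix, Jimi"
"""

def read_rows(lines):
    """Return the data rows from an iterable of CSV lines, skipping the header."""
    reader = csv.reader(lines)
    next(reader)                 # discard the header row: Rank, Song, Artist
    return [row for row in reader]

rows = read_rows(SAMPLE.splitlines())
for row in rows:
    # row[1] is the song and row[2] is the artist; every field is a string
    print(row[2], "-", row[1])
```

With a real file you would pass an open file object instead of `SAMPLE.splitlines()`. Note that the rank comes back as the string `'1'`, not the integer `1`.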
In the program, the artist or group in each row read is row[2] and the name of the song is row[1]; you can see this in the program. To find the top artists, use the dictionary in the variable datasg inside the loop. Use the artist as the key and map it to a list of that artist's songs, adding each song read to that artist's list. Here's the code you'll need (assuming the variables artist and song have been set appropriately). This is the typical pattern for building a dictionary: initialize the value associated with a key the first time the key is seen, and modify the value every time after that.

    if artist not in datasg:
        datasg[artist] = [song]
    else:
        datasg[artist].append(song)
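To see both branches of that pattern run, here is a small self-contained sketch. The placeholder names ("Song A", "Artist X") and the function `group_songs` are invented for illustration and are not taken from the data file or from SongReader.py.

```python
from __future__ import print_function

# Placeholder (song, artist) pairs, NOT rows from the real data file.
pairs = [
    ("Song A", "Artist X"),
    ("Song B", "Artist Y"),
    ("Song C", "Artist X"),
]

def group_songs(pairs):
    """Map each artist to the list of that artist's songs."""
    datasg = {}
    for song, artist in pairs:
        if artist not in datasg:
            datasg[artist] = [song]        # first time this artist is seen
        else:
            datasg[artist].append(song)    # artist already has a list
    return datasg

datasg = group_songs(pairs)
print(datasg["Artist X"])
```

"Artist X" takes the first branch once and the second branch once, so its list ends up with two songs; "Artist Y" only ever takes the first branch.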
Once the loop has filled the dictionary, you should be able to print its keys and values after the loop to eyeball the top artists. Try this code to check that yours works:

    for artist in datasg:
        print artist, datasg[artist]
Now replace the statements you just entered with the code below, which uses a list comprehension. It sorts the artists by number of songs and prints only the 30 artists with the most songs rather than all the data:

    info = datasg.items()
    tosort = [(len(t[1]), t[0]) for t in info]
    info = sorted(tosort)
    print info[-30:]

Then explain the answers to the following questions in the online form for lab.
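To make the sorting step concrete, here is the same idea on a tiny dictionary. The artists and songs are placeholders invented for this sketch, and `ranked` is just an illustrative variable name.

```python
from __future__ import print_function

# Placeholder dictionary in the same shape as datasg: artist -> list of songs.
datasg = {
    "Artist X": ["Song A", "Song C", "Song D"],
    "Artist Y": ["Song B"],
    "Artist Z": ["Song E", "Song F"],
}

# Build (count, artist) tuples; putting the count first makes sorted()
# order by number of songs, with ties broken by artist name.
tosort = [(len(songs), artist) for artist, songs in datasg.items()]
ranked = sorted(tosort)

# The sort is ascending, so the artists with the most songs are at the END.
print(ranked[-2:])   # prints [(2, 'Artist Z'), (3, 'Artist X')]
```

Because the list is in ascending order, a negative slice such as `[-30:]` picks out the largest counts. If you would rather see the top artist first, `sorted(tosort, reverse=True)[:30]` gives the same artists in descending order.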
You're also given a file 9600movies.csv that has information on 9,600 movies, including director, year, title, genre, country, length, and whether the movie is black-and-white or color. You can also see the data here in this online spreadsheet.
You should create a module MovieReader.py and write code to answer the following questions. You can use the code from SongReader.py as a model. In some problems you may need to use a dictionary, but not in all problems. Answer the following questions on the online form.
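The difference between a problem that needs a dictionary and one that does not can be sketched as follows. Everything about the layout here is an assumption: the column order, the header names, and the placeholder rows are invented, so check the real 9600movies.csv before reusing any index.

```python
from __future__ import print_function
import csv

# HYPOTHETICAL layout and rows; the real file may differ.
SAMPLE = """Title,Director,Year
Movie A,Director X,1950
Movie B,Director Y,1972
Movie C,Director X,1980
"""
TITLE, DIRECTOR, YEAR = 0, 1, 2   # assumed column positions

reader = csv.reader(SAMPLE.splitlines())
next(reader)                      # skip the header row
rows = list(reader)

# No dictionary needed: a single running count answers
# "how many movies were made before 1975?"
before_1975 = sum(1 for row in rows if int(row[YEAR]) < 1975)

# Dictionary needed: one count per director.
per_director = {}
for row in rows:
    director = row[DIRECTOR]
    per_director[director] = per_director.get(director, 0) + 1

print(before_1975, per_director)
```

A question with one overall answer usually needs only a counter or a running maximum, while a question that asks for an answer per director, per year, or per genre is a sign that a dictionary keyed on that field will help.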
You'll answer several questions and pose your own question. The answers and code you use to answer them should be added to the form you complete for lab.
After submitting the lab form, submit the code you wrote for both parts of the lab with ambient/websubmit. Use lab09 as the submission folder.