More Details - CompSci 101 Fall 2016, Asgn 8 Recommender

For data files see the data directory or snarf the assignment. You are not given any already-written Python code.

You are asked to write several Python modules for this assignment. Details are here and in the howto pages.

ReadFood.py

Here is the input file.

Shirley
IlForno 3 DivinityCafe 5 McDonalds -1 TheCommons 3 Tandoor 1
Xiawei
McDonalds -3 TheCommons 5 DivinityCafe 5 TheSkillet 1 PandaExpress -5
SoonLee
DivinityCafe 3 IlForno 1 TheSkillet -1 Tandoor 5 PandaExpress -3
Bruce
McDonalds 1 Tandoor 3 DivinityCafe 5 TheCommons 3 TheSkillet 1 IlForno 3 PandaExpress 3
JoJo
TheSkillet 1 McDonalds 1 Tandoor 3 PandaExpress 1
Lee
TheCommons 3 Tandoor 3 DivinityCafe 5 TheSkillet 3 IlForno 1

Shirley is the first rater. She rated five places. IlForno was rated a 3, DivinityCafe was rated a 5, McDonalds was rated a -1, TheCommons was rated a 3, and Tandoor was rated a 1. Xiawei is the next rater. She rated McDonalds a -3, TheCommons a 5, etc.

Restaurants may appear in a different order for each rater, and a rater may rate only a few of them. Assign a rating of 0 to any restaurant a rater did not rate.

The function processdata should return a list of the unique items. It might be this list (your list may be in a different order):

['IlForno', 'TheCommons', 'DivinityCafe', 'PandaExpress', 'TheSkillet',
'Tandoor', 'McDonalds']

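You will also return a dictionary that maps each rater to a list of ratings, one per restaurant, in the same order as the list of items. It might look like this (the entries may be in a different order):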

dict [('JoJo', [0, 0, 0, 1, 1, 3, 1]), ('SoonLee', [1, 0, 3, -3, -1, 5,
0]), ('Lee', [1, 3, 5, 0, 3, 3, 0]), ('Bruce', [3, 3, 5, 3, 1, 3, 1]),
('Xiawei', [0, 5, 5, -5, 1, 0, -3]), ('Shirley', [3, 3, 5, 0, 0, 1, -1])]

You will have to think about how you will process the data. You will need to first know all the unique restaurants. Once you know them, you could put them in a list and then use that ordering for rating restaurants. As you process the initial data, you may want to store it so you can then process it a second time once you know how many different restaurants there are.

Then for each rater you could create an initial list of ratings as all 0's. As you process the data, you could update the appropriate rating.

For example, with this file there are 7 restaurants. You could initialize each key-value pair in the dict with the value [0,0,0,0,0,0,0]. Then when you process Shirley, you would update the first slot to be 3, since Shirley rated IlForno a 3. You would update the third slot to be 5, since Shirley rated DivinityCafe a 5, and so on.
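The two-pass approach described above can be sketched as follows. This is only one possible arrangement, assuming the file alternates a rater line with a line of restaurant/rating pairs as in the sample shown earlier. The helper name parse_food_lines is hypothetical and is not part of the assignment; only processdata is named there.

```python
def parse_food_lines(lines):
    """Build (items, ratings) from alternating name / ratings lines."""
    cleaned = [line.strip() for line in lines if line.strip()]
    raw = []     # (rater, {restaurant: rating}) saved for the second pass
    items = []   # unique restaurants, in the order first seen
    # Pass 1: remember each rater's ratings and collect the restaurants.
    for i in range(0, len(cleaned) - 1, 2):
        name = cleaned[i]
        parts = cleaned[i + 1].split()
        rated = {}
        for j in range(0, len(parts) - 1, 2):
            place = parts[j]
            rated[place] = int(parts[j + 1])
            if place not in items:
                items.append(place)
        raw.append((name, rated))
    # Pass 2: start every rater at all 0's, then fill in the slots they rated.
    ratings = {}
    for name, rated in raw:
        row = [0] * len(items)
        for place, score in rated.items():
            row[items.index(place)] = score
        ratings[name] = row
    return items, ratings


def processdata(filename):
    """Read the ratings file and return the item list and ratings dict."""
    with open(filename) as f:
        return parse_food_lines(f.readlines())
```

Because the item list here is built in first-seen order, its ordering will differ from the sample list shown earlier, which is allowed.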

ReadBooks.py

For book information, you will read data from two files and combine the data.

Here are the first eight lines from the file authorsAndBooks.txt. Each book is on one line with the line number first, then the author and then the title. They are separated by "::".

1::Patricia Cornwell::Postmortem
2::Agatha Cristie::The Secret Adversary 
3::John Grisham::The Firm
4::Douglas Adams::The Hitchhiker's Guide To The Galaxy 
5::Richard Adams::Watership Down
6::Mitch Albom::The Five People You Meet in Heaven 
7::Laurie Halse Anderson::Speak
8::Maya Angelou::I Know Why the Caged Bird Sings 
...

Here are the first few lines of the file bookRatings.txt. Each rater's name is on a line by itself, followed by one line per rating, with the ratings in the same order as the books in the file above. Thus Rus did not rate the first book, Postmortem. She rated the second book, The Secret Adversary, a 3 and did not rate the next six books. Canra follows Rus and rated Postmortem a 1, did not rate the next two books, and then rated The Hitchhiker's Guide To The Galaxy a 5. Note that 0 means no rating.

Rus 
0 
3 
0 
0 
0 
0 
0 
0 
...
0 
Canra
1 
0 
0 
5 
3 
0 
...

The function processdata should return a list of the unique items. It might be this list (here it makes sense to keep the same order as the file):

['Postmortem,Patricia Cornwell', 'The Secret Adversary,Agatha Cristie',
'The Firm,John Grisham', "The Hitchhiker's Guide To The Galaxy,Douglas
Adams", 'Watership Down,Richard Adams', 'The Five People You Meet in
Heaven,Mitch Albom', 'Speak,Laurie Halse Anderson', 'I Know Why the Caged
Bird Sings,Maya Angelou', 'Thirteen Reasons Why,Jay Asher', 'Foundation
Series,Isaac Asimov', 'The Sisterhood of the Travelling Pants,Ann
Brashares', 'A Great and Terrible Beauty,Libba Bray', 
...
[NOT ALL SHOWN]
...
]

It should also return a dictionary of ratings. Only two entries are shown below.

dict [('ender', [0, 0, 0, 5, 0, 0, 0, 0, 0, 5, 0, 0, 3, 0, 5, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 5, 3, 0, 5, 0, 3, 0, 0]), 
('Leah', [0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0]), 
...
]
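One way to combine the two book files is sketched below. It assumes every rater in bookRatings.txt has exactly one rating line per book, and it builds each item as "title,author" to match the sample list. The helper names parse_book_lines and parse_rating_lines, and the two-filename signature for processdata, are assumptions made for illustration.

```python
def parse_book_lines(lines):
    """Turn 'number::author::title' lines into 'title,author' items."""
    items = []
    for line in lines:
        parts = line.strip().split("::", 2)
        if len(parts) == 3:          # skip blank or malformed lines
            number, author, title = parts
            items.append(title.strip() + "," + author.strip())
    return items


def parse_rating_lines(lines, count):
    """Read a rater name followed by `count` rating lines, repeatedly."""
    cleaned = [line.strip() for line in lines if line.strip()]
    ratings = {}
    for start in range(0, len(cleaned), count + 1):
        name = cleaned[start]
        scores = cleaned[start + 1:start + 1 + count]
        ratings[name] = [int(s) for s in scores]
    return ratings


def processdata(bookfile, ratingfile):
    """Return the item list and ratings dict built from the two files."""
    with open(bookfile) as f:
        items = parse_book_lines(f.readlines())
    with open(ratingfile) as f:
        ratings = parse_rating_lines(f.readlines(), len(items))
    return items, ratings
```

Since the ratings are already listed in book order, no second pass is needed here; the number of books read from the first file tells you how many rating lines belong to each rater.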

ReadMovies.py

Here are the first seven lines of the file movieRatings.txt. Each line holds one rating and has three pieces of information, each surrounded by parentheses: the rater, the movie title, and the rating. For example, on the first line student1367 rated the movie "Star Trek Beyond" and gave it a 3.

(student1367)(Star Trek Beyond)(3)
(student1367)(The Edge of Seventeen)(3)
(student1367)(The Revenant)(5)
(student1046)(The Good Dinosaur)(3)
(student1206)(Brooklyn)(1)
(student1103)(The Revenant)(5)
(student1046)(The Edge of Seventeen)(3)
...

The function processdata should return a list of the unique items. It might be this list (yours may be in a different order). Only some of the movies are shown.

['Knight and Day', 'The Butterfly Effect', '50 First Dates', 'Love
Actually', 'Date Night', 'Unstoppable', 'Tooth Fairy', 'Secretariat', 'A
Nightmare on Elm Street', 'Kill Bill: Vol. 2', ... ]

It should also return a dictionary of ratings, partly shown below. For example, student1250 rated "The Butterfly Effect" a 3 and "50 First Dates" a 1. They did not rate "Knight and Day".

[('student1250', [0, 3, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 5, 0, 5, 0,
0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -3, 0,
1, 0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 5, 3,
0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0,
0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 3, 0, 0, 0,
0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 3, 0, 0, 1, 0, 3, 0, 0,
0, 0, 3, 0, 3, 3, 0, 0, 0, 5, 0, 0, 0]), 
('student1251', [0, 3, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 5, 3, 0, 0, 0, 0, 0, 0, 3, 3, 0,
0, 0, 0, 0, 0, 0, 5, 3, 0, 3, 3, -5, 1, 0, 0, 5, 0, 5, 0, 3, 0, 0, 0, 3, 0,
0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, -3, 0, 0, 0, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0,
0, 5, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0,
0, 3, 0, 0, 0, 0, 0, 0, 5, 5, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 3, 0, 0,
0]), 
('student1252', [0, 5, -3, 3, 3, 1, 0, 0, 1, 3, 3, 0, 3, 0, 5, 3, 1,
1, 0, 0, 1, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0, 0, 0, 5, 0, 1,
1, 0, 1, 5, 3, 5, 3, 0, 0, 0, 0, 0, 3, 0, 1, 0, 3, 0, 0, 5, 0, 0, 0, 0, 3,
5, 1, 3, 0, 0, 3, 3, 5, 3, 3, 5, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 1, 5, 5,
5, 0, 5, 0, 0, 0, 0, 3, 0, 3, 0, 3, 5, 3, 0, 1, 3, 0, 1, 0, 5, 0, 3, -3, 0,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 3, -3, 0, 1, 3, 0, 0, 3, 3,
0, 0, 0, 0, 3, 0, 3, 3, 5, 5, 3, 3, 3, 1, 0]), 
('student1253', [0, 0, 3, 0,
0, 0, 0, 0, 0, 0, 1, 0, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0,
0, 0, 0, 0, 5, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, -3, 0, 0, 0, 0, 0, 0, 0, 5, -3, 3, 5, 0, 0, 3, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 3, 5, 0, 0, 0, 3, 0, 0, 0, 0, 0,
0, 0, 0, 0, 3, 0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0,
3, 3, 0, 0]), 
('student1254', [0, 0, 3, 1, 3, 5, 3, 0, 0, 0, 0, 0, 3, 0, 0,
3, 0, 0, 0, 0, 0, 3, 5, 1, 3, 0, 1, 3, 0, 0, 5, 0, 0, 0, 0, 0, 0, 3, 0, 0,
0, 0, 1, 3, 0, 0, 1, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 3, 5, 0, 0, 0,
0, 0, 1, 0, 3, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 5, 1, 0, 3, 1,
0, 3, 0, 3, 0, 5, 0, 0, 0, 0, 0, 3, 0, 0, 0, 3, 0, 5, 3, 0, 3, 1, 0, 0, 0,
5, 0, 0, 0, 3, 5, 0, 3, 0, 0, 3, 0, 0, 0, 0, 0, 3, 3, 0, 1, 0, 0, 1, 0, 0,
3, 0, 0, 0, 3, 3, 3, 0, 0, 3, 0, 0, 0, 5, 5, 0, -3]), 
('student1255', [3,
3, 0, 0, 3, 0, 0, 0, 0, 0, 5, 0, 3, 0, 0, 5, 0, 0, 0, 3, 0, -3, 0, 3, 5, 3,
0, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0, 0,
0, 0, 0, 3, 0, 5, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 3, 0, 0, 3, 5, 5, 0,
0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 5, 3,
0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 3, 0, 3, 1, 0, 0, 3, 1, 5, 0, 1, 0, 0, -3,
0, 3, 0, 0, 0, 0, 3, 0, 0, 1, 3, 0, 0, 3, 0, 5, 0, 0, 0, 0, 3, 0, 0, 5, 5,
0, 0, 3, 3, 3, 0]), 
...
]
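The movie file puts one rating per line, so a rater's ratings are spread across many lines. A minimal sketch of one way to handle this is shown below, assuming every data line has exactly the (rater)(title)(rating) shape shown earlier. The helper name parse_movie_lines is hypothetical.

```python
def parse_movie_lines(lines):
    """Build (items, ratings) from '(rater)(title)(rating)' lines."""
    items = []    # unique titles, in the order first seen
    raw = {}      # rater -> {title: rating}
    for line in lines:
        line = line.strip()
        if not (line.startswith("(") and line.endswith(")")):
            continue                      # skip blank or unexpected lines
        body = line[1:-1]                 # drop the outermost parentheses
        # Split off the rater from the left and the rating from the right,
        # so a title that itself contains parentheses stays in one piece.
        rater, rest = body.split(")(", 1)
        title, rating = rest.rsplit(")(", 1)
        if title not in items:
            items.append(title)
        raw.setdefault(rater, {})[title] = int(rating)
    # Unrated movies get 0, in the same order as the item list.
    ratings = {}
    for rater, rated in raw.items():
        ratings[rater] = [rated.get(title, 0) for title in items]
    return items, ratings


def processdata(filename):
    """Read the movie ratings file and return the item list and dict."""
    with open(filename) as f:
        return parse_movie_lines(f.readlines())
```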

Reading Data

Each data-reading module you write has a function processdata that returns two values: a list and a dictionary.
A toy example about food is shown below. Feel free to use parts of this code in your module. It shows how to process a file that contains data for one rater, whose name is on the first line of the file. A sample data file in this format might look like the following. This is a contrived example: the data is in a different format from the assignment's files, and real examples have multiple raters. With multiple raters, the foods would need to be in the same order for every rater so that the lists of ratings in ratingsdict line up.

Charlie
vanilla milkshake:5
burrito:4
butterflied leg of lamb:-3
eggplant parmesan:1

Code to process this file:

    def processdata(filename):
        ratings = [-5, -3, -1, 0, 1, 3, 5]   # possible rating values (not used below)
        itemlist = []
        ratingsdict = {}
        f = open(filename)
        name = f.readline().strip()   # first line is the rater's name
        ratingsdict[name] = []
        for line in f:                # each remaining line is item:rating
            parts = line.strip().split(":")
            item = parts[0]
            rating = int(parts[1])
            itemlist.append(item)
            ratingsdict[name].append(rating)
        f.close()
        return itemlist, ratingsdict

The example code above shows a hypothetical module for rating food for a single rater named Charlie. The function has a parameter that's the name of a file storing foods and ratings as shown above.

In a real example the data ratings would be stored in one or more of the files to be read.
This example shows how to return two values from a function, essentially returning a tuple.
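As a small illustration of returning two values, the sketch below returns a tuple and then unpacks it. The function parse_item is a hypothetical helper that handles one line of the toy format; it is not part of the assignment.

```python
def parse_item(line):
    """Split one 'item:rating' line into its two parts."""
    item, rating = line.strip().split(":")
    return item, int(rating)      # the comma builds a tuple


result = parse_item("burrito:4")  # one tuple holding both values
item, rating = result             # unpack the tuple into two names
print(item, rating)
```

The caller can either keep the tuple whole or unpack it in the same statement as the call, as in itemlist, ratingsdict = processdata(filename).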

NOTE: The foodRatings.txt file is in a slightly different format than this. Look at the file to figure out how to process it.

RecommenderFood

This is the main program for restaurants. You will use code from Recommender and ReadFood. To accomplish this you will need to import the functions you want to use.

from Recommender import averages
from Recommender import similarities
from Recommender import recommended
from ReadFood import processdata

Then you will call processdata to process the data file. Here is an outline of what you might do. (NOT ALL CODE SHOWN)

    foodfile = "foodRatings.txt"
    fooditems, fooddict = processdata(foodfile)
     ...
    resultavg = averages(fooditems,fooddict)
     ...
    person1 = "Shirley"
    resultsim = similarities(person1, fooddict)

    ...
    resultrec = recommended(resultsim, fooditems, fooddict,3)
    ...
    person1 = "Xiawei"
    resultsim = similarities(person1, fooddict)

    ...
    resultrec = recommended(resultsim, fooditems, fooddict,3)

The output would then be as shown below.

Averages should be shown as floats. Don't worry about the number of digits displayed. Only two digits are shown below, but you can show more.


Restaurants and their average ratings
-------------------------------------
('DivinityCafe', 4.6)
('TheCommons', 3.5)
('Tandoor', 3.0)
('IlForno', 2.0)
('TheSkillet', 1.0)
('McDonalds', -0.5)
('PandaExpress', -1.0)


Ratings similar to Shirley
------------------------
['Bruce', 45] 
['Xiawei', 43]
['Lee', 40]
['SoonLee', 23]
['JoJo', 2]

Recommendations for Shirley with 3 most similar raters
----------------------------------------------------
('DivinityCafe', 213.33333333333334)
('TheCommons', 156.66666666666666)
('Tandoor', 127.5)

Ratings similar to Xiawei
-------------------------
['Lee', 43]
['Shirley', 43]
['SoonLee', 29]
['Bruce', 23]
['JoJo', -7]


Recommendations for Xiawei with 3 most similar raters
-----------------------------------------------------
('DivinityCafe', 172.33333333333334)
('TheCommons', 129.0)
('Tandoor', 105.66666666666667)

The recommendations below are not part of the required output. They show the top three recommendations when the 2 or 4 most similar raters are used; in the latter case that is almost all of the other raters.

Recommendations for Shirley with 2 most similar raters
--------------------------------------------------------
('DivinityCafe', 220.0)
('TheCommons', 175.0)
('IlForno', 135.0)

Recommendations for Shirley with 4 most similar raters
--------------------------------------------------------
('DivinityCafe', 177.25)
('TheCommons', 156.66666666666666)
('Tandoor', 123.33333333333333)

Recommendations for Xiawei with 2 most similar raters
--------------------------------------------------------
('DivinityCafe', 215.0)
('TheCommons', 129.0)
('TheSkillet', 129.0)

Recommendations for Xiawei with 4 most similar raters
--------------------------------------------------------
('DivinityCafe', 158.0)
('TheCommons', 109.0)
('Tandoor', 96.5)
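The numbers in the sample output above are consistent with three simple rules: an average counts only non-zero ratings, a similarity is the dot product of two raters' lists, and a recommendation score is the average of the non-zero ratings after each of the chosen raters' ratings is multiplied by that rater's similarity. This is an inference from the sample figures, not a statement of the assignment's specification, so treat the sketch below as one reading and check the howto pages. The return types (lists of tuples) are also an assumption.

```python
def averages(items, ratings):
    """Average each item over its non-zero ratings, highest first."""
    result = []
    for i, item in enumerate(items):
        scores = [row[i] for row in ratings.values() if row[i] != 0]
        avg = sum(scores) / float(len(scores)) if scores else 0.0
        result.append((item, avg))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


def similarities(name, ratings):
    """Dot product of name's ratings with every other rater's ratings."""
    mine = ratings[name]
    result = []
    for other, row in ratings.items():
        if other != name:
            score = sum(a * b for a, b in zip(mine, row))
            result.append((other, score))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


def recommended(similar, items, ratings, count):
    """Average the similarity-weighted ratings of the top `count` raters."""
    weighted = {}
    for other, weight in similar[:count]:
        weighted[other] = [weight * r for r in ratings[other]]
    return averages(items, weighted)


# Sample data copied from the ReadFood section of this page.
fooditems = ['IlForno', 'TheCommons', 'DivinityCafe', 'PandaExpress',
             'TheSkillet', 'Tandoor', 'McDonalds']
fooddict = {
    'JoJo': [0, 0, 0, 1, 1, 3, 1],
    'SoonLee': [1, 0, 3, -3, -1, 5, 0],
    'Lee': [1, 3, 5, 0, 3, 3, 0],
    'Bruce': [3, 3, 5, 3, 1, 3, 1],
    'Xiawei': [0, 5, 5, -5, 1, 0, -3],
    'Shirley': [3, 3, 5, 0, 0, 1, -1],
}

resultsim = similarities("Shirley", fooddict)
for pair in recommended(resultsim, fooditems, fooddict, 3)[:3]:
    print(pair)
```

Ties (for example Lee and Shirley, who both score 43 against Xiawei) can come out in either order with this sketch.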

RecommenderBooks

This is similar to RecommenderFood, but you need to print out a different number of items. See the requirements.

RecommenderMovies