Recommender
Collaborative filtering and content-based filtering are two kinds of recommender systems that provide users with information to help them find and choose anything from books, to movies, to restaurants, to courses based on their own preferences compared to the preferences of others.
In 2009 Netflix awarded one million dollars to a group that had developed a better recommender system than Netflix's own. This NY Times Magazine article describes the competition, the winning teams, and how the movie Napoleon Dynamite caused problems for the algorithms and ranking/rating systems developed by contest participants.
In this assignment, adapted from a Nifty Assignment developed by Michelle Craig, you will develop a program to test three different algorithms for recommending items based on the responses made by others. You will be practicing reading data from files, using Python dictionaries and lists, and sorting data to find good matches.
The assignment comes in two conceptual parts:
- Reading data stored in files and converting the data into a common format
- Using the data (stored in the common format) to make recommendations either for every student/rater or for a particular student, as described below.
Sometimes ratings are stored in a single file, sometimes in more than one file. You will need to write a separate Python module to deal with each data source, then use what these modules return to develop recommendations. Although the file formats are different, the ratings in each have the same meaning:
| Rating | Meaning |
| --- | --- |
| 5 | Really liked it! |
| 3 | Liked it! |
| 1 | Okay, neither hot nor cold about it |
| 0 | Have not read it |
| -1 | Not bad, but nothing really to say about it |
| -3 | Didn't like it |
| -5 | Hated it! |
Base Specifications
This assignment asks you to write several Python modules. Here is a high-level overview of those modules, with links that take you to more information for each module in the HOWTO document. To get started, download this code using Ambient's snarf tool.
Module BookReader.py:
getData(bookfilename, ratingsfilename)
Write a function that, given the names of two files of data about book ratings, returns two values:
- a list of strings, the book titles in the order read from the file, and
- a dictionary with strings as the keys and lists of ints as the values, the raters and their ratings of the books
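As a rough illustration of the common format described above, here is a minimal sketch of a book reader. The file layout is an assumption made only for this example (one title per line in the book file; in the ratings file, a rater's name on one line followed by a line of space-separated integer ratings). The actual data files may be laid out differently, so check the HOWTO before relying on it.

```python
def getData(bookfilename, ratingsfilename):
    """Return (titles, ratings) in the common format.

    ASSUMED file layout (hypothetical; the real files may differ):
      - book file: one title per line
      - ratings file: a rater's name on one line, followed by a line of
        space-separated integer ratings, one per book, in title order
    """
    # Titles are kept in the order they appear in the file.
    with open(bookfilename) as bookfile:
        titles = [line.strip() for line in bookfile if line.strip()]

    with open(ratingsfilename) as ratingsfile:
        lines = [line.strip() for line in ratingsfile if line.strip()]

    # Non-blank lines come in (name, ratings) pairs.
    ratings = {}
    for i in range(0, len(lines) - 1, 2):
        ratings[lines[i]] = [int(r) for r in lines[i + 1].split()]
    return titles, ratings
```

With this layout, each rater's list has the same length as the list of titles, so the rating at index i always refers to the title at index i.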
Module Recommender.py:
average(items, ratings)
Write a function that returns a list of tuples, each consisting of the name of the item rated (string) and its average rating (float). The list returned should be sorted from the highest average rating to the lowest. The two parameters are the list items and the dictionary ratings, both returned from any of your getData functions.
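One possible shape for average is sketched below. It assumes that a rating of 0 ("Have not read it") is left out of the average and that an item nobody rated gets 0.0. Both choices are assumptions for this example, not something the specification above states. The sample items and ratings are made up.

```python
def average(items, ratings):
    """Return [(item, average), ...] sorted from highest average to lowest.

    ASSUMPTION: a rating of 0 means "not rated" and is left out of the
    average; an item nobody rated gets an average of 0.0.
    """
    results = []
    for index, item in enumerate(items):
        # Collect every non-zero rating given to this item.
        scores = [r[index] for r in ratings.values() if r[index] != 0]
        avg = float(sum(scores)) / len(scores) if scores else 0.0
        results.append((item, avg))
    # sorted() is stable, so items with equal averages keep their order.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Made-up sample data in the common format.
items = ["Book A", "Book B", "Book C"]
ratings = {"Ann": [5, 0, -3], "Bob": [1, 3, 0], "Cy": [3, 1, -5]}
print(average(items, ratings))
# [('Book A', 3.0), ('Book B', 2.0), ('Book C', -4.0)]
```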
similarities(name, ratings)
Write a function that returns a list of tuples consisting of the name of the rater (string) and a similarity index (int). The list returned should be sorted from the highest similarity index to the lowest. The parameters given are a string that is the name of a rater and the dictionary returned from any of your getData functions.
recommended(simList, items, ratings, count)
Write a function that returns a list of tuples, each consisting of the name of a recommended item (string) and the recommendation score for that item (int). The list returned should be sorted from the highest recommendation score to the lowest. The parameter simList is the list returned by the function similarities, the list items and the dictionary ratings are the values returned from any of your getData functions, and count is a number indicating how many ratings from simList should be used.
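The sketch below shows one way similarities and recommended could fit together. The scoring rules are assumptions chosen for illustration: the similarity index is taken to be the dot product of two raters' rating lists (with the named rater left out of the result), and an item's recommendation score is the sum, over the first count raters in simList, of similarity times rating. The HOWTO may define these differently. The sample data is made up.

```python
def similarities(name, ratings):
    """Return [(rater, similarity), ...] sorted from most to least similar.

    ASSUMPTION: the similarity index is the dot product of the two raters'
    rating lists, and the named rater is left out of the result.
    """
    mine = ratings[name]
    result = []
    for other, theirs in ratings.items():
        if other != name:
            score = sum(a * b for a, b in zip(mine, theirs))
            result.append((other, score))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


def recommended(simList, items, ratings, count):
    """Return [(item, score), ...] sorted from highest score to lowest.

    ASSUMPTION: an item's score is the sum, over the first `count` raters
    in simList, of that rater's similarity index times their rating.
    """
    scores = [0] * len(items)
    for rater, similarity in simList[:count]:
        for index, rating in enumerate(ratings[rater]):
            scores[index] += similarity * rating
    return sorted(zip(items, scores), key=lambda pair: pair[1], reverse=True)


# Made-up sample data in the common format.
items = ["Book A", "Book B", "Book C"]
ratings = {"Ann": [5, 0, -3], "Bob": [1, 3, 0], "Cy": [3, 1, -5]}
sims = similarities("Ann", ratings)
print(sims)                                   # [('Cy', 30), ('Bob', 5)]
print(recommended(sims, items, ratings, 1))   # uses only Cy's ratings
print(recommended(sims, items, ratings, 2))   # uses Cy's and Bob's ratings
```

Under these assumptions, count=1 gives Book A a score of 90 and count=2 gives it 95, because Bob's ratings are then added in with his smaller weight. Comparing runs like this is one way to explore how count changes the results.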
These functions should work for any kind of thing we might want recommendations about, i.e. they should be able to be called with list and dictionary data structures returned by your Reader modules.
In addition to the standard information included in your README file, include an analysis of your project:
- Describe how you determined that the functions you wrote in Recommender.py work correctly. What did you do to verify the results, how did you debug the code, and so on?
- Describe how different values of count affect what items are returned as recommended when calling the function recommended.
Bonus Specifications
Module MovieReader.py:
getData(ratingsfilename)
Write a function that, given the name of a file of data about movie ratings, returns two values:
- a list of strings, the movie titles in the order read from the file, and
- a dictionary with strings as the keys and lists of ints as the values, the raters and their ratings of the movies
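Because the movie data arrives in a single file, the reader has to build the title list and the ratings dictionary at the same time. The sketch below assumes a hypothetical layout of one rating per line, written as rater,title,rating with no commas inside titles. The real file may differ, so treat this only as an illustration of converting to the common format.

```python
def getData(ratingsfilename):
    """Return (titles, ratings) in the common format.

    ASSUMED file layout (hypothetical; the real file may differ): one
    rating per line, written as rater,title,rating, with no commas
    inside the title.
    """
    titles = []
    raw = {}  # rater -> {title: rating}
    with open(ratingsfilename) as ratingsfile:
        for line in ratingsfile:
            line = line.strip()
            if not line:
                continue
            rater, title, rating = [part.strip() for part in line.split(",")]
            # Record each title the first time it is seen.
            if title not in titles:
                titles.append(title)
            raw.setdefault(rater, {})[title] = int(rating)

    # Expand each rater's ratings into a list aligned with titles,
    # using 0 for any title that rater did not rate.
    ratings = {}
    for rater, rated in raw.items():
        ratings[rater] = [rated.get(title, 0) for title in titles]
    return titles, ratings
```

Filling in 0 for missing ratings keeps every rater's list the same length as the title list, which is what lets the Recommender functions work unchanged on movie data.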
Submission
Submit your entire PyDev project electronically from within Eclipse or on the web to the assignment name assign05_recommender.