Recommender
Collaborative filtering and content-based filtering are two kinds of recommender systems. They help users find and choose anything from books and movies to restaurants and courses, based on the user's own preferences compared with the preferences of others.
In 2009, Netflix awarded one million dollars to a group that had developed a better recommender system than Netflix's own. A NY Times Magazine article describes the competition, the winning teams, and how the movie Napoleon Dynamite caused problems for the algorithms and ranking/rating systems developed by contest participants.
In this assignment, adapted from a Nifty Assignment developed by Michelle Craig, you will develop a program to recommend items based on the responses made by others. You will be reading data from files, using Python dictionaries and lists, and sorting data to find good matches.
The assignment comes in two conceptual parts:
- Reading data stored in files and converting it into a common data structure
- Using the data structure to make recommendations for a particular rater
You will need to write a separate Python module to read each data file, and then use the data structures these modules return to develop recommendations. Although the file formats differ, the ratings in each have the same meaning:
Rating   Meaning
   5     Really liked it!
   3     Liked it!
   1     Okay, neither hot nor cold about it
   0     Have not read it
  -1     Not bad, but nothing really to say about it
  -3     Didn't like it
  -5     Hated it!
To get started, download the starter code using Ambient's snarf tool.
Basic Specifications
Here is a high-level overview of the two Python modules you will complete. The HOWTO document has more information about each module.
Module BookReader.py:
getData(bookfilename)
Write a function that, given the name of a file of data about book ratings, returns two sequences:
- a list of strings, the books' titles in the order given in the file, and
- a dictionary with strings as keys and lists of ints as values, mapping each rater to their ratings of the books
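The HOWTO describes the actual layout of the book data file, so the following is only a minimal sketch of getData under a loudly stated assumption: it pretends each non-blank line has the hypothetical form rater,title,rating (for example Ben,Dune,5). What carries over to any format is the shape of the result: one list of titles, and for every rater a list with one slot per title, where a 0 means "have not read it", matching the ratings table above.

```python
def getData(bookfilename):
    """Sketch of BookReader.getData under an ASSUMED file format.

    Hypothetical format (check the HOWTO for the real one), one rating per line:
        rater,title,rating        e.g.   Ben,Dune,5
    The rater is the first field and the rating the last, so titles may
    themselves contain commas.
    """
    titles = []        # book titles, in order of first appearance in the file
    position = {}      # title -> index into titles
    entries = []       # (rater, title, rating) triples read from the file
    with open(bookfilename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            fields = line.split(",")
            rater = fields[0].strip()
            rating = int(fields[-1])
            title = ",".join(fields[1:-1]).strip()
            if title not in position:
                position[title] = len(titles)
                titles.append(title)
            entries.append((rater, title, rating))

    # Every rater gets one slot per title; 0 means "have not read it".
    ratings = {}
    for rater, title, rating in entries:
        if rater not in ratings:
            ratings[rater] = [0] * len(titles)
        ratings[rater][position[title]] = rating
    return titles, ratings
```

The ratings dictionary is built in a second pass because the total number of titles is not known until the whole file has been read.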
Module Recommender.py:
recommend(name, items, ratings, count)
Write a function that returns a list of (string, float) tuples: the name of each recommended item and the average rating for that item. The list should be sorted from the highest recommendation score (see the HOWTO for more details) to the lowest, and should include only items with a positive score that have not been rated by name. The parameter name is the name of one of the raters; the parameters items, a list, and ratings, a dictionary, are the sequences returned by any of the getData functions; and count is a number indicating how many ratings should be considered in building the recommendation list.
These functions should work for any kind of thing we might want recommendations about; that is, they should accept the list and dictionary returned by any of your Reader modules.
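The HOWTO defines the recommendation score precisely, so the sketch below is only one plausible reading of the specification, and each of its rules is an assumption: the similarity between two raters is the dot product of their rating lists, count is taken to be the number of most similar raters whose ratings are used, and an item's average ignores 0 (unrated) entries. Because it only indexes into items and ratings, it works the same way for books, movies, or anything else a Reader module returns.

```python
def recommend(name, items, ratings, count):
    """Sketch of Recommender.recommend; the scoring rules are ASSUMPTIONS."""
    mine = ratings[name]

    # Assumed similarity: dot product of the two raters' rating lists.
    similarities = []
    for other, theirs in ratings.items():
        if other == name:
            continue
        score = sum(a * b for a, b in zip(mine, theirs))
        similarities.append((score, other))

    # Most similar raters first; ties broken by rater name for determinism.
    similarities.sort(key=lambda pair: (-pair[0], pair[1]))
    nearest = [other for _, other in similarities[:count]]

    results = []
    for i, item in enumerate(items):
        if mine[i] != 0:
            continue  # name has already rated this item
        # Assumed: average only the nonzero ratings from the nearest raters.
        rated = [ratings[other][i] for other in nearest if ratings[other][i] != 0]
        if not rated:
            continue
        average = sum(rated) / len(rated)
        if average > 0:  # keep only items with a positive score
            results.append((item, average))

    # Highest score first; ties broken by item name.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


items = ["A", "B", "C", "D"]
ratings = {
    "me": [5, 3, 0, 0],
    "x": [5, 3, 5, -3],    # similarity with me: 34
    "y": [3, 1, 1, 5],     # similarity with me: 18
    "z": [-5, -3, -5, 3],  # similarity with me: -34
}
for count in (1, 2, 3):
    print(count, recommend("me", items, ratings, count))
```

With this toy data, count=1 recommends only C with 5.0; count=2 returns [("C", 3.0), ("D", 1.0)]; and count=3 lets the dissimilar rater z in, which moves D (about 1.67) ahead of C (about 0.33). Small hand-checkable cases like this are also a convenient way to explore the effect of count for your README.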
In addition to the standard information included in your README file, include an analysis of your project:
- Describe how you determined that the functions you wrote in Recommender.py work correctly, i.e., what you did to verify the results, how you debugged the code, and so on.
- Describe how different values of count affect which items are returned as recommended when calling the function recommend.
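One way to approach the first README question is to check every result of recommend against the properties the specification requires: scores sorted from highest to lowest, only positive scores, and only items that name has not rated. The helper below, check_recommendations, is a hypothetical name and not part of the starter code; it returns a list of problems, so an empty list means the result passed. It checks the shape of the answer, not whether each score was computed correctly, so pair it with a few cases small enough to work out by hand.

```python
def check_recommendations(name, items, ratings, result):
    """Hypothetical test helper: list the spec violations found in result."""
    problems = []
    scores = [score for _, score in result]
    if scores != sorted(scores, reverse=True):
        problems.append("not sorted from highest to lowest score")
    for item, score in result:
        if score <= 0:
            problems.append(f"{item}: non-positive score {score}")
        if item not in items:
            problems.append(f"{item}: unknown item")
        elif ratings[name][items.index(item)] != 0:
            problems.append(f"{item}: already rated by {name}")
    return problems


items = ["A", "B", "C"]
ratings = {"me": [5, 0, 0]}
print(check_recommendations("me", items, ratings, [("B", 3.0), ("C", 1.0)]))
print(check_recommendations("me", items, ratings, [("C", 1.0), ("A", 2.0), ("B", -1.0)]))
```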
Bonus Specifications
Module MovieReader.py:
getData(ratingsfilename)
Write a function that, given the name of a file of data about movie ratings, returns two sequences:
- a list of strings, the movies' titles in an order determined by you, and
- a dictionary with strings as keys and lists of ints as values, mapping each rater to their ratings of the movies
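Because the title order is up to you, one simple choice is alphabetical order, which does not depend on how the lines in the file happen to be arranged. The sketch below assumes a hypothetical line format, rater|movie title|rating, purely for illustration; the real movie file will likely differ, so adapt the parsing step and keep the output shape, which must match what recommend expects.

```python
def getData(ratingsfilename):
    """Sketch of MovieReader.getData under an ASSUMED file format.

    Hypothetical format (check the real data file), one rating per line:
        rater|movie title|rating     e.g.   Ann|Up|5
    Titles are returned in alphabetical order, a choice the spec allows.
    """
    entries = []  # (rater, title, rating) triples
    with open(ratingsfilename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            rater, title, rating = line.split("|")
            entries.append((rater.strip(), title.strip(), int(rating)))

    titles = sorted({title for _, title, _ in entries})
    position = {title: i for i, title in enumerate(titles)}

    # One slot per movie for every rater; 0 means "not rated".
    ratings = {}
    for rater, title, rating in entries:
        ratings.setdefault(rater, [0] * len(titles))[position[title]] = rating
    return titles, ratings
```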
Submission
Submit your entire PyDev project and a plain text README file electronically, from within Eclipse or on the web, under the assignment name assign04_recommender.
Please double-check that you submitted the correct files. Within Eclipse, you can do this with the Ambient menu item Submit History...; on the web, the submitted files are listed at the bottom of the page after a successful submission.