Howto for CompSci 101 Fall 2017, Asgn 8 Recommender

Get the snarf file that has the data files you need for this assignment or get them here.

Suggested order to work on

Below is a list of the files and essential methods you need to create for this assignment. Feel free to also create additional helper methods.

IN ALL OF THESE, PRINT A LOT OF OUTPUT WHILE DEBUGGING. After you create a new structure, we strongly suggest printing it out to make sure it was built correctly. If it is large, print just a small part of it, such as the first 20 items in a list or ten items from a dictionary.
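
One way to print only part of a large structure is sketched below. The helper names preview_list and preview_dict and the sample data are made up for this illustration; they are not part of the assignment.

```python
# Hypothetical sample structures, used only to illustrate the idea.
itemlist = ["item" + str(i) for i in range(100)]
ratingsdict = {"rater" + str(i): [i % 6] * 3 for i in range(50)}


def preview_list(lst, count=20):
    """Return the first count items of a list."""
    return lst[:count]


def preview_dict(d, count=10):
    """Return a list of up to count (key, value) pairs from a dictionary."""
    pairs = []
    for key in d:
        if len(pairs) == count:
            break
        pairs.append((key, d[key]))
    return pairs


# Print a small part of each structure instead of the whole thing.
print(preview_list(itemlist))
for pair in preview_dict(ratingsdict):
    print(pair)
```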

Here is the suggested 12-step process to work on this project.

  1. Write ProcessAllFood.py to process the food data. This is a very small file so you will be able to print a complete structure out (list, dictionary) after you build it to see if it looks correct.

  2. Write RecommenderForAll.py, just the averages function.

  3. Write RecommenderFood.py, just the part to test the averages function.

  4. Add to RecommenderForAll.py, just the similarities function.

  5. Add to RecommenderFood.py, just the part to test the similarities function.

  6. Add to RecommenderForAll.py, just the recommender function.

  7. Add to RecommenderFood.py, just the part to test the recommender function.

    At this point you have completed ProcessAllFood.py, RecommenderForAll.py and RecommenderFood.py. You've completed a large part of this assignment!

  8. Write ProcessAllBooks.py. The data will be in a different format than the food was, but the output should be the same: a list and a dictionary.

  9. Write RecommenderBooks.py. This is similar to RecommenderFood.py, so we suggest you copy that file and modify it.

    You are almost done!

  10. Write ProcessAllMovies.py. The data again is in a different format than the food and books were, but the output should be the same: a list and a dictionary.

  11. Write RecommenderMovies.py. This is similar to RecommenderBooks.py, so we suggest you copy that file and modify it.

  12. Now you are done! Submit your program and feel good about all you have learned in CompSci 101 this semester!

Below is a list of all the files and methods you need to write. Each file should have a main section that you use to test its functions. More detail on each is on the details page.
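
A main section is usually written with Python's standard __name__ check, so the test code runs only when you run the file itself and not when another module imports it. The sketch below uses a placeholder processdata with an empty body of our own invention, just to show the shape.

```python
def processdata(filename):
    """Placeholder standing in for your real function (hypothetical body)."""
    return [], {}


if __name__ == "__main__":
    # This part runs only when the file is executed directly,
    # not when the file is imported by another module.
    items, ratings = processdata("AllFoodRatings.txt")
    print(items)
    print(ratings)
```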


ProcessAllFood.py

Function: processdata( filename) : The function must read data about food and raters and return the information in a list and dictionary. (see details)

The filename argument must be a txt file of food and ratings like this small sample one named AllFoodRatings.txt (also in the snarf) and in this format.

rater
(restaurant)(rating)
(restaurant)(rating)
(restaurant)(rating)
(restaurant)(rating)
(restaurant)(rating)
rater
(restaurant)(rating)
(restaurant)(rating)
(restaurant)(rating)
rater
(restaurant)(rating)
(restaurant)(rating)
....


Each line represents either a rater or represents a restaurant and its rating. A rater is on a line by itself and it is followed by one or more lines of restaurants and the rater's rating of that restaurant. The name of a rater will be one or more words. Each line that represents a restaurant rating will be in the format: a left parenthesis, the name of a restaurant, a right parenthesis, a left parenthesis, the rating, and a right parenthesis. Each restaurant name is one word and each rating is one number.

NOTE: A rater may not rate all of the restaurants. If a rater did not rate a restaurant, you should assign a rating of 0 to that restaurant, which means "not rated".

Return values

The call processdata(filename) must return two things:

itemlist a unique list of items (restaurants) being rated: [ "restaurant", "restaurant", ... ]

ratingsdict a dictionary of raters and their ratings of the restaurants:

{"ratername" : [integer ratings for each restaurant],
 "ratername" : [integer ratings for each restaurant], ...
}

The integer ratings will be in the same order as the list of restaurants. For example, the first rating in the integer list of ratings for each rater will be for the first restaurant in itemlist .
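
The format and return values above could be handled roughly as in the sketch below. It is only one possible approach, not the required solution. The helper parse_food_lines and the intermediate dictionary raw are names made up here, and the sketch assumes that rater names never start with a left parenthesis and that restaurants are listed in the order they first appear in the file.

```python
def parse_food_lines(lines):
    """Build (itemlist, ratingsdict) from the lines of a food ratings file."""
    itemlist = []   # restaurants, in order of first appearance (assumption)
    raw = {}        # rater -> {restaurant: rating}
    rater = None
    for line in lines:
        line = line.strip()
        if line == "":
            continue
        if line.startswith("(") and line.endswith(")"):
            # "(name)(rating)" -> "name)(rating" -> ["name", "rating"]
            name, rating = line[1:-1].split(")(")
            if name not in itemlist:
                itemlist.append(name)
            raw[rater][name] = int(rating)
        else:
            # Any other non-blank line is taken to be a rater's name.
            rater = line
            if rater not in raw:
                raw[rater] = {}
    ratingsdict = {}
    for rater in raw:
        # A restaurant this rater did not rate gets 0 ("not rated").
        ratingsdict[rater] = [raw[rater].get(name, 0) for name in itemlist]
    return itemlist, ratingsdict


def processdata(filename):
    """Read the file and return the list and the dictionary."""
    with open(filename) as f:
        return parse_food_lines(f.readlines())
```

Keeping the parsing in a helper that takes a list of lines makes it easy to test with a few hand-typed lines before trying the real file.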


ProcessAllBooks.py

Function: processdata( booktitles, bookratings ) : The function must read data about books and raters and return the information in a list and dictionary. (see details)

The booktitles argument must be a txt file of books like AllBooksAuthors.txt (supplied with the snarf or here), with each book represented by two consecutive lines: the book title on one line, followed by the author on the next line. The format for the book title line is a number, followed by a period, followed by the book title. The format of the author line is the same number, followed by a period, followed by the author's name. The book titles and author names are one or more words.

1.bookTitle
1.author
2.bookTitle
2.author
3.bookTitle
3.author
....


The bookratings argument must be a txt file of raters like AllBooksRatings.txt (supplied with the snarf or here), with a rater and their ratings spread over one or more lines. The format for one rater is the word RATER, followed by a colon, the rater's name, and another colon, followed by one or more ratings on that line and possibly on several more lines. Ratings are separated by colons, but there is no colon at the end of any line. The word RATER always appears at the start of a line and marks the beginning of a new rater. There will be a rating for every book, with a 0 if the book has not been rated by this rater.

RATER:ratername:rating:rating:rating:rating
rating:rating:rating:rating:rating:rating:rating
...
rating:rating:rating:rating
RATER:ratername:rating:rating:rating:rating:rating:rating:rating
rating:rating:rating:rating
...
rating:rating:rating:rating:rating:rating
RATER:ratername:rating:rating
rating:rating:rating:rating:rating:rating:rating:rating
...
rating:rating
....

Return values

The call processdata( booktitles, bookratings ) must return two things:

itemlist a unique list of books being rated (combine each author and title into a string "title,author"):
[ "title,author", "title,author", ... ]

raterdict a dictionary of raters and their ratings of the books:

{"ratername" : [integer ratings for each book],
 "ratername" : [integer ratings for each book], ...
}


The integer ratings will be in the same order as the list of books. For example, the first rating in the integer list of ratings for each rater will be for the first book in itemlist .
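
As a rough sketch of one way to read the two book files (not the required solution), the code below uses two helpers, parse_titles and parse_ratings, whose names are made up for this illustration. It assumes rater names contain no colons and that the title file always has complete title/author pairs.

```python
def parse_titles(lines):
    """Combine each title line with the author line that follows it."""
    cleaned = [line.strip() for line in lines if line.strip() != ""]
    itemlist = []
    for i in range(0, len(cleaned) - 1, 2):
        # Drop the leading "number." from both lines. Split only once,
        # because titles and names may contain periods themselves.
        title = cleaned[i].split(".", 1)[1].strip()
        author = cleaned[i + 1].split(".", 1)[1].strip()
        itemlist.append(title + "," + author)
    return itemlist


def parse_ratings(lines):
    """Collect each rater's ratings, which may continue over several lines."""
    raterdict = {}
    rater = None
    for line in lines:
        parts = line.strip().split(":")
        if parts == [""]:
            continue  # skip blank lines
        if parts[0] == "RATER":
            # A new rater starts here: RATER:ratername:rating:...
            rater = parts[1]
            raterdict[rater] = []
            parts = parts[2:]
        # Every remaining piece on the line is a rating for the current rater.
        raterdict[rater].extend(int(p) for p in parts)
    return raterdict


def processdata(booktitles, bookratings):
    """Read both files and return the list and the dictionary."""
    with open(booktitles) as f:
        itemlist = parse_titles(f.readlines())
    with open(bookratings) as f:
        raterdict = parse_ratings(f.readlines())
    return itemlist, raterdict
```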


ProcessAllMovies.py

Function: processdata( movieratings )
(see details)

The movieratings argument must be a txt file of movies like AllMoviesRatings.txt (supplied with the snarf or here) in the format of one movie rating per three lines with three pieces of information: ratername on the first line, movietitle on the second line and rating on the third line. Note that movietitles are one or more words.

ratername
title
rating
ratername
title
rating
ratername
title
rating
...

Return values

The call processdata( movieratings ) must return two things:

itemlist a unique list of movies being rated:  [ "title", "title", "title", ... ]
(see details and examples here)

ratingsdict a dictionary of raters and their ratings of the movies:

{"ratername" : [integer ratings for each movie],
 "ratername" : [integer ratings for each movie], ...
}


The integer ratings will be in the same order as the list of movies. For example, the first rating in the integer list of ratings for each rater will be for the first movie in itemlist .

Note that all three processdata functions (for food, books and movies) return the same types: a list of strings (each item being rated) and a dictionary in which keys are raters and values are lists of integer ratings by each rater.
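
For the movie format, one possible sketch (again, not the required solution) reads the lines three at a time. The helper name parse_movie_lines is made up here. The sketch assumes that a movie a rater did not rate gets 0, as in the food data, and that movies are listed in the order they first appear; check the details page for the order actually required.

```python
def parse_movie_lines(lines):
    """Build (itemlist, ratingsdict) from ratername/title/rating triples."""
    cleaned = [line.strip() for line in lines if line.strip() != ""]
    itemlist = []   # movie titles, in order of first appearance (assumption)
    raw = {}        # rater -> {title: rating}
    for i in range(0, len(cleaned) - 2, 3):
        rater = cleaned[i]
        title = cleaned[i + 1]
        rating = int(cleaned[i + 2])
        if title not in itemlist:
            itemlist.append(title)
        if rater not in raw:
            raw[rater] = {}
        raw[rater][title] = rating
    ratingsdict = {}
    for rater in raw:
        # Assumption: an unrated movie gets 0, as in the food data.
        ratingsdict[rater] = [raw[rater].get(title, 0) for title in itemlist]
    return itemlist, ratingsdict


def processdata(movieratings):
    """Read the file and return the list and the dictionary."""
    with open(movieratings) as f:
        return parse_movie_lines(f.readlines())
```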


RecommenderForAll.py

In the module RecommenderForAll.py you must write the three functions shown in the details pages: averages, similarities, and recommended. Each of these will be used for comparing ratings of restaurants, movies and books in other programs, that is, all three of these will be imported into other Python modules.

Here is briefly what each of these does, with more detail for each on the details page.


RecommenderFood.py

This is the main program for restaurant recommendations. It should read in the file of restaurant ratings by calling processdata in ProcessAllFood.py to get the list of restaurants and the dictionary of raters and their ratings.

Then it will call the functions in RecommenderForAll to find out what the average rating is for each restaurant, find top raters who are similar to another rater, and give recommendations to a rater.

In particular, it should produce the following output. For all three make sure your output is easily identifiable and understandable.

(See details)



RecommenderBooks.py

This is the main program for book recommendations. It should read in the files of book titles and book ratings by calling processdata in ProcessAllBooks.py to get the list of books and authors, and the dictionary of raters and their ratings.

Then it will call the functions in RecommenderForAll to find out what the average rating is for each book, find top raters who are similar to another rater, and give recommendations to a rater.

In particular, it should produce the following output. For all three make sure your output is easily identifiable and understandable.

(See details)



RecommenderMovies.py

This is the main program for movie recommendations. It should read in the file of movie ratings by calling processdata in ProcessAllMovies.py to get the list of movies and the dictionary of raters and their ratings.

Then it will call the functions in RecommenderForAll to find out what the average rating is for each movie, find top raters who are similar to another rater, and give recommendations to a rater.

In particular, it should produce the following output. For all three make sure your output is easily identifiable and understandable.

(See details)


