Introduction to Computer Science
CompSci 101 : Fall 2013

Movie Data

IMDB is an amazing resource for anyone interested in movies, so let's look at some basic data from there and ask some questions about the top rated and top grossing movies. In this example, there are three separate data files: the top 250 movies as rated by IMDB users, the top 250 grossing movies, and the directors and top 5 actors in each of those movies. These files are formatted as CSV files.

Start by snarfing the code from the class website. Alternatively, you can browse the code here. The given code was developed during class. It reads in the data using Python's csv module and combines it into a single dictionary whose keys are each movie's title and year (since many titles have been reused) and whose values are lists of the following strings:

[director, actor1, actor2, actor3, actor4, actor5, ranking, value(s) ]

where value(s) is either the movie's average rating or its gross profits. Some movies have ranking and value data for both rating and gross profits because they appear in both lists. In other words, the list has either 8 or 10 elements, and its last element may be either a rating or a monetary value.
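
To make the format concrete, here is a minimal sketch of what two entries might look like and how to pull the pieces apart by position. The titles, names, and numbers are placeholders rather than real data. Representing the key as a (title, year) tuple of strings and the order of the two ranking/value pairs in the longer list are assumptions; check them against the given code.

```python
# Hypothetical entries: every title, name, and number here is a placeholder.
# ASSUMPTION: keys are (title, year) tuples of strings.
movies = {
    ("Example Movie A", "1999"): [
        "Director One",
        "Actor A", "Actor B", "Actor C", "Actor D", "Actor E",
        "12", "8.5",                     # one ranking/value pair (8 elements)
    ],
    ("Example Movie B", "2005"): [
        "Director Two",
        "Actor A", "Actor F", "Actor G", "Actor H", "Actor I",
        "40", "8.1", "17", "350000000",  # two pairs (10 elements); order assumed
    ],
}

def in_both_lists(info):
    """Return True if the list carries ranking/value data from both lists."""
    return len(info) == 10

info = movies[("Example Movie B", "2005")]
director = info[0]           # first element is the director
cast = info[1:6]             # next five elements are the top 5 actors
rank_and_values = info[6:]   # remaining 2 or 4 strings
print(director)
print(in_both_lists(info))
```

Note that every element is a string, so rankings, ratings, and monetary values need to be converted with int or float before doing arithmetic on them.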

Understanding the Code

In this part, examine the two given functions, bothTopRatedAndGrossing and uniqueDirectors, that answer three of the questions given at the end of the module. In addition to seeking to understand how they use the dictionary to answer the questions, look at how their parameters are used to change the question being asked even though only one function is used.

  1. The function bothTopRatedAndGrossing returns a dictionary containing only the movies that are both top grossing and top rated. The dictionary's format is the same as that of the combined dictionary, movies. In this way, it can be used as a parameter to other functions that are expecting movie data.
  2. The function uniqueDirectors returns a list of strings, the names of the directors of all the movies in the dictionary passed as a parameter. The resulting list is sorted alphabetically (by first name) and contains no duplicates. Because it receives a different dictionary each time it is called, this one function can answer both of the following questions:

    1. Who directed the movies that are either top rated or top grossing?
    2. Who directed the movies that are both top rated and top grossing?

    This flexibility is something you should model in the functions you write below.
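
The idea of reusing one function with different dictionaries can be sketched as follows. This is not the given code: the function names unique_directors_sketch and both_lists_sketch are hypothetical stand-ins, the data is made up, and the sketch assumes that a 10-element value list means the movie appears in both lists.

```python
def unique_directors_sketch(movie_dict):
    """Return the sorted, duplicate-free list of directors in movie_dict."""
    # The director is the first element of each value list.
    return sorted(set(info[0] for info in movie_dict.values()))

def both_lists_sketch(movie_dict):
    """Return a dictionary, in the same format, of movies with data from both lists."""
    # ASSUMPTION: 10 elements means the movie is both top rated and top grossing.
    return {key: info for key, info in movie_dict.items() if len(info) == 10}

# Placeholder data, not real movies.
cast = ["Actor 1", "Actor 2", "Actor 3", "Actor 4", "Actor 5"]
toy_movies = {
    ("Movie A", "1990"): ["Director Y"] + cast + ["1", "9.0"],
    ("Movie B", "2000"): ["Director X"] + cast + ["2", "8.9", "5", "500000000"],
    ("Movie C", "2010"): ["Director Y"] + cast + ["3", "400000000"],
}

# Same function, different dictionaries, different questions answered.
either = unique_directors_sketch(toy_movies)
both = unique_directors_sketch(both_lists_sketch(toy_movies))
print(either)   # ['Director X', 'Director Y']
print(both)     # ['Director X']
```

Because the filtering function returns a dictionary in the same format as its input, its result can be handed to any other function that expects movie data.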

Writing New Code

When writing the following functions you are welcome to use any of the values that have been returned by other functions in the module (as in the uniqueDirectors example above). Write the following functions to answer the questions printed at the end of the imdbmovies module, sometimes just by varying the parameters passed to the function (i.e., you are writing 5 functions to answer 7 questions):

  1. Write a function that returns a list of tuples of an int and a string, (number of movies directed, name of director), for the directors of all the movies in the given dictionary. The returned list should be in sorted order, by number from highest to lowest, and should be limited to only the top N directors as given by the parameter count.
  2. Write a function that returns a list of tuples of an int and a string, (total money grossed, name of director), for the directors of the top grossing movies. The int, total money grossed, should be a sum of the gross profits of all movies directed by that director. The returned list should be in sorted order, by number from highest to lowest, and should be limited to only the top N directors as determined by the parameter count.
  3. Write a function that returns a list of tuples of a string and a list of tuples of strings, (name of actor/actress, [ (title, year) ]), for cast members in movies that are either top rated or top grossing. The list's elements should be the title and year of each movie in which the cast member acted. The returned list should be in sorted order, alphabetically by cast member's first name, and should be limited to only those that acted in at least N movies as determined by the parameter minAppearances.
  4. Write a function that returns a list of tuples of an int and a string, (number of movies acted in, name of actor/actress), for cast members in all the movies in the given dictionary. The returned list should be in sorted order, by number from highest to lowest, and should be limited to only the top N cast members as given by the parameter count.
  5. Write a function that returns a list of tuples of a float and a string, (average rating of movies acted in, name of actor/actress), for cast members in all the movies in the given dictionary. The float, average rating, should be the average of the ratings of all movies in which the cast member appeared. The returned list should be in sorted order, by number from highest to lowest, and should be limited to only the top N cast members as given by the parameter count.
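
Several of these functions share one pattern: tally a number per name in a dictionary, turn the tallies into (number, name) tuples, sort from highest to lowest, and keep the first N. Below is a minimal sketch of that pattern on a plain list of names. The helper name top_counts is hypothetical and is not part of the given module; extracting the names from the movie dictionary is left to you.

```python
def top_counts(names, count):
    """Return the top `count` (number of occurrences, name) tuples, highest first."""
    tally = {}
    for name in names:
        # Start each name at 0 the first time it is seen, then add one.
        tally[name] = tally.get(name, 0) + 1
    # Put the number first so that sorting the tuples sorts by number.
    pairs = [(number, name) for name, number in tally.items()]
    pairs.sort(reverse=True)
    return pairs[:count]

# Placeholder names, not real data.
result = top_counts(["A", "B", "A", "C", "B", "A"], 2)
print(result)   # [(3, 'A'), (2, 'B')]
```

With reverse=True, names that tie on the number come out in reverse alphabetical order; if that matters for your output, sort with a key instead. The same shape works for totals or averages by accumulating a sum (and a count) instead of adding one.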

Bonus

Write one or two functions of your choice that answer questions using multiple columns in the given data, such as: find the actor/director pairs that have worked on the most movies, rank the decades by their movie grosses or ratings, or find the cast member who has appeared in movies in the most decades.
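
For the decade-based ideas, one possible starting point is to convert each year to its decade with integer division and group movies under that decade. The sketch below assumes the keys are (title, year) tuples with the year stored as a four-digit string. The function names decade_of and titles_by_decade are hypothetical and the data is made up.

```python
def decade_of(year):
    """Map a year string such as '1994' to the int 1990."""
    return (int(year) // 10) * 10

def titles_by_decade(movie_dict):
    """Return a dictionary mapping each decade to a sorted list of titles."""
    groups = {}
    for (title, year) in movie_dict:   # ASSUMPTION: keys are (title, year)
        groups.setdefault(decade_of(year), []).append(title)
    for titles in groups.values():
        titles.sort()
    return groups

# Placeholder keys; the value lists are omitted because only the keys are used.
toy = {("A", "1994"): [], ("B", "1999"): [], ("C", "2003"): []}
by_decade = titles_by_decade(toy)
print(by_decade[1990])   # ['A', 'B']
```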

Output

In addition to the current program's output, use the function printData to print your results after calling the function you have written to answer each question. For the last problem, you should print the "question" you are asking on one (or more) line(s) and then the answer on the next line(s).

Submission

As an assignment, this should be done individually and submitted electronically from within Eclipse or on the web using the name lab09_movies. Please include a README file with your submission.