Introduction to Computer Science
CompSci 101 : Spring 2014

Movie Data

IMDB is an amazing resource for anyone interested in movies, so let's look at some basic data from there and ask some questions about the top rated and top grossing movies. In this example, there are three separate data files: the top 250 movies as rated by IMDB users, the top 250 grossing movies, and the directors and top 5 actors in each of those movies. Each file contains a header row describing the fields and information about each movie on a single line separated by tabs.

Start by snarfing the code from the class website. Alternatively, you can browse code here. This given code was developed during class and reads in the given data, combines it into a single dictionary whose keys are a tuple of the movie's title and year (since many titles have been reused) and whose values are a list of the following strings:

[order, director, actor1, actor2, actor3, actor4, actor5, ranking, value(s) ]

where value(s) is either its average rating or its gross profits. Additionally, some movies may have ranking and value data for both rating and gross profits if they appear in both lists. In other words, the length of the list may be either 9 or 11 elements and its last element may be either a rating (a value less than 10) or a monetary value (a value greater than 100 million).

Understanding the Code

In this part, examine the given functions.

  1. Discuss the function, bothTopRatedAndGrossing, that returns a dictionary that contains only the movies that are both top grossing and top rated. The dictionary's format is the same as that of the combined dictionary, movies.
  2. Discuss the function, uniqueDirectors, that returns a list of strings of the names of the directors of all the movies in the given dictionary. The resulting list is in sorted order (by first name) and does not contain duplicates.
  3. Why can you use the same function, uniqueDirectors, to answer both of these questions just by sending it different dictionaries:
    1. Who directed the movies that are either top rated or top grossing?
    2. Who directed the movies that are both top rated and top grossing?

Writing New Code

Write the following four functions and use them to answer the seven questions given at the bottom of the imdbmovies module by passing them different parameters or combining their results. In writing the following functions you are welcome to use any of the values that have been returned by other functions in the module (as in the uniqueDirectors example above).

  1. Write a function, directorsOfMostMovies, that returns a list of tuples of an int and a string, (number of movies directed, name of director), for the directors of all the movies in the given dictionary. The returned list should be in sorted order, by number from highest to lowest, and should be limited to only the top N most prolific directors as given by the parameter count.
  2. Write a function, castFilmography, that returns a list of tuples of a string and a list of tuples of strings, (name of actor/actress, [ (title, year) ]), for cast members of all the movies in the given dictionary. The list's elements should be the title and year of each movie in which the cast member acted. The returned list should be in sorted order, alphabetically by cast member's first name, and should be limited to only those that acted in at least N movies as determined by the parameter minAppearances.
  3. Write a function, uniqueCastMembers, that returns a list of strings, name of actor/actress, for all unique cast members of all the movies in the given dictionary. The returned list should be in sorted order, alphabetically by cast member's first name.
  4. Write a function, mostHighlyRatedCastMembers, that returns a list of tuples of an float and a string, (average rating of movies acted in, name of actor/actress), for cast members in all the movies in the given dictionary. The float, average rating, should be the average of the ratings of all movies in which the cast member appeared. The returned list should be in sorted order, by number from highest to lowest, and should be limited to limited to only the top N most highly rated cast members as given by the parameter count that acted in at least minAppearances movies.