Extra Credit: AudioScrobbler

In this assignment, you will investigate the AudioScrobbler system used to power the last.fm site. You should have already signed up for AudioScrobbler from an earlier assignment. Make sure that you have signed up for the CompSci 1 group and listened to a lot of music on your iPod.

Check your profile page to see if your tracks are displayed. It sometimes takes a few days for the tracks to show up. See the help pages for more information.

Recommender Systems

Due: Wednesday, April 26 (max 10 points)

We will post everyone's top artist and track lists as well as their neighbors. AudioScrobbler has a number of services set up so that you can find people with similar musical interests, listen to your favorite music, and discover new music that you should like.

Most of these services are based around the concept of a neighbor. Your neighbors are supposed to be people whose musical tastes are similar to yours. How are neighbors calculated? Here's what they have to say:

We have developed an especially perverted type of probabilistic latent semantic analysis. Profiles are decomposed using a custom algorithm based on relative popularity of items, then organised using latent class analysis.

The authors appear to be deliberately vague here, but there is a great deal of published work on such systems. Latent semantic analysis is often used in collaborative filtering systems. Collaborative filtering systems make predictions about the interests of one user by generalizing from taste information gathered from the whole user community. AudioScrobbler is a type of recommender system that collects data on user behavior and uses collaborative filtering to recommend other songs. The details of latent semantic analysis may be beyond the scope of this course, but the general ideas are still somewhat accessible.
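To make the idea of a neighbor concrete, here is a minimal sketch of one simple form of collaborative filtering: comparing users by the cosine similarity of their play counts. This is not AudioScrobbler's algorithm (which they describe only vaguely above); the user names, artists, play counts, and function names are all invented for illustration.

```python
from math import sqrt

# Hypothetical play counts: user -> {artist: number of plays}.
# Every name and number here is made up for illustration.
plays = {
    "alice": {"Radiohead": 40, "Beck": 10, "Bjork": 5},
    "bob": {"Radiohead": 35, "Beck": 12},
    "carol": {"Miles Davis": 50, "Bjork": 2},
}


def cosine_similarity(a, b):
    """Cosine of the angle between two play-count profiles (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[artist] * b[artist] for artist in shared)
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile matches nobody
    return dot / (norm_a * norm_b)


def nearest_neighbors(user, profiles, k=2):
    """Return up to k (other_user, similarity) pairs, most similar first."""
    scores = [
        (other, cosine_similarity(profiles[user], profiles[other]))
        for other in profiles
        if other != user
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    for name, score in nearest_neighbors("alice", plays):
        print(f"{name}: {score:.3f}")
```

In this toy data, alice and bob share their most-played artists, so bob ranks well ahead of carol as alice's neighbor. A real system would have to cope with far sparser data and with very popular artists dominating the counts.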

There are two good survey papers on Blackboard:

Paul Resnick and Hal R. Varian. Recommender Systems (Introduction to special section). Communications of the ACM, 40(3):56-58, March 1997.
Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749, June 2005.

Using the information from the papers, the data on the last.fm profiles, and the help pages and forums on the last.fm site, answer as many of the following questions as you can, to the best of your ability.

How similar are your musical tastes to those expressed by your top neighbor? Are many of the songs listed in his or her profile in your collection as well? What percentage of the Top Tracks - Overall listed in his or her profile do you own or listen to frequently? What about Top Artist - Overall?
If you go to your neighbors page and click Expand Info, you can see the Match Value for each of your neighbors. What is the match value of your closest neighbor? Graph the match values for your other neighbors. Does the match value decrease gradually or precipitously? Why?
Based on the observed match values and the readings on recommender systems, how do you think the match values are calculated? What information is used from the playlist? What information is used from the actual music files? I am not interested in the absolute match values, but rather the relative values. For example, in what kind of system would profile A be a closer match than profile B, and vice versa?
Design an experiment to test your hypothesis from the previous question.
Who would be your neighbors from the CompSci 1 group? How could you create a graph of the connections between users? What would the vertices and edges represent?
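
For the bookkeeping in the questions above, a short script can help. The sketch below shows one way to compute the percentage of a neighbor's top tracks that appear in your own collection, and one possible way to store connections between users as an adjacency structure. The track titles, user names, and function names are hypothetical, and treating a connection as undirected is only one choice; whether edges should be directed or weighted is for you to decide.

```python
def overlap_percentage(their_top, my_library):
    """Percentage of items in their_top that also appear in my_library.

    Comparison ignores upper/lower case; both arguments are lists of titles.
    """
    if not their_top:
        return 0.0
    mine = {title.lower() for title in my_library}
    hits = sum(1 for title in their_top if title.lower() in mine)
    return 100.0 * hits / len(their_top)


def build_neighbor_graph(neighbor_lists):
    """Build an undirected graph: user -> set of connected users.

    neighbor_lists maps each user to the users listed as their neighbors.
    """
    graph = {}
    for user, neighbors in neighbor_lists.items():
        graph.setdefault(user, set())
        for neighbor in neighbors:
            graph[user].add(neighbor)
            graph.setdefault(neighbor, set()).add(user)
    return graph


if __name__ == "__main__":
    # Invented example data.
    neighbor_top_tracks = ["Song A", "Song B", "Song C", "Song D"]
    my_tracks = ["song a", "Song C", "Song X"]
    print(overlap_percentage(neighbor_top_tracks, my_tracks))

    group = {"alice": ["bob"], "bob": ["carol"], "carol": []}
    for user, connected in sorted(build_neighbor_graph(group).items()):
        print(user, sorted(connected))
```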