CPS 296.3 (Spring 2009):
Information Management and Mining



Course Description

We are witnessing an explosive growth in the amount of data generated by scientific research, businesses, governments, social networks, etc. This course examines the techniques for distilling useful information from this massive flood of data. Topics include information retrieval, web search, data mining, as well as I/O-efficient, parallel, and distributed realization of large-scale data analysis tasks. Besides classic textbook materials on these topics, the course will also examine recent research developments, such as streaming data, social network analysis, and data-centric parallel programming languages (e.g., Google's Sawzall and Yahoo!'s Pig).

The course is designed to not overlap with CPS 216 in content.

Prerequisites: A good understanding of algorithms, data structures, and programming. No background in databases is assumed.

In addition to lectures, the course will have some seminar-style class meetings. Students will read recent research papers, give presentations on them, and lead discussions of them. There will also be an open-ended course project.

There will not be any exams or homework (except ungraded reading and presentation assignments). The grade will be based on class participation and the course project only.


Instructor: Jun Yang
Email domain: cs.duke.edu, user: junyang (address is user@domain)
Office: D327 LSRC
Office hours: Mondays 3-4pm and Tuesdays 4-5pm, or by appointment

Time and Place

1:30pm-4pm on Tuesdays; North 306

The above meeting schedule began 01/14. On 01/13 the class met 2:50pm-4:05pm. No class meeting on 01/15.


Textbook

No textbook is required. There will be a reading list drawn from recent research literature. The list will be posted and updated regularly on the course Web site.

Web and Email

Most of the course materials, including the tentative schedule, lecture notes, reading list, etc., will be available through the course Web page (http://www.cs.duke.edu/courses/spring09/cps296.3/).

The email address cps296.3@cs.duke.edu reaches everybody in the class as well as the instructor. Only announcements, questions/answers, and comments of general interest should be sent to this address. Specific questions should be directed to the instructor. Please check your email regularly, as important announcements and information will be sent via email.


Grading

Grading is done on an absolute scale (in other words, there is no curve). Anyone earning 90% or more of the total number of points available will receive a grade in the A range; 80% or more guarantees a grade in the B range; 70% or more guarantees a grade in the C range; 60% or more guarantees a grade in the D range.

  • Reading, discussion, and participation (50%): There will be reading assignments throughout the semester, posted in the Readings section of the course Web site as the course progresses. Some of the reading assignments require short reviews, which together constitute 20% of the grade. Each student will also be expected to present and lead the discussion in two to three class meetings, which account for 20% of the grade. Class attendance accounts for the remaining 10%.
  • Course project (50%): There will be a course project, which can be done either individually or in teams. The project has three milestones: a short project proposal presentation (right after the spring recess), which accounts for 15% of the grade; a short project progress report (in the first week of April), which accounts for another 5% of the grade; and a final project presentation (during the finals week). The final project presentation and the overall quality of the project account for 30% of the grade.
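
As an illustration only, the weights and cutoffs in the grading breakdown above can be combined as in the following Python sketch. The component names, the function names, and the sample scores are hypothetical and not part of the course policy. The sketch also assumes each component is scored from 0 to 100, and it returns None below 60% because the policy does not state what happens there.

```python
# Component weights (percent of the total grade), taken from the breakdown above.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "reviews": 20,
    "presentations": 20,
    "attendance": 10,
    "proposal": 15,
    "progress_report": 5,
    "final_project": 30,
}

# Minimum total percentage that guarantees each grade range.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def total_score(scores):
    """Weighted total in percent; scores maps component -> score from 0 to 100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def guaranteed_range(total):
    """Grade range guaranteed by the absolute scale, or None if below 60%."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return None


# Hypothetical student, used only to show the arithmetic.
example_scores = {
    "reviews": 90,
    "presentations": 85,
    "attendance": 100,
    "proposal": 80,
    "progress_report": 100,
    "final_project": 90,
}
example_total = total_score(example_scores)     # 89.0
example_range = guaranteed_range(example_total)  # "B"
```

For these made-up scores the weighted total is 18 + 17 + 10 + 12 + 5 + 27 = 89, one point short of the A range, so the guaranteed range is B.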

Honor Code

Under the Duke Honor Code, you are expected to submit your own work in this course. On many occasions, it is useful to ask others for hints or help, or to search the Web for related resources (e.g., slides from the original authors of a paper you are presenting). Such activities are acceptable, but you must explicitly indicate any assistance you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding and being able to explain on your own all materials that you submit and present. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.

Last updated Tue Jan 13 21:29:49 EST 2009