CPS 296.1: Topics in Database Systems
(Spring 2002)
Course Information


Course Description

This course provides in-depth coverage of selected topics in database research. Content differs in each offering. A substantial project is required. This semester, we will cover the following topics: Web and database searches, view maintenance, caching, XML data processing, and data mining. We will dive into the recent research literature, and try to identify and tackle new, open problems.

If you have taken CPS 216, you should be well prepared for CPS 296.1. Otherwise, some prior background working with database systems or in a database-related field (e.g., I/O-efficient algorithms, machine learning, the Web) will be helpful. With the proper background, it is certainly possible to pick up the necessary material along the way, since the topics are fairly focused.


Time and Place

TTH 3:50pm-5:05pm, D243 LSRC


Books

Because most of the course materials come from recent research literature, there is no required textbook. Some of the following books may be used for reference:


Staff

Instructor: Jun Yang
Web: http://www.cs.duke.edu/~junyang/
Email: junyang@cs.duke.edu
Office hours: TTH 2:50pm-3:50pm and 5:05pm-5:35pm, D327 LSRC, or by appointment


Web and CourseInfo

We will use CourseInfo (https://courses.duke.edu/bin/common/course.pl?course_id=_2112_1&frame=top) for recording grades. However, most of the course materials, including the syllabus, lecture notes, reading assignments, etc., will be available only through the course Web page (http://www.cs.duke.edu/courses/spring02/cps296.1/).


Grading

Paper reviews        25%
Paper presentation   15%
Project              60%

You will be expected to read each paper before it is presented in class, and to prepare a short review (unless otherwise noted). The review should include a brief summary (1 paragraph) followed by comments and criticisms (2-4 paragraphs). You are also encouraged to ask questions about the paper, or to suggest possible directions for future work related to it.

Each review is due by midnight on the Sunday preceding the class meeting for which the paper is scheduled. Please submit each review by email to the instructor with the subject "CPS 296 Review: title", where title is the title of the paper. Please send plain text typed directly into the body of your email (i.e., no PostScript or Word files, and no attachments). Please use one email for each paper, even if multiple reviews are due at the same time. To receive the full 25% of your grade, you must submit reviews on time for at least 75% of the papers assigned. Late reviews will not be counted.
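As a rough illustration of the rules above, the following sketch computes the Sunday preceding a class meeting, builds the email subject line, and checks the 75% threshold. The function names are hypothetical and are not part of any course tool. The sketch also assumes that a class held on a Sunday would have its review due one week earlier, a case this course's TTH schedule never hits.

```python
from datetime import date, timedelta


def review_due_date(class_date: date) -> date:
    """Return the Sunday preceding the given class meeting."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    days_back = (class_date.weekday() + 1) % 7
    # Assumption: a class held on a Sunday maps to the previous Sunday.
    if days_back == 0:
        days_back = 7
    return class_date - timedelta(days=days_back)


def review_subject(title: str) -> str:
    """Build the subject line requested for review emails."""
    return f"CPS 296 Review: {title}"


def meets_review_threshold(on_time: int, assigned: int) -> bool:
    """True if on-time reviews cover at least 75% of assigned papers."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return 4 * on_time >= 3 * assigned


if __name__ == "__main__":
    # Tuesday and Thursday meetings of the same week share one due date.
    print(review_due_date(date(2002, 1, 15)))
    print(review_due_date(date(2002, 1, 17)))
    print(review_subject("Example Paper Title"))
    print(meets_review_threshold(15, 20))
```

Under these assumptions, reviews for both the Tuesday and the Thursday papers of a given week fall due on the same Sunday, which is why separate emails per paper are requested.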

Each student will present one paper (or a couple of related papers) in one class meeting. A sign-up sheet will be available in the third week of the class.

There is one course project, details of which will be available in the third week of the class.


Honor Code

Under the Duke Honor Code, you are expected to submit your own work in this course, including reviews and projects. On many occasions when working on reviews and projects, it is useful to ask others (the instructor, the TA, or other students) for hints or debugging help, or to talk generally about the papers. Such activity is both acceptable and encouraged, but you must indicate in your submission any assistance you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding and being able to explain on your own anything that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.


Tentative Syllabus

Week  Date        Topic
1     2002-01-10  Introduction and review of basic concepts
2     2002-01-15  Web search: ranking Web pages
      2002-01-17  Web search: indexing Web pages
3     2002-01-22  Web search: crawling the Web
      2002-01-24  Integrating Web and database searches: rank aggregation
4     2002-01-29  Integrating Web and database searches: proximity search and WSQ
      2002-01-31  Views: incremental maintenance
5     2002-02-05  Views: practical incremental maintenance
      2002-02-07  Views: self maintenance
6     2002-02-12  Views: selecting views to materialize
      2002-02-14  Views: answering queries using views
7     2002-02-19  Views: answering queries using views / Datalog primer
      2002-02-21  Views: answering queries using views
8     2002-02-26  Query caching
      2002-02-28  Query caching for Web
9     2002-03-05  Web caching
      2002-03-07  Introduction to XML
10    2002-03-12  Spring recess
      2002-03-14  Spring recess
11    2002-03-19  Introduction to XML
      2002-03-21  XML storage
12    2002-03-26  XML query processing
      2002-03-28  XML indexing
13    2002-04-02  XML publishing
      2002-04-04  XML view maintenance
14    2002-04-09  Association rules
      2002-04-11  Association rules
15    2002-04-16  More data mining
      2002-04-18  More data mining
16    2002-04-23  Reading period
      2002-04-25  Reading period
17    2002-04-30  Reading period
      2002-05-02  Final project presentations