CPS 216: Advanced Databases Systems
Course Information


Index


Course Description

Newly minted course description:

This course covers advanced database management system design principles and techniques. The course materials will be drawn from both classic and recent research literature. Possible topics include access methods, query processing and optimization, transaction processing, distributed databases, object-oriented and object-relational databases, data warehousing, data mining, Web and semistructured data, search engines, etc. Programming projects are required. Prerequisites: An introductory database course or consent of instructor.

Don't be too alarmed, however:

Since this is the first offering of CPS 216 in many years, we plan to cover both introductory and advanced materials this semester. Nonetheless, the course will focus more on the internals of a database management system, rather than how to use one. The latter will be the focus of a separate introductory database course, which we plan to introduce next year.


Time and Place

TTH 2:15pm-3:30pm, D106 LSRC


Books

Required:

Recommended for reference (either 1 and 2, or 3):

  1. A First Course in Database Systems, 1st Edition, by Jeffrey D. Ullman and Jennifer Widom.
  2. Database System Implementation, 1st Edition, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom.
  3. Database Management Systems, 2nd Edition, by Raghu Ramakrishnan and Johannes Gehrke.


Staff

Instructor: Jun Yang
Web: http://www.cs.duke.edu/~junyang/
Email: junyang@cs.duke.edu
Office hours: TTH 1:15pm-2:15pm 3:30pm-4:00pm D327 LSRC

TA: Dazhi Wang
Web: http://www.cs.duke.edu/~wangdz/
Email: wangdz@cs.duke.edu
Office hours: MW 11:30am-12:30pm F 2:00pm-3:00pm D328 LSRC


Web and CourseInfo

We will use CourseInfo (https://courses.duke.edu/courses/COMPSCI216.01-F2001) for announcements, discussions, and grades. However, most of the course materials, including the syllabus, lecture notes, reading assignments, homeworks, etc., will be available only through the course Web page (http://www.cs.duke.edu/courses/fall01/cps216/).

Please check CourseInfo regularly for important announcements. The CourseInfo discussion board is particularly useful for posting questions that are likely to be of interest to the rest of the class. We very much encourage students in the class to post responses to questions. We will monitor the the board regularly, and post responses to questions that have not previously been asked or answered. Before sending a question, please do make sure that you've read all previous messages and that your question has not yet been discussed.


Grading

Homeworks30%
Project30%
Midterm20%
Final20%

There are four homeworks, with a mix of written and programming problems. Late homeworks will not be graded.

There is one course project, details of which will be available by the third week of the class.

You can sign up to present one of the five database research papers (see Tentative Syllabus) in a team of 2 to 4 students. A sign-up sheet will be available in the third week of the class. Participation is voluntary; the reward is that we will drop your lowest homework grade.


Honor Code

Under the Duke Honor Code, you are expected to submit your own work in this course, including homeworks, projects, and exams. On many occasions when working on homeworks and projects, it is useful to ask others (the instructor, the TA, or other students) for hints or debugging help, or to talk generally about the written problems or programming strategies. Such activity is both acceptable and encouraged, but you must indicate in your submission any assistance you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding and being able to explain on your own all written and programming solutions that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.


Tentative Syllabus

WeekDate TopicReference*Reading**
12001-08-28 Introduction
2001-08-30 Relational model and algebra UW 3.1, 4.1
RG 4
Codd, "A Relational Model of Data for Large Shared Data Banks," CACM, 13(6), 1970 (in red book)
22001-09-04 Relational design UW 3.5-3.9
RG 15
2001-09-06 Relational design (cont'd) same as above
32001-09-11 SQL: queries UW 5.1, 5.2, 5.3-5.6, 5.9
RG 5.1-5.3, 5.4-5.6
2001-09-13 SQL: schema, constraints, indexes, views, triggers UW 5.7, 5.8, 6
RG 5.11-5.13
42001-09-18 Transactions: concurrency control UW 7.2
RG 18
2001-09-20 Transactions: recovery same as above
52001-09-25 SQL: application programming interface UW 7.1
RG 5.7-5.10
2001-09-27 Physical data organization GMUW 2, 3
RG 7, 8.2
62001-10-02 Indexes GMUW 4, 5
RG 8, 9, 26.6
2001-10-04 More indexes same as above Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching," SIGMOD 1984 (in red book)
72001-10-09 Student presentation Hellerstein et al., "Generalized Search Trees for Database Systems," VLDB 1995 (in red book)
2001-10-11 Midterm exam
82001-10-16 Fall break
2001-10-18 Query processing GMUW 6
RG 12
Graefe, "Query Evaluation Techniques for Large Databases," ACM Computing Surveys, 25(2), 1993 (PDF); Sections 1-8
92001-10-23 More query processing same as above O'Neil & Quass, "Improved Query Performance with Variant Indexes," SIGMOD 1997 (in red book)
2001-10-25 Student presentation GMUW 6.8
RG 7.4
Chou & DeWitt, "An Evaluation of Buffer Management Strategies for Relational Database Systems," VLDB 1985 (in red book)
102001-10-30 Student presentation Hellerstein et al., "Online Aggregation," SIGMOD 1997 (in red book)
Haas and Hellerstein, "Ripple Join for Online Aggregation," SIGMOD 1999 (PDF)
2001-11-01 Query optimization: overview and query rewrite GMUW 7
RG 13, 14
Leung et al., "Query Rewrite Optimization Rules in IBM DB2 Universal Database" (in red book)
112001-11-06 Query optimization: cost estimation same as above
2001-11-08 Query optimization: plan selection same as above Selinger et al., "Access Path Selection in a Relational Database Management System," SIGMOD 1979 (in red book)
122001-11-13 Student presentation Avnur & Hellerstein, "Eddies: Continuously Adaptive Query Processing," SIGMOD 2000 (PDF)
2001-11-15 Distributed query processing RG 21.5-21.9 Kossmann, "The State of the Art in Distributed Query Processing," ACM Computing Surveys, 32(4), 2000 (PostScript, PDF)
132001-11-20 Data warehousing GMUW 11.1-11.4
RG 23
Gray et al., "Data Cube: A Relational Aggregation Operator," ICDE 1996 (in red book)
2001-11-22 Thanksgiving recess
142001-11-27 Data mining GMUW 11.5
RG24
Agrawal & Srikant, "Fast Algorithms for Mining Association Rules," VLDB 1994 (in red book)
2001-11-29 Rewiew and catch up
162001-12-13 Final exam

* Not required; recommended for reference, review, and further reading. "UW" refers to the book by Ullman and Widom. "GMUW" refers to the book by Garcia-Molina, Ullman, and Widom. "RG" refers to the book by Ramakrishnan and Gehrke. Section numbers may be different in different editions.

** Required before class. "Red book" refers to the required book by Stonebraker and Hellerstein.