This course covers advanced database management system design principles and techniques. The course materials will be drawn from both classic and recent research literature. Possible topics include access methods, query processing and optimization, transaction processing, distributed databases, object-oriented and object-relational databases, data warehousing, data mining, Web and semistructured data, search engines, etc. Programming projects are required. Prerequisites: An introductory database course or consent of instructor.
Don't be too alarmed, however:
Since this is the first offering of CPS 216 in many years, we plan to cover both introductory and advanced materials this semester. Nonetheless, the course will focus more on the internals of a database management system, rather than how to use one. The latter will be the focus of a separate introductory database course, which we plan to introduce next year.
Recommended for reference (either 1 and 2, or 3):
TA: Dazhi Wang
Web: http://www.cs.duke.edu/~wangdz/
Email: wangdz@cs.duke.edu
Office hours: MW 11:30am-12:30pm F 2:00pm-3:00pm D328 LSRC
Please check CourseInfo regularly for important announcements. The CourseInfo discussion board is particularly useful for posting questions that are likely to be of interest to the rest of the class. We very much encourage students in the class to post responses to questions. We will monitor the the board regularly, and post responses to questions that have not previously been asked or answered. Before sending a question, please do make sure that you've read all previous messages and that your question has not yet been discussed.
Homeworks | 30% |
Project | 30% |
Midterm | 20% |
Final | 20% |
There are four homeworks, with a mix of written and programming problems. Late homeworks will not be graded.
There is one course project, details of which will be available by the third week of the class.
You can sign up to present one of the five database research papers (see Tentative Syllabus) in a team of 2 to 4 students. A sign-up sheet will be available in the third week of the class. Participation is voluntary; the reward is that we will drop your lowest homework grade.
Week | Date | Topic | Reference* | Reading** |
1 | 2001-08-28 | Introduction | ||
2001-08-30 | Relational model and algebra | UW 3.1, 4.1 RG 4 |
Codd, "A Relational Model of Data for Large Shared Data Banks," CACM, 13(6), 1970 (in red book) | |
2 | 2001-09-04 | Relational design | UW 3.5-3.9 RG 15 |
|
2001-09-06 | Relational design (cont'd) | same as above | ||
3 | 2001-09-11 | SQL: queries | UW 5.1, 5.2, 5.3-5.6, 5.9 RG 5.1-5.3, 5.4-5.6 |
|
2001-09-13 | SQL: schema, constraints, indexes, views, triggers | UW 5.7, 5.8, 6 RG 5.11-5.13 |
||
4 | 2001-09-18 | Transactions: concurrency control | UW 7.2 RG 18 |
|
2001-09-20 | Transactions: recovery | same as above | ||
5 | 2001-09-25 | SQL: application programming interface | UW 7.1 RG 5.7-5.10 |
|
2001-09-27 | Physical data organization | GMUW 2, 3 RG 7, 8.2 |
||
6 | 2001-10-02 | Indexes | GMUW 4, 5 RG 8, 9, 26.6 |
|
2001-10-04 | More indexes | same as above | Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching," SIGMOD 1984 (in red book) | |
7 | 2001-10-09 | Student presentation | Hellerstein et al., "Generalized Search Trees for Database Systems," VLDB 1995 (in red book) | |
2001-10-11 | Midterm exam | |||
8 | 2001-10-16 | Fall break | ||
2001-10-18 | Query processing | GMUW 6 RG 12 |
Graefe, "Query Evaluation Techniques for Large Databases," ACM Computing Surveys, 25(2), 1993 (PDF); Sections 1-8 | |
9 | 2001-10-23 | More query processing | same as above | O'Neil & Quass, "Improved Query Performance with Variant Indexes," SIGMOD 1997 (in red book) |
2001-10-25 | Student presentation | GMUW 6.8 RG 7.4 |
Chou & DeWitt, "An Evaluation of Buffer Management Strategies for Relational Database Systems," VLDB 1985 (in red book) | |
10 | 2001-10-30 | Student presentation | Hellerstein et al., "Online Aggregation," SIGMOD 1997 (in red
book) Haas and Hellerstein, "Ripple Join for Online Aggregation," SIGMOD 1999 (PDF) |
|
2001-11-01 | Query optimization: overview and query rewrite | GMUW 7 RG 13, 14 |
Leung et al., "Query Rewrite Optimization Rules in IBM DB2 Universal Database" (in red book) | |
11 | 2001-11-06 | Query optimization: cost estimation | same as above | |
2001-11-08 | Query optimization: plan selection | same as above | Selinger et al., "Access Path Selection in a Relational Database Management System," SIGMOD 1979 (in red book) | |
12 | 2001-11-13 | Student presentation | Avnur & Hellerstein, "Eddies: Continuously Adaptive Query Processing," SIGMOD 2000 (PDF) | |
2001-11-15 | Distributed query processing | RG 21.5-21.9 | Kossmann, "The State of the Art in Distributed Query Processing," ACM Computing Surveys, 32(4), 2000 (PostScript, PDF) | |
13 | 2001-11-20 | Data warehousing | GMUW 11.1-11.4 RG 23 |
Gray et al., "Data Cube: A Relational Aggregation Operator," ICDE 1996 (in red book) |
2001-11-22 | Thanksgiving recess | |||
14 | 2001-11-27 | Data mining | GMUW 11.5 RG24 |
Agrawal & Srikant, "Fast Algorithms for Mining Association Rules," VLDB 1994 (in red book) |
2001-11-29 | Rewiew and catch up | |||
16 | 2001-12-13 | Final exam |
* Not required; recommended for reference, review, and further reading. "UW" refers to the book by Ullman and Widom. "GMUW" refers to the book by Garcia-Molina, Ullman, and Widom. "RG" refers to the book by Ramakrishnan and Gehrke. Section numbers may be different in different editions.
** Required before class. "Red book" refers to the required book by Stonebraker and Hellerstein.