CompSci 516

Data Intensive Computing Systems

Spring 2016

News



Day/Time: Tuesdays and Thursdays, TuTh 3:05PM - 4:20PM
PlaceLSRC D106

Instructor: Sudeepa Roy
  • Email: sudeepa AT cs.duke.edu
  • Office Hour: LSRC D325, Tuesdays 4:30 pm - 5:30 pm and by appointment
TA: Junghoon Kang
  • Email: jungkang AT cs.duke.edu
  • Office Hour: North N303B, Thursdays 10:20 am - 11:20 am
TA: Xiaodan Zhu
  • Email: xdzhu AT cs.duke.edu
  • Office Hour: LSRC D301, Wednesdays 3:20 pm to 4:20 pm

Overview

This is the graduate database course. This course will cover principles and design of database management systems at an advanced level.
   NOTE: There will be changes in this course from the Spring 2015 offering. See below.

Topics will include:
SQL/Relational Algebra/Relational Calculus, Database Normalization, DBMS Architecture/Storage, Indexing/Hashing, Query Algorithms and Optimizations, Transactions and Recovery, Parallel DB/Map Reduce/Distributed query processing, NOSQL/Column store, Datalog, Additional Research Topics (TBD).

Textbooks:
  1. [RG] (Main) Database Management Systems (third edition); Raghu Ramakrishnan and Johannes Gehrke.
  2. [GUW] (Additional) Database Systems: The Complete Book (second edition); Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom

Prerequisites:
An introductory database course (CompSci 316 or equivalent) or consent of the instructor. Undergraduate students with the necessary background and interests are welcome.

Grading
  • 5 Homework: 35%
  • Midterm: 25%      
  • Final: 35%
  • Class Participation: 5%
Midterms and finals are closed book and closed notes. No electronic devices are allowed. Each homework will be due in two weeks after it is posted. There are no late days. We will use Sakai for homework submission and Piazza for discussions.

Honor Code:
Under the Duke Honor Code, you are expected to submit your own work in this course in the homework and exams. You are allowed (and are encouraged) to discuss the course material with other students, but need to solve problems in the homeworks and exams on your own. Any assistance received must be clearly indicated in your solutions -- failure to do so will be considered a violation of the Honor Code. In any event, you are responsible for understanding and being able to explain on your own all solutions that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.

What is allowed/not allowed

Schedule

(subject to change)

  Day Topic Slides Reading
1 1/14 (Th) Introduction and Data Models Lecture-1 [RG] 1.3, 1.4, 1.5
2 1/19 (T) SQL Lecture-2 [RG] 3, 5 (also see 4.2.4)
[GUW] 6
3 1/21 (Th) Relational Algebra/Calculus Lecture-3 [RG] 4
[GUW] 2.4, 5.1, 5.2
4 1/26 (T) Normalization Lecture-4 [RG] 19.1-19.5, 19.6.1, 19.8 (overview only)
[GUW] 3
5 1/28 (Th) Storage and Indexing Lecture-5 [RG] 8.1-8.3, 8.5
[GUW] 8.3, 14.1
6 2/2 (T) Storage and Indexing Lecture-6 [RG] 9.4-9.7, 10.1-10.7, 11
[GUW] 14.1-14.4
7 2/4 (Th) Indexing and Query Evaluation Lecture-7 [RG] 11, 12
[GUW] 14.3, 15
8 2/9 (T) External Sorting and Operator Algorithms Lecture-8 [RG] 12.2-12.5, 13, 14.1-14.3
[GUW] 15
9 2/11 (Th) Join Algorithms and Query Optimization Lecture-9 [RG] 14.4-14.6, 15
10 2/16 (T) Cost-based Query Optimization Lecture-10 [GUW] Chapter 16.2 - 16.7
(optional) Selinger et al.'s paper [pdf]
11 2/18 (Th) Parallel DBMS and Map-Reduce Lecture-11 [RG] 22.1-22.5
[GUW] 20.1-20.2

(optional reading): the Google MapReduce paper, etc., see the slides
12 2/23 (T) Transactions Lecture-12 [RG] 16.1-16.3, 16.4.1
13 2/25 (Th) Review for midterm and Transactions Lecture-13
Lecture-13-review-RC
[RG] 17.1-17.4
3/1 (T) Midterm (in class)
14 3/3 (Th) MapReduce
(systems aspects)
Lecture-14 Guest Lecturer: Junghoon Kang
(optional reading): the Google MapReduce paper, see Lecture 11
15 3/8 (T) Transactions: Concurrency Control Lecture-15 [RG] 17.5.1, 17.5.3, 17.6
[GUW] 18.8, 18.9
16 3/10 (Th) Transactions: Recovery Lecture-16 [GUW] 17.2.1-17.2.3
3/15 (T) No class - Spring Break
3/17 (Th) No class - Spring Break
17 3/22 (T) Transactions: Recovery Lecture-17 [GUW] 17.2-17.4
18 3/24 (Th) Transactions: Recovery (ARIES) Lecture-18 Main reading:
"Concurrency Control and Recovery" [pdf]
Michael Franklin, 1997
2.2, 3.2

[RG] 18.1-18.6
19 3/29 (T) Distributed Databases Lecture-19 [RG] 18.1-18.6
20 3/31 (Th) Distributed Databases and Spark Distributed DB - see 2PC in Lecture-19 slides.
Spark: Lecture-20
Optional material: Shark
Guest Lecturer for Spark: Prajakta Kalmegh.
21 4/5 (T) NoSQL, Datalog Lecture-21-Datalog.pdf, Lecture-21-NoSQL.pdf, optional
22 4/7 (Th) Data Warehousing and Decision Support Lecture-22
23 4/12 (T) Data Privacy Lecture-23 Guest lecturer: Xi He
24 4/14 (Th) Lecture cancelled - make up lecture on 04/18, Monday
24 4/18 (M) View Selection, Provenance, Probabilistic Databases, Crowd-sourcing Lecture-24 In LSRC A247, 4:30 pm - 5:45 pm
25 4/19 (T) Data Mining
(last class)
Lecture-25 (optional) Agrawal-Srikant, "Fast Algorithms for Mining Association Rules [pdf]
4/29 (F) Review Notes on Sakai In LSRC D106, 4:00 pm - 5:30 pm
5/7 (Sat) Final Exam (2:00 PM - 5:00 PM)


Homework

Homework Topic Posted on Due on
HW1 (see Sakai) SQL and Postgres (input in XML) Part-1 : 01/20 (Wed), Part-2 : 01/26 (Tues) 02/10 (Wed), 11:59 pm
HW2 (see Sakai) RA, RC, Normalization, Indexing, Hashing, External Sorting, Join Algorithms 02/13 (Sat) 02/29 (Mon), 11:55 pm
HW3 (see Sakai) Spark on Local Machine 03/02 (Wed) 03/23 (Wed), 11:55 pm
HW4 (see Sakai) Spark on AWS 03/23 (Wed) 04/06 (Wed), 11:55 pm
HW5 MongoDB 04/06 (Wed) 04/20 (Wed), 11:55 pm