CompSci 316 - Introduction to Databases - Fall 2022

Have questions? For questions on logistics, please email both Sudeepa (sudeepa at cs.duke.edu) and Alex (alex.chao at duke.edu).
All other questions should be discussed on the Ed discussion forum.

(This website is a work in progress.)

Overview

We intend this course to give you a solid background in database systems as well as managing and processing "big data" in general. Topics include data modeling, database design theory, data definition and manipulation languages (SQL, relational algebra, and NoSQL), database application programming interfaces, storage and indexing, query processing and optimization, parallel and distributed data processing, transaction processing, as well as a sample of other advanced topics.

Prerequisites:

CompSci 201 AND one of CompSci 210/250, or consent of the instructor.

Updates for Fall'22:

  • Welcome to CompSci 316, Fall 2022! The first class will be on Tuesday, August 30. Please check this website and the posts on Ed frequently and carefully for all logistics information and announcements. We hope you will have fun learning database systems in this class!

  • Please note that everyone is required to wear a mask in the classroom. Please check out Duke's COVID protocol frequently.

  • Attendance in discussion sessions is waived for students who have COVID and need to quarantine. Please see the details below under Class Participation in the Workload section.


Time/Day

      Lectures: Tuesdays and Thursdays, 1:45 pm - 3:00 pm, Bryan Center Griffith Theater

      Discussion-01: Fridays, 10:15 am to 11:30 am, LSRC B101
      Discussion-02: Fridays, 1:45 pm to 3:00 pm, FFSC 2231

       (attendance in one of the discussion sessions is required, see grading below)

    
     Office Hours: See below.


Course Staff

Instructor

      Sudeepa Roy

Teaching Associate

      Alex Chao

Grad TAs

       Yuxi Liu, Zhe Wang, Haibo Xiu, and (half-TAs) Tong Lin and Fiona (Chengyu) Wu

UTAs

       Konstantinos Bailas, Samy Boutouis, Neel Gajjar, Joshua Guo, Alexandra Lawrence, Joon Young Lee, Justin Lim, Danny Luo, Alok Malhotra, Harris Masterson, Jason Qiu, Alex Schiff, Grace Tian, Joyce Wang, Samia Zaman, Han Zhang, Zachary Zheng.

Not sure who your TAs are? Check out the first few slides of Lecture 1.

Weekly Office Hour Calendar:

See Ed for the calendar.

Grading

If you work hard, you have considerable control over the final grade you receive in this class, and you should have a fair idea of where you stand at any point in time!
  • Grading is done on an absolute, but adjustable scale. Anyone earning
    • 90% or more of the total number of points available will receive a grade in the A range (A+, A, A-);
    • 80% or more guarantees a grade in the B range (B+, B, B-);
    • 70% or more guarantees a grade in the C range (C+, C, C-);
    • 60% or more guarantees a D (no +/- in this range).

  • At the discretion of the instructor, the grading scale may slide down (i.e., grades go higher), but it will not slide up.

  • The assignment of +/- within the letter grades will be decided by the instructor based on the performance of the entire class. Only the highest overall score in the class and other exceptional performances will receive an A+.


Weights of each component:

See details in Workload below.

Component                                                                   Weight
Homeworks                                                                   34%
    Written problem solving / programming                                   25%
    Gradiance exercises (2 lowest scores dropped)                           9%
Exams                                                                       35%
    Midterm                                                                 17%
    Final exam                                                              18%
Project                                                                     24%
Class participation                                                         7%
    Attending discussion sessions (quizzes and collaboration)               5%
    Communication                                                           2%
Extra Credit (above 100%)                                                   2%
    Extra credit problems from HW and Exams (lowest 1/4th scores dropped)   2%
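
As a rough illustration of how the weights above combine with the guaranteed letter ranges from the Grading section, here is a minimal Python sketch. It is not the official grade calculation: the function and variable names are hypothetical, each component score is assumed to be a fraction between 0 and 1 (after any dropped scores), and the syllabus does not say what happens below 60%, so the sketch returns None there. The actual scale may also slide down at the instructor's discretion.

```python
# Component weights in percentage points, copied from the table above.
WEIGHTS = {
    "written_hw": 25,
    "gradiance": 9,
    "midterm": 17,
    "final": 18,
    "project": 24,
    "discussion": 5,
    "communication": 2,
}
EXTRA_CREDIT_WEIGHT = 2  # added on top of the 100% base


def total_percent(scores, extra_credit=0.0):
    """Weighted total in percent.

    scores: dict mapping each key of WEIGHTS to a fraction in [0, 1].
    extra_credit: fraction in [0, 1] of the extra credit earned (assumption).
    """
    base = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return base + EXTRA_CREDIT_WEIGHT * extra_credit


def guaranteed_range(percent):
    """Letter range guaranteed by the absolute scale, or None below 60%."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return letter
    return None  # not specified in the syllabus
```

For example, `guaranteed_range(total_percent(my_scores))` gives the range the absolute scale guarantees for a hypothetical dict `my_scores`; the +/- within a range is decided separately by the instructor.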


Workload

  • Homeworks (34%): There will be weekly homeworks (due in 7-10 days). They will be based on the last 1-2 lectures. They are of two types:

    1. Written problem solving and programming assignments (25%):
      Start early and allocate enough time to solve these problems!

      • Late policy: Homeworks are due at 10 pm. There is a one-hour grace period, during which you will not lose any points, in case a submission is a few minutes late for technical reasons. After that, homework problems that are submitted late will receive an automatic deduction of 5% per hour late (per problem as marked by Gradescope, not per sub-problem, so even if one sub-problem is late, the whole problem is late). You will receive no credit after the sample solution becomes available.

        Note that the deadline is 10 pm, not 11 pm, so a submission after 10 pm will be marked as late by Gradescope, but you won't lose points until 11:00 pm. A submission marked at 11:01 pm will lose 5%. Q/A on Ed may be slow and uncertain at night and may not be available after 10 pm, and there may be unexpected technical issues, so please start and submit early. Unless Gradescope crashes and does not accept homeworks by 10 pm (in which case, remember to take a screenshot and inform us on Ed), no other technical issue or lack of last-minute help from the course staff will be considered a valid excuse for not submitting by the deadline. You might find it useful to save your solutions to an online drive like Duke Box (meant for automated synchronization) so that you can access them from any machine.

        Exceptions will only be made in the case of documented excuses; follow the standard university procedure for filing them. In other words, you must submit an Incapacitation Form (STINF), Religious Observance Notification Form, or Notification of Varsity Athletic Participation Form before the deadline of an assignment. These forms give you an extension of two days (48 hours). The standard hourly late penalty does not apply after that extension, i.e., your homework will not be accepted at all after the 48 hours. In all other cases (any other reason, a form submitted after the homework deadline, or an extension beyond two days), you must arrange for your academic dean to email the instructor regarding your circumstances (Dean's Excuse). You must have an email from Sudeepa granting the extension; otherwise the standard late penalty will apply. Also note that help from the course staff may be slow or uncertain on weekends except during office hours. Do not rely on getting an extension: start working on homeworks/quizzes right after they are released, try to finish homeworks as early as possible, and reach out to us early on Ed or in office hours so that you can get enough help.

    2. Gradiance exercises (9%):
      Gradiance is an online service pioneered by one of the authors of the textbook, Prof. Jeffrey Ullman at Stanford. One of the best features of Gradiance is that you are permitted to test yourself on a particular topic as many times as you like. You receive immediate feedback for each attempt, which avoids the shortcoming of the traditional submit-and-then-wait-for-grades assignments where one error in understanding can permeate solutions to multiple problems and does not get rectified until much later. We encourage you to continue testing on each topic until you complete the part of the assignment with a 100% score. The highest score will be recorded. The questions will be the same in every attempt, but the answer choices will be selected at random. We will drop the lowest two scores at the end.

      • Late policy: Gradiance exercises are also due at 10 pm. There are no late days or hours for Gradiance assignments (under any circumstances). Each exercise closes automatically after the deadline. The website might have occasional downtime for maintenance, so make sure to start early and finish by the deadline.



  • Project (24%): The course projects are to be done in groups of five members. All project members must be chosen from the same discussion section. More details will be posted later.

  • Midterm (17%) and final (18%): Both midterm and final exams are open-book and open-notes. The final is comprehensive but may focus on material not already covered by the midterm. There won't be any make-up or late exams. If you miss the midterm and have a documented excuse as described for the homework assignments above, your midterm score will be replaced by the score you receive in the final. Note that the final may be easier or harder than the midterm. The final exam is required: to get a valid grade in this class, you must take the final exam as scheduled by the university.

  • Class Participation (7%):
    1. Attendance and assignments in discussion sessions (5%): Each discussion session may have points for attendance, and for solving quizzes and practice problems. All discussion sessions have the same weight. We will drop the lowest three discussion session scores to account for the days when you cannot attend.

      Absence from a discussion session is excused (i.e., the student receives attendance credit without being present) only if the student gets COVID during the semester; this is for everyone's safety. In that case, the student should email Sudeepa, Alex, and the student's Dean to notify them that the student has COVID. If the student is also required to miss the discussion session in the following week (i.e., will miss two discussion sessions in a row) as advised by Student Health or doctors, please again notify Sudeepa, Alex, and the student's Dean by replying to the same message.

    2. Communication (2%): We will regularly contact you about information we might need, or about your progress, feedback, concerns, etc. These points are reserved for your responses within the time limit provided in the emails. All responses are required and all have the same weight.

  • Extra credit problems (2% on top of 100%): There might be 0-2 extra credit problems with each assignment and exam, each with equal weight, which you can choose to solve to get up to 2% extra credit above your grade. We will drop the lowest 25% of the scores, rounded up (ceiling), e.g., if there are 5 extra credit problems in the entire course, we will drop the 2 lowest scores, and if there are 10 extra credit problems, we will drop the 3 lowest scores.
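
The drop rule for extra credit problems can be checked with a small Python sketch. Only the "ceiling of 25%" rule and the two examples (5 problems, 10 problems) come from the text above. How the remaining scores map onto the 2% is an assumption here: the sketch simply averages the kept scores, each taken as a fraction between 0 and 1, and the function names are hypothetical.

```python
import math


def num_dropped(n_problems):
    """Number of lowest extra credit scores dropped: ceiling of 25%."""
    return math.ceil(n_problems / 4)


def extra_credit_fraction(scores):
    """Average of the kept scores (assumed mapping, not stated in the syllabus).

    scores: list of fractions in [0, 1], one per extra credit problem.
    """
    kept = sorted(scores)[num_dropped(len(scores)):]
    return sum(kept) / len(kept) if kept else 0.0
```

With 5 problems, `num_dropped(5)` is 2, and with 10 problems, `num_dropped(10)` is 3, matching the examples above.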

Note: Almost all types of assignments have options for late days or removal of some lowest scores. To ensure fairness to everyone, no additional late days will be granted in response to email requests before the deadline. Please start early!
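
To make the late policy for written homeworks concrete, here is a minimal Python sketch of the deduction for a single problem. It is an illustration, not the official Gradescope computation. It assumes that every started hour past the end of the one-hour grace period costs 5% (the text only confirms the 11:00 pm and 11:01 pm cases), that the deduction is capped at 100%, and that the optional `solution_released` time, a hypothetical parameter, marks when the sample solution becomes available. Documented-excuse extensions are not modeled.

```python
import math
from datetime import datetime, timedelta

GRACE = timedelta(hours=1)   # one-hour grace period after the 10 pm deadline
PENALTY_PER_HOUR = 5         # percent deducted per hour late


def late_deduction_percent(deadline, submitted, solution_released=None):
    """Percent deducted from one problem submitted at `submitted`."""
    # No credit once the sample solution is available.
    if solution_released is not None and submitted >= solution_released:
        return 100
    past_grace = submitted - (deadline + GRACE)
    if past_grace <= timedelta(0):
        return 0  # on time or within the grace period
    # Assumption: each started hour after the grace period counts as one hour.
    hours_late = math.ceil(past_grace / timedelta(hours=1))
    return min(100, PENALTY_PER_HOUR * hours_late)
```

For a deadline of `datetime(2022, 9, 15, 22, 0)`, a submission at 11:00 pm loses nothing and a submission at 11:01 pm loses 5%, as stated in the late policy.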

How much time should I allocate to get work done in CompSci 316?

You should plan for the following every week. Allocate approximately 8-12 hours per week for the class, although this may vary from student to student.

Activity              Hours/Week        Days of Week
In-person lectures    1.25 x 2 = 2.5    Tues, Thurs
Discussion session    1.25              Fri
Assignments           2-5 (varies)      varies, typically not Friday or weekend!
Project               2-4               Mon (weekly updates) + two milestones and a final project submission
                      (heavier work likely later in the course; if you do an "open project", expect 1.5x or so more work, and more fun!)


Resources / Communication / Toolkits

Book: If you would like to consult a textbook, we primarily use the following book:

Database Systems: The Complete Book, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2nd Edition. Prentice Hall. 2008.

See the publisher's book page and the Amazon book page. Relevant chapters for reading are posted under Schedule. In a typical semester, textbooks for this course are available for 3-hour checkouts at the Duke Libraries. Search the Libraries' Top Textbooks program here: https://library.duke.edu/course-support/course-reserves/textbooks. Please consult the library for options during the COVID-19 situation.

Gradescope: We will use Gradescope for submission and grading of (non-Gradiance) homeworks and project work, as well as grading of exams.

Communication and Ed: You should check Ed regularly for important course-related announcements. Important announcements will also be sent through Sakai and made in class lectures.

All questions that may be of general interest to the class should be directed to Ed. You will get your questions answered faster on Ed than via personal emails to the course staff (who will direct you to Ed), because Ed is monitored closely by everybody in the class, not just the course staff. You are highly encouraged to answer each other's questions on Ed, and the course staff will endorse or add to those answers. Only for logistics-related issues, email both Sudeepa and Alex.

Sakai: We will use the Sakai course management system for posting sample solutions (under "Resources") and for checking grades (under "Gradebook").

Computing: You will need access to a computer (any major OS will do) on which you are allowed to install new software. We will also use cloud-based virtual machines - see Help for details.

Course Policy

Standards of Conduct: Under the Duke Community Standard, you are expected to submit your own work in this course, including homeworks, projects, and exams.
  • On many occasions when working on homeworks and projects, it is useful to ask others (the instructor, the TA, or other students) for hints or debugging help, or to talk generally about the written problems or programming strategies. Such activity is both acceptable and encouraged, but you must indicate in your submission any assistance you received (including help from the course staff). Any assistance received that is not given proper citation will be considered a violation of the Standard.

  • In any event, you are responsible for writing, understanding, and being able to explain on your own all written and programming solutions that you submit.

  • Copying solutions to any problem in any assignment from other students in the class, even if you have discussed those problems with them, is strictly prohibited.

  • In particular, it should not be the case that a group of students is working together to come up with a single solution. Everyone should try to solve the problems on their own, and then you can discuss with TAs and other students for debugging. If you are completely stuck on a problem, we strongly encourage you to go to TA office hours rather than asking other students for hints, to avoid wrong solutions.

  • It is strictly not allowed to seek help from anyone other than your TAs and classmates for solving the assignments, so you CANNOT search for answers on the Web, ask students who took this course in previous semesters or anyone else for help and material, or search for solutions from previous semesters.

  • You can use online tutorials and resources for your project, but the entire code must be written by your team members. Please acknowledge all websites that you have consulted in your project milestone reports.

  • Exam policy: Exams are open book and open notes; no collaboration or electronic devices are allowed. Exams are comprehensive (covering everything until the lecture before the exam). More information will be announced before the exams. Also see above about missing exams.

  • The course staff will aggressively pursue all suspected cases of violations, and they will be handled through official University channels. Any proven violation of course policy will result in a zero for the entire assignment (not just the problem where the policy was violated) and may result in strict disciplinary actions.

  • If you are unsure of a policy, please ask Sudeepa or Alex and do not assume anything.


Help

The help section will be updated when the class starts.

Schedule

(subject to change)

"Notes" will be uploaded before the class and are intentionally left incomplete for interactive lectures. Completed "slides" will be uploaded after the lectures. Chapters for optional reading will be updated after the lectures.

D = Discussion session

A = Assignments

#    Day          Topic [Slides] [Optional Reading]; rows marked "A" list Assignments / Remarks

1    8/30 (T)     Introduction [Slides: Lecture-1] [Reading: 2.1, 2.2, 6.1, 6.2]
2    9/1 (Th)     Relational model, Basic SQL, and Relational Algebra (RA) [Slides: Lecture-2, up to slide 34] [Reading: 2.3, 2.4]
D1   9/2 (F)      VM Setup and basic SQL + RA [Slides: Discussion-1-sol, up to 2nd RA]
A                 None due

3    9/6 (T)      RA contd. + Database design in E/R model; Guest Lecture by Dr. Amir Gilad [Slides: Lecture-3] [Reading: 4.1-4.4]
4    9/8 (Th)     Database design: E/R-relational translation; Guest Lecture by Prof. Jun Yang [Slides: Lecture-4] [Reading: 4.5-4.6]
D2   9/9 (F)      More RA; Part of HW-1 (RA) solving [Slides: Discussion-2]
A                 Gradiance-1 due on 9/14 Wednesday 10 pm (no extension/late days)
                  HW-1 (RA) due on 9/15 Thursday 10 pm (see late policy above)
                  MS1 - Names of members for each team due on 9/16 Friday 5 pm, Gradescope group assignment

5    9/13 (T)     SQL: aggregation, subqueries, NULL, outerjoin, modifications, constraints, triggers, views [Slides: Lecture-5, up to slide 26] [Reading: 2.3, 6.1.1-6.1.7, 6.2-6.5, 7.1-7.5, 8.1-8.3]
6    9/15 (Th)    contd. (up to slide 55)
D3   9/16 (F)     ERD & iREX tool for SQL [Slides: Discussion-3]
A                 Gradiance-2 (ERD) due on 9/21 Wednesday 10 pm
                  HW-2 (ERD) due on 9/22 Thursday 10 pm

7    9/20 (T)     Project mixer: guest lecture by Danai Adkisson (OIT) on project setup [Slides: Slides from CoLab]
8    9/22 (Th)    SQL contd.
D4   9/23 (F)     Part of SQL HW-3 solving with iRex [Slides: Discussion-4]
A                 Gradiance-3 (SQL & NULL) due on 9/28 Wednesday 10 pm
                  HW-3 (SQL) due on 9/29 Thursday 10 pm

9    9/27 (T)     Database design theory: FD, BCNF [Slides: Lecture-6, up to slide 19] [Reading: 3.1-3.4, 3.6, 3.7]
10   9/29 (Th)    SQL recursion
D5   9/30 (F)     Midterm practice problems [Slides: Midterm Review]
A                 NO GRADIANCE DUE
                  NO HOMEWORK DUE

11   10/4 (T)     Midterm in class (syllabus: everything covered until and including 9/29) [Reading: 9.1, 9.3, 9.4, 9.6, 10.2]
12   10/6 (Th)    SQL programming; Storage & Index [Slides: Lecture-5a; Lecture-7a (storage), up to slide 11 - to be contd.; Lecture-7b (index), up to slide 12]
D6   10/7 (F)     Git Tutorial and Project Work [Slides: Discussion 6]
A                 Project MS-2 due on 10/13 Thursday 10 pm

     10/11 (T)    Fall break - no class
13   10/13 (Th)   Index contd. (Lec 7b up to slide 25) [Reading: 14.1, 14.2]
D7   10/14 (F)    Project & HW4 [Slides: Discussion 7]
A                 Mid-semester feedback survey due on 10/21 Friday 11:59 pm (part of communication 2%)

14   10/18 (T)    contd. (finished Lec-7b)
15   10/20 (Th)   XML [Slides: Lecture-8]
D8   10/21 (F)    Project & HW4 [Slides: Discussion 8]
A                 HW-4 (mini project) group submission per project team due on 10/24 Monday 10 pm
                  Gradiance-4 (B+-trees) due on 10/26 Wednesday 10 pm

16   10/25 (T)    XML contd. End of Lecture 8; Storage, Lecture 7a resumed [Reading: 12.1-12.2]
17   10/27 (Th)   Storage contd. [Reading: 13.1-13.5]
D9   10/28 (F)    Index & XML [Slides: Discussion 9]
A                 Gradiance 5 (XML) due on 11/2 Wednesday 10 pm
                  HW-5 (Index + XML) due on 11/3 Thursday 10 pm

18   11/1 (T)     External sorting and Join Processing [Slides: Lecture-9, up to slide 18] [Reading: 15.1-15.6, 15.8]
19   11/3 (Th)    Contd.
D10  11/4 (F)     Sorting and Join [Slides: Discussion 10]
A                 Project MS-3 due on 11/10 Thursday 10 pm
                  NO HOMEWORK DUE
                  NO GRADIANCE DUE

20   11/8 (T)     Query Optimization [Slides: Lecture-10, up to slide 26] [Reading: 16.1-16.7]
21   11/10 (Th)   NoSQL: JSON and MongoDB [Slides: Lecture-11; MongoDB Help]
D11  11/11 (F)    MongoDB & JSON [Slides: Discussion 11]
A                 Gradiance 6 (Query Processing) due on 11/18 Friday 5 pm (NOTE THE NON-STANDARD DATE AND TIME - can be submitted by Monday 11/21 10 pm w/o late penalty)
                  HW-6 (Sorting + Query Processing + JSON) due on 11/19 Saturday 10 pm

22   11/15 (T)    Transaction - Basics and Concurrency Control [Slides: Lecture-12, up to slide 30]
23   11/17 (Th)   Contd. Finished Lecture 12
D12  11/18 (F)    Concurrency Control; Help on HW6 [Slides: Discussion 12]
A                 NOTHING DUE

24   11/22 (T)    Transactions Recovery [Slides: Lecture-13, up to slide 27]
     11/24 (Th)   Thanksgiving break - no class
     11/25 (F)    Thanksgiving break - no discussion
A                 TBD

25   11/29 (T)    Contd.
26   12/1 (Th)    SQL Recursion [Slides: Lecture-14]
D13  12/2 (F)     Practice problems for final
A                 No assignments; final project report and demo with the TAs are due this week - details TBA

27   12/6 (T)     Map-Reduce, Parallel DBMS [Slides: Lecture-15]
28   12/8 (Th)    Early in-class project presentations & Review [Slides: Lecture-16]
D14  12/9 (F)     Project presentations

     12/17 (Sun)  Final Exam in class, 7-10 pm