CompSci 316 - Introduction to Databases - Fall 2022

Have questions? Please email both Sudeepa (sudeepa at cs.duke.edu) and Alex (alex.chao at duke.edu) for questions on logistics.
All other questions should be discussed on the Ed discussion forum.

(This website is a work in progress.)

Overview

We intend this course to give you a solid background in database systems as well as managing and processing "big data" in general. Topics include data modeling, database design theory, data definition and manipulation languages (SQL, relational algebra, and NoSQL), database application programming interfaces, storage and indexing, query processing and optimization, parallel and distributed data processing, transaction processing, as well as a sample of other advanced topics.

Prerequisites:

CompSci 201 and one of CompSci 210/250, or consent of the instructor.

Updates for Fall'22:

  • Welcome to CompSci 316, Fall 2022! The first class will be on Tuesday, August 30. Please check this website and posts on Ed frequently and carefully for all logistics information and announcements. We hope you have fun learning database systems in this class!

  • Please note that everyone is required to wear a mask in the classroom. Please check out Duke's COVID protocol frequently.

  • Attendance in discussion session is waived for students who have COVID and need to quarantine. Please see the details below under discussion sessions.


Time/Day

      Lectures: Tuesdays and Thursdays, 1:45 pm - 3:00 pm, Bryan Center Griffith Theater

      Discussion-01: Fridays, 10:15 am to 11:30 am, LSRC B101
      Discussion-02: Fridays, 1:45 pm to 3:00 pm, FFSC 2231

       (attendance in one of the discussion sessions is required, see grading below)

    
     Office Hours: See below.


Course Staff

Instructor

      Sudeepa Roy

Teaching Associate

      Alex Chao

Grad TAs

       Yuxi Liu, Zhe Wang, Haibo Xiu, and (half-TAs) Tong Lin and Fiona (Chengyu) Wu

UTAs

       Konstantinos Bailas, Samy Boutouis, Neel Gajjar, Joshua Guo, Alexandra Lawrence, Joon Young Lee, Justin Lim, Danny Luo, Alok Malhotra, Harris Masterson, Jason Qiu, Alex Schiff, Grace Tian, Joyce Wang, Samia Zaman, Han Zhang, Zachary Zheng.

Not sure who your TAs are? Check out the first few slides of Lecture 1.

Weekly Office Hour Calendar:

See Ed for the calendar.

Grading

If you work hard, you have considerable control over the final grade you receive in this class, and you should have a fair idea of approximately where you stand at any point in time!
  • Grading is done on an absolute, but adjustable scale. Anyone earning
    • 90% or more of the total number of points available will receive a grade in the A range (A+, A, A-);
    • 80% or more guarantees a grade in the B range (B+, B, B-);
    • 70% or more guarantees a grade in the C range (C+, C, C-);
    • 60% or more guarantees a D (no +/- in this range).

  • At the discretion of the instructor, the grading scale may slide down (i.e., grades go higher), but it will not slide up.

  • Assignment of +/- in the letter grades will be decided by the instructor based on the performance of the entire class. Only the highest overall score in the class and exceptional performances will receive an A+.
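
As an illustration only, the guaranteed grade ranges above can be sketched in Python. The function name `guaranteed_range` is hypothetical and not part of any course tool. It reflects only the guaranteed cutoffs: the actual scale may be more lenient at the instructor's discretion, and +/- modifiers are not modeled.

```python
from typing import Optional

# Guaranteed cutoffs from the grading scale above: (minimum percent, grade range).
CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]

def guaranteed_range(percent: float) -> Optional[str]:
    """Return the grade range guaranteed by a total percentage, or None below 60%."""
    for minimum, grade in CUTOFFS:
        if percent >= minimum:
            return grade
    return None  # no guarantee is stated below 60%

print(guaranteed_range(91.5))  # A
print(guaranteed_range(80.0))  # B
```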


Weights of each component:

See details in Workload below.

Component                                                                  Weight
Homeworks                                                                  34%
    Written problem solving / programming                                  25%
    Gradiance exercises (2 lowest scores dropped)                          9%
Exams                                                                      35%
    Midterm                                                                17%
    Final exam                                                             18%
Project                                                                    24%
Class participation                                                        7%
    Attending discussion sessions (quizzes and collaboration)              5%
    Communication                                                          2%
Extra Credit (above 100%)                                                  2%
    Extra credit problems from HW and Exams (lowest 1/4th scores dropped)  2%
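
To see how these weights combine, here is a minimal Python sketch. The names `WEIGHTS` and `total_percent` are hypothetical, and the sketch assumes each component score is already expressed as a fraction between 0 and 1 after any lowest scores have been dropped. It is not the official grade computation.

```python
# Component weights in percent, as listed in the table above.
WEIGHTS = {
    "written_hw": 25,
    "gradiance": 9,
    "midterm": 17,
    "final": 18,
    "project": 24,
    "discussion": 5,
    "communication": 2,
}
EXTRA_CREDIT_MAX = 2  # percent, added on top of 100

def total_percent(scores: dict, extra_credit: float = 0.0) -> float:
    """scores maps each component to a fraction in [0, 1]; extra_credit is also a fraction."""
    base = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return base + EXTRA_CREDIT_MAX * extra_credit

perfect = {name: 1.0 for name in WEIGHTS}
print(total_percent(perfect))       # 100.0
print(total_percent(perfect, 1.0))  # 102.0
```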


Workload

  • Homeworks (34%): There will be weekly homeworks (due in 7-10 days). They will be based on the last 1-2 lectures. They are of two types:

    1. Written problem solving and programming assignments (25%):
      Start early and allocate enough time to solve these problems!

      • Late policy: Homeworks are due at 10 pm. There is a one-hour grace period, during which you will not lose any points, in case a submission is a few minutes late for technical reasons. After that, homework problems that are submitted late will receive an automatic deduction of 5% per hour late (per problem as marked by Gradescope, not per sub-problem, so even if one sub-problem is late, the problem is late). You will receive no credit after the sample solution becomes available.

        Note that the deadline is 10 pm, not 11 pm, so a submission after 10 pm will be marked as late by Gradescope, but you won't lose points until 11:00 pm. A submission marked at 11:01 pm will lose 5%. Q/A on Ed may be slow and uncertain at night and may not be available after 10 pm, and there may be unexpected technical issues, so please start and submit early. Unless Gradescope crashes and does not accept homeworks by 10 pm (in which case remember to take a screenshot and inform us on Ed), no other technical issue or lack of help from the course staff at the last minute will be considered a valid excuse if you do not submit by the deadline. You might find it useful to save your solutions to an online drive like Duke Box (meant for automated synchronization) to be able to access them from any machine.

        Exceptions will only be made in the case of documented excuses; follow the standard university procedure for filing them. In other words, you must submit an Incapacitation Form (STINF), Religious Observance Notification Form, or Notification of Varsity Athletic Participation Form before the deadline of an assignment. These forms will give you an extension of two days (48 hours), but the standard late penalty won't apply after that extension, i.e., your homework won't be accepted after 48 hours. For any other reason, if you submit a form after the homework deadline, or if you need an extension beyond two days, you must arrange for your academic dean to email the instructor regarding your circumstances (Dean's Excuse). You must have an email from Sudeepa granting the extension; otherwise the standard late penalty will apply. Also note that help from the course staff may be slow or uncertain on weekends except during office hours. Do not rely on getting an extension: start working on homeworks/quizzes right after they are released, try to finish homeworks as early as possible, and reach out to us early on Ed or in office hours so that you can get enough help.

    2. Gradiance exercises (9%):
      Gradiance is an online service pioneered by one of the authors of the textbook, Prof. Jeffrey Ullman at Stanford. One of the best features of Gradiance is that you are permitted to test yourself on a particular topic as many times as you like. You receive immediate feedback for each attempt, which avoids the shortcoming of the traditional submit-and-then-wait-for-grades assignments where one error in understanding can permeate solutions to multiple problems and does not get rectified until much later. We encourage you to continue testing on each topic until you complete the part of the assignment with a 100% score. The highest score will be recorded. The questions will be the same in every attempt, but the answer choices will be selected at random. We will drop the lowest two scores at the end.

      • Late policy: Gradiance exercises are also due at 10 pm. There are no late days or hours for Gradiance assignments under any circumstances. Each exercise closes automatically after the deadline. The website might have occasional downtime for maintenance. Make sure to start early and finish the exercises by the deadline.



  • Project (24%): The course projects are to be done in groups of five members. All project members must be chosen from the same discussion section. More details will be posted later.

  • Midterm (17%) and final (18%): Both midterm and final exams are open-book and open-notes. The final is comprehensive but may focus on materials not already covered by the midterm. There won't be any make-up or late exams. If you miss the midterm and have a documented excuse as described for the homework assignments above, your midterm score will be replaced by the score you receive in the final. Note that the final may be easier or harder than the midterm. The final exam is required: to get a valid grade in this class, you must take the final exam as scheduled by the university.

  • Class Participation (7%):
    1. Attendance and assignments in discussion sessions (5%): Each discussion session may have points for attendance, and for solving quizzes and practice problems. All discussion sessions have the same weight. We will drop the lowest three scores for discussion sessions to account for the days when you cannot attend it.

      Attendance in a discussion session is excused (i.e., the student receives attendance credit without being present) only if the student gets COVID during the semester; this is for everyone's safety. In that case, the student should write an email to Sudeepa, Alex, and the student's Dean notifying them that the student has COVID. If the student is also required to miss the discussion session the following week (i.e., will miss two discussion sessions in a row) as advised by student health or doctors, please again notify Sudeepa, Alex, and the student's Dean by replying to the same message.

    2. Communication (2%): We will be regularly contacting you about some information we might need, or your progress, feedback, concerns, etc. These points are reserved for your response within the time limit provided in the emails. All are required and all have the same weights.

  • Extra credit problems (2% on top of 100%): There might be 0-2 extra credit problems with each assignment and exam, each with equal weight, which you can choose to solve to get up to 2% extra credit above your grade. We will drop the lowest 25% of scores, rounded up (ceiling), e.g., if there are 5 extra credit problems in the entire course, we will drop the 2 lowest scores, and if there are 10 extra credit problems, we will drop the 3 lowest scores.
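
The drop rule for extra credit problems can be checked with a small Python sketch. The helper names `num_dropped` and `kept_scores` are hypothetical, and scores are assumed to be fractions between 0 and 1; the two counts printed below match the examples given above.

```python
def num_dropped(num_problems: int) -> int:
    """Ceiling of 25% of the number of extra credit problems, using integer arithmetic."""
    return (num_problems + 3) // 4

def kept_scores(scores: list) -> list:
    """Drop the lowest ceil(25%) scores and return the rest in ascending order."""
    return sorted(scores)[num_dropped(len(scores)):]

print(num_dropped(5))   # 2
print(num_dropped(10))  # 3
print(kept_scores([0.0, 1.0, 0.5, 0.0, 1.0]))  # [0.5, 1.0, 1.0]
```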

Note: Almost all types of assignments have options for late days or removal of some lowest scores. To ensure fairness to everyone, no additional late days will be granted in response to email requests before the deadline. Please start early!
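
For the written homeworks, the late penalty described above can be estimated with the following Python sketch. It rests on assumptions not stated on this page: every started hour after the one-hour grace period costs 5 percentage points (consistent with a submission at 11:01 pm losing 5%), and the deduction is capped at 100%. The function name `late_penalty_percent` is hypothetical, and the rule that no credit is given once the sample solution is available is not modeled.

```python
import math

GRACE_MINUTES = 60    # one-hour grace period after the 10 pm deadline
PENALTY_PER_HOUR = 5  # percentage points deducted per hour late

def late_penalty_percent(minutes_after_deadline: int) -> int:
    """Deduction (in percentage points) for a problem submitted this many minutes after 10 pm."""
    if minutes_after_deadline <= GRACE_MINUTES:
        return 0  # on time or within the grace period
    # Assumption: each started hour past the grace period counts as a full hour.
    hours_late = math.ceil((minutes_after_deadline - GRACE_MINUTES) / 60)
    return min(100, PENALTY_PER_HOUR * hours_late)

print(late_penalty_percent(45))   # 0  (10:45 pm, within the grace period)
print(late_penalty_percent(61))   # 5  (11:01 pm)
print(late_penalty_percent(121))  # 10 (12:01 am, under the assumption above)
```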

How much time should I allocate to get work done in CompSci 316?

You should plan to allocate approximately 8-12 hours every week for the class, broken down as follows, although it may vary from student to student.

Activity             Hours/Week       Days of Week
In-person lectures   1.25 x 2 = 2.5   Tues, Thurs
Discussion session   1.25             Fri
Assignments          2-5 (varies)     Varies, typically not Friday or weekend!
Project              2-4              Mon (weekly updates) + two milestones and a final project submission
                     (heavier work likely later in the course; if you do an "open project", expect 1.5x or so more work (and more fun!))


Resources / Communication / Toolkits

Book: If you would like to consult a textbook, we primarily use the following book:

Database Systems: The Complete Book, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2nd Edition. Prentice Hall. 2008.

See the publisher's book page and the Amazon book page. Relevant chapters for reading are posted under Schedule. In a typical semester, textbooks for this course are available for 3-hour checkouts at the Duke Libraries. Search the Libraries' Top Textbooks program here: https://library.duke.edu/course-support/course-reserves/textbooks. Please consult the library for options during the COVID-19 situation.

Gradescope: We will use Gradescope for submission and grading of (non-Gradiance) homeworks and project work, as well as for grading of exams.

Communication and Ed: You should check Ed regularly for important course-related announcements. Important announcements will also be sent through Sakai and made in class lectures.

All questions that may be of general interest to the class should be directed to Ed. You will get your questions answered faster on Ed than via personal emails to the course staff (who will direct you to Ed), because Ed is monitored closely by everybody in the class, not just the course staff. You are highly encouraged to answer each other's questions on Ed, and the course staff will endorse or add to those answers. Only for logistics-related issues, email both Sudeepa and Alex.

Sakai: We will use the Sakai course management system for posting sample solutions (under "Resources") and for checking grades (under "Gradebook").

Computing: You will need access to a computer (any major OS will do) on which you are allowed to install new software. We will also use cloud-based virtual machines - see Help for details.

Course Policy

Standards of Conduct: Under the Duke Community Standard, you are expected to submit your own work in this course, including homeworks, projects, and exams.
  • On many occasions when working on homeworks and projects, it is useful to ask others (the instructor, the TA, or other students) for hints or debugging help, or to talk generally about the written problems or programming strategies. Such activity is both acceptable and encouraged, but you must indicate in your submission any assistance you received (including help from the course staff). Any assistance received that is not given proper citation will be considered a violation of the Standard.

  • In any event, you are responsible for writing, understanding, and being able to explain on your own all written and programming solutions that you submit.

  • Copying solutions to any problem in any assignment from other students in the class, even if you have discussed those problems with them, is strictly prohibited.

  • In particular, a group of students should not work together to come up with a single solution. Everyone should try to solve the problems on their own first; you can then discuss with TAs and other students for debugging. If you are completely stuck on a problem, we strongly encourage you to go to TA office hours rather than asking other students for hints, to avoid wrong solutions.

  • It is strictly forbidden to seek help from anyone other than your TAs and classmates when solving the assignments, so you CANNOT search for answers on the Web, ask students who took this course in a previous semester or anyone else for help and material, or search for solutions from previous semesters.

  • You can use online tutorials and resources for your project, but the entire code must be written by your team members. Please acknowledge all websites that you have consulted in your project milestone reports.

  • Exam policy: Exams are open book and open notes, and are comprehensive (covering everything up to the lecture before the exam). No collaboration or electronic devices are allowed. More information will be announced before the exams. Also see above about missing exams.

  • The course staff will aggressively pursue all suspected violations, which will be handled through official University channels. Any proven violation of course policy will result in a zero on the entire assignment (not just the problem where the policy was violated) and may result in strict disciplinary action.

  • If you are unsure of a policy, please ask Sudeepa or Alex and do not assume anything.


Help

The help section will be updated when the class starts.

Schedule

(subject to change)

"Notes" will be uploaded before the class and are intentionally left incomplete for interactive lectures. Completed "slides" will be uploaded after the lectures. Chapters for optional reading will be updated after the lectures.

D = Discussion session

A = Assignments

No.  Day          Topic; Slides; Assignments / Remarks; Optional Reading
1    8/30 (T)     Introduction. Slides: Lecture-1. Optional reading: 2.1, 2.2, 6.1, 6.2
2    9/1 (Th)     Relational model, Basic SQL, and Relational Algebra (RA). Slides: Lecture-2 (up to slide 34). Optional reading: 2.3, 2.4
D1   9/2 (F)      VM Setup and basic SQL + RA. Slides: Discussion-1-sol (up to 2nd RA)
A                 None due
3    9/6 (T)      RA contd. + Database design in E/R model (Guest Lecture by Dr. Amir Gilad). Slides: Lecture-3. Optional reading: 4.1-4.4
4    9/8 (Th)     Database design: E/R-relational translation (Guest Lecture by Prof. Jun Yang). Slides: Lecture-4. Optional reading: 4.5-4.6
D2   9/9 (F)      More RA; Part of HW-1 (RA) solving. Slides: Discussion-2
A                 Gradiance-1 due on 9/14 Wednesday 10 pm (no extension/late days)
                  HW-1 (RA) due on 9/15 Thursday 10 pm (see late policy above)
                  Names of members for each team due on 9/16 Friday 5 pm, Gradescope group assignment, graded as communication
5    9/13 (T)     SQL: aggregation, subqueries, NULL, outerjoin, modifications, constraints, triggers, views. Slides: Lecture-5 (up to slide 26). Optional reading: 2.3, 6.1.1-6.1.7, 6.2-6.5, 7.1-7.5, 8.1-8.3
6    9/15 (Th)    contd. (up to slide 55)
D3   9/16 (F)     ERD & iREX tool for SQL. Slides: Discussion-3
A                 Gradiance-2 (ERD) due on 9/21 Wednesday 10 pm
                  HW-2 (ERD) due on 9/22 Thursday 10 pm
7    9/20 (T)     Project mixer: guest lecture by Danai Adkisson (OIT) on project setup. Slides: from CoLab
8    9/22 (Th)    SQL contd.
D4   9/23 (F)     Part of SQL HW-3 solving with iRex. Slides: Discussion-4
A                 Gradiance-3 (SQL & NULL) due on 9/28 Wednesday 10 pm
                  HW-3 (SQL) due on 9/29 Thursday 10 pm
9    9/27 (T)     Database design theory: FD, BCNF. Slides: Lecture-6 (up to slide 19). Optional reading: 3.1-3.4, 3.6, 3.7
10   9/29 (Th)    SQL recursion
D5   9/30 (F)     Midterm practice problems
A                 NO GRADIANCE DUE
                  NO HOMEWORK DUE
11   10/4 (T)     Midterm in class (syllabus: everything covered until and including 9/29). Optional reading: 9.1, 9.3, 9.4, 9.6, 10.2
12   10/6 (Th)    Storage
D6   10/7 (F)     TBD
A                 TBD
     10/11 (T)    Fall break - no class
13   10/13 (Th)   Index. Optional reading: 14.1, 14.2
D7   10/14 (F)    TBD
A                 TBD
14   10/18 (T)    contd.
15   10/20 (Th)   Query Processing
D8   10/21 (F)    TBD
A                 TBD
16   10/25 (T)    Join Algorithms and external sorting. Optional reading: 15.1-15.6, 15.8, 16.1, 16.7.3-16.7.5
17   10/27 (Th)   Query Optimization
D9   10/28 (F)    TBD
A                 TBD
18   11/1 (T)     Contd. Optional reading: 16.2-16.6
19   11/3 (Th)    XML. Optional reading: 11, 12.1, 12.2
D10  11/4 (F)     TBD
A                 TBD
20   11/8 (T)     XML-relational mapping, NoSQL: JSON and MongoDB
21   11/10 (Th)   Transaction
D11  11/11 (F)    TBD
A                 TBD
22   11/15 (T)    Contd.
23   11/17 (Th)   Transaction: Recovery
D12  11/18 (F)    TBD
A                 TBD
24   11/22 (T)    contd.
     11/24 (Th)   Thanksgiving break - no class
     11/25 (F)    Thanksgiving break - no discussion
A                 TBD
25   11/29 (T)    TBD
26   12/1 (Th)    Map-Reduce, Parallel DBMS
D13  12/2 (F)     TBD
A                 No assignments, final project report and demo video are due on 12/9
27   12/6 (T)     contd.
28   12/8 (Th)    Data mining/TBD
D14  12/9 (F)     TBD
     12/17 (Sun)  Final Exam in class, 7-10 pm