Duke DBGroup

Data Engineering: Course Schedule

The course schedule will be posted here.
Week | Date | Topic | Reference material
1 | 08-24 | Introduction and overview | [1], [2], [3], [4], [5], pptx, pdf
1 | 08-26 | Introduction to parallel computing with Spark | pptx, pdf
2 | 08-31 | Introduction to parallel computing with Spark (contd.) |
2 | 09-02 | BTrace tutorial |
3 | 09-07 | Deep dive into techniques for parallel execution | pptx, pdf
3 | 09-09 | Deep dive into techniques for parallel execution (contd.) | pptx, pdf
4 | 09-14 | Introduction to the MapReduce computation model | pptx, pdf, algorithms
4 | 09-16 | How MapReduce works | ppt, pdf
5 | 09-21 | How MapReduce works (contd.) | ppt, pdf
5 | 09-23 | Data Partitioning and Assignment | Chapter 2 from Foundations article
6 | 09-28 | Introduction to Amazon Web Services |
6 | 09-30 | Midterm 1 |
7 | 10-05 | SQL Query Processing | ppt, pdf
7 | 10-07 | SQL Query Processing (contd.) | ppt, pdf
8 | 10-12 | Fall Break |
8 | 10-14 | Pipelined Query Execution | ppt, pdf, notes
9 | 10-19 | SQL Query Plan Selection | ppt, pdf
9 | 10-21 | SQL Query Plan Selection (contd.) | ppt, pdf
10 | 10-26 | Introduction to Data Stream Processing | ppt, pdf
10 | 10-28 | Role of Kafka in Modern Data Processing | reading
11 | 11-02 | Distributed Data Stream Processing (Execution) | reading (along with the other three parts of this blog series), reading
11 | 11-04 | Distributed Data Stream Processing (Fault Tolerance) | reading (along with the other three parts of this blog series), reading, reading
12 | 11-09 | Midterm 1 Review | Some solutions were posted on Sakai
12 | 11-11 | Midterm 2 |
13 | 11-16 | No class |
13 | 11-18 | Data Stream Processing with Micro Batches | reading, reading
14 | 11-23 | Graph and Iterative Processing | pptx, pdf
14 | 11-25 | Thanksgiving break |
15 | 11-30 | NoSQL Systems | reading, pptx, pdf
15 | 12-02 | NoSQL Systems | pptx, pdf