
Compsci 290: Data Engineering, Fall 2015



  • Programming Assignment 6 has been posted on the Project page. See link on the left. The assignment is due by 11.59 PM on Sunday, Dec 6.

Course Description

Smart, data-driven applications are the future. This class teaches the new engineering principles that have emerged to create and run such applications. Companies are trying hard to extract valuable insights from data. This process is not easy.

To prepare students to meet these challenges, this course brings together topics from multiple areas of Computer Science: database systems, distributed computing, algorithms, and machine learning. A lot of the course material is drawn from recent research literature. This year, we will cover the engineering principles that underpin:

  • Data-parallel computing
  • SQL query processing
  • Real-time stream processing
  • Graph analysis
  • Iterative computing
  • Distributed NoSQL systems
  • Multi-tenant resource allocation at scale

Note: Spark will be used very heavily in this course.

Prerequisites: Good knowledge of Scala or Java is required. Prior exposure to databases will be very helpful. Most of the material that we cover will not be found in textbooks. Be prepared to do a fair amount of web search and reading.

Time and Place

10.05-11.20 AM on Mondays and Wednesdays, in the Sociology Psychology Building, Room 129.

There is no prescribed textbook for the class.

Useful References

Learning Spark: Lightning-Fast Big Data Analysis, by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia. O'Reilly Media. Feb 2015. (First edition of the book at Amazon.com)

Hadoop: The Definitive Guide, by Tom White. O'Reilly Media. April 2015. (Fourth edition of the book at Amazon.com)

Database Systems: The Complete Book, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. Prentice Hall. 2008. (Second edition of the book at Amazon.com)

Readings will be posted on the readings page.


Instructor: Shivnath Babu
Office: D338 LSRC, Phone: 919-660-6579 (email is recommended)
Office hours: The instructor prefers to have office hours by appointment so that we make the best use of time. Send the instructor an email to fix the meeting time. The office hours will be held in the instructor's office.

TA: Junghoon Kang
Office: N303B North, Phone: 919-660-6557
Office hours: Monday 3.15 - 4.15 PM EST and Wednesday 1.30 - 2.30 PM EST


Homeworks, programming assignments, and project: 50%

This class is heavy on programming. Details will be presented in class.

The midterm and final exams are not open-book or open-notes. Laptops and other electronic devices are also not allowed. Late work will not be accepted without a documented excuse from a physician or dean.

Honor Code

Under the Duke Honor Code, you are expected to submit your own work in this course, including homeworks, projects, and exams. On many occasions when working on homeworks and projects, it is useful to ask others (the instructor or other students) for hints or debugging help, or to talk generally about the written problems or programming strategies. Such activity is both acceptable and encouraged, but you must indicate in your submission any assistance you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding and being able to explain on your own all written and programming solutions that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.