Data-intensive Computing Systems: Readings

Required Readings

  1. Transitioning from Relational Databases to MongoDB - Data Models
  2. Bigtable: A Distributed Storage System for Structured Data
    By Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Google, Inc. OSDI 2006
  3. Enabling JSON Document Stores in Relational Systems
  4. Chapter 3 in Tom White's Book: The Hadoop Distributed FileSystem
  5. Chapter 3 in Tom White's Book: The Hadoop Distributed FileSystem
  6. Chapter 6 in Tom White's Book: How MapReduce Works
  7. Chapter 8 in Tom White's Book: MapReduce Features
  8. Pig Latin: A Not-so-foreign Language for Data Processing
    By Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. SIGMOD Conference 2008
  9. Building a High-Level Dataflow System on top of MapReduce: The Pig Experience
    By Alan Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan Narayanamurthy, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, and Utkarsh Srivastava. VLDB Conference 2009
  10. Query Optimization
    Read Section 5 (Size-Distribution Estimator) of this paper by Yannis Ioannidis
  11. A Case for Flash Memory SSD in Enterprise Database Applications
    By Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, and Sang-Woo Kim
  12. CoHadoop: Flexible Data Placement and its Exploitation in Hadoop
    By Mohamed Y. Eltabakh, Yuanyuan Tian, Fatma Ozcan, Rainer Gemulla, Aljoscha Krettek, and John McPherson
  13. Anatomy of the Google search engine (early version)
    By Sergey Brin and Lawrence Page

Recommended Readings

  1. Big data: The next frontier for competition, McKinsey report, 2011
  2. EMC's articles on the Digital Universe. See the bottom right-hand corner for a series of interesting articles such as: The Diverse and Exploding Digital Universe
  3. Different subsystems in big data processing, Think Big Analytics.