Data-intensive Computing Systems: Readings

Required Readings

  1. Textbook Chapter 3: The Hadoop Distributed Filesystem
  2. Textbook Chapter 6: How MapReduce Works
  3. Textbook Chapter 8: MapReduce Features
  4. Pig Latin: A Not-so-foreign Language for Data Processing
    By Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. SIGMOD Conference 2008
  5. Building a High-Level Dataflow System on top of MapReduce: The Pig Experience
    By Alan Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan Narayanamurthy, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, and Utkarsh Srivastava. VLDB Conference 2009
  6. Query Optimization
    Read Section 5 (Size-Distribution Estimator) of this paper by Yannis Ioannidis
  7. A Case for Flash Memory SSD in Enterprise Database Applications
    By Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, and Sang-Woo Kim
  8. CoHadoop: Flexible Data Placement and its Exploitation in Hadoop
    By Mohamed Y. Eltabakh, Yuanyuan Tian, Fatma Ozcan, Rainer Gemulla, Aljoscha Krettek, and John McPherson
  9. Anatomy of the Google search engine (early version)
    By Sergey Brin and Lawrence Page

Recommended Readings

  1. Big data: The next frontier for competition, McKinsey report, 2011
  2. EMC's articles on the Digital Universe. See the bottom right-hand corner of that page for a series of interesting articles, such as The Diverse and Exploding Digital Universe.
  3. Different subsystems in big data processing, Think Big Analytics.