Data-intensive Computing Systems: Readings

Readings

Required Readings

Textbook Chapter 3: The Hadoop Distributed Filesystem
Textbook Chapter 6: How MapReduce Works
Textbook Chapter 8: MapReduce Features
Pig Latin: A Not-So-Foreign Language for Data Processing
By Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. SIGMOD Conference 2008
Building a High-Level Dataflow System on top of MapReduce: The Pig Experience
By Alan Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan Narayanamurthy, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, and Utkarsh Srivastava. VLDB Conference 2009
Query Optimization
By Yannis Ioannidis. Read Section 5 (Size-Distribution Estimator) of the paper
A Case for Flash Memory SSD in Enterprise Database Applications
By Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, and Sang-Woo Kim
CoHadoop: Flexible Data Placement and its Exploitation in Hadoop
By Mohamed Y. Eltabakh, Yuanyuan Tian, Fatma Ozcan, Rainer Gemulla, Aljoscha Krettek, and John McPherson
Anatomy of the Google search engine (early version)
By Sergey Brin and Lawrence Page