Required Readings
- Textbook Chapter 3: The Hadoop Distributed FileSystem
- Textbook Chapter 6: How MapReduce Works
- Textbook Chapter 8: MapReduce Features
- Pig Latin: A Not-so-foreign Language for Data Processing
By Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. SIGMOD Conference 2008
- Building a High-Level Dataflow System on top of MapReduce: The Pig Experience
By Alan Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan Narayanamurthy, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, and Utkarsh Srivastava. VLDB Conference 2009
- Query Optimization
Read Section 5 (Size-Distribution Estimator) of this paper by Yannis Ioannidis
- A Case for Flash Memory SSD in Enterprise Database Applications
By Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, and Sang-Woo Kim
- CoHadoop: Flexible Data Placement and its Exploitation in Hadoop
By Mohamed Y. Eltabakh, Yuanyuan Tian, Fatma Ozcan, Rainer Gemulla, Aljoscha Krettek, and John McPherson
- Anatomy of the Google search engine (early version)
By Sergey Brin and Lawrence Page
Recommended Readings
- Big data: The next frontier for competition, McKinsey report, 2011
- EMC's articles on the Digital Universe. See the bottom right-hand corner for a series of interesting articles such as: The Diverse and Exploding Digital Universe
- Different subsystems in big data processing, Think Big Analytics.