Data-intensive Computing Systems: Project Topics

    Systems

  1. [Abhishek Dubey] HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, Alex Rasin. VLDB 2009. Project page
  2. [Abhishek Dubey] Tim Kaldewey, Eugene J. Shekita, Sandeep Tata: Clydesdale: structured data processing on MapReduce. EDBT 2012 pdf
  3. [Hao Ran Liu] Vinayak R. Borkar, Michael J. Carey, Raman Grover, Nicola Onose, Rares Vernica: Hyracks: A flexible and extensible foundation for data-intensive computing. ICDE 2011 Project page
  4. [Yuxuan Dai] Yingyi Bu, Vinayak R. Borkar, Michael J. Carey, Joshua Rosen, Neoklis Polyzotis, Tyson Condie, Markus Weimer, Raghu Ramakrishnan: Scaling Datalog for Machine Learning on Big Data. CoRR abs/1203.0160 (2012) HTML
  5. [Yi Ding] Tenzing: A SQL Implementation On the MapReduce Framework. Biswapesh Chattopadhyay, Liang Lin, Weiran Liu, Sagar Mittal, Prathyusha Aragonda, Vera Lychagina, Younghee Kwon, Michael Wong. VLDB 2011 HTML, An open-source system inspired by Tenzing
  6. [Hui Dong] Shark and Spark projects. Project page
  7. [Lanceton Mark Dsouza] M3R: Increased performance for in-memory Hadoop jobs, by Avraham Shinnar (IBM Research), David Cunningham (IBM Research), Benjamin Herta (IBM Research), Vijay Saraswat (IBM Research), VLDB 2012 pdf, video
  8. [Hao Guo] BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data Project page
  9. [Wei He] Stratosphere system for information management Project page
  10. [Shenghao Li] Sailfish: A Framework For Large Scale Data Processing Project page
  11. [Xia Li] YARN: Hadoop NextGen MapReduce Project page
  12. [Pengfei Ma] Mesos: Dynamic Resource Sharing for Clusters Project page

    Row vs. Column Storage

  14. [Mayuresh Kunjir] Avrilia Floratou, Jignesh M. Patel, Eugene J. Shekita, Sandeep Tata: Column-Oriented Storage Techniques for MapReduce. VLDB 2011 pdf
  15. [Mayuresh Kunjir] Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Jens Dittrich: Trojan Data Layouts: Right Shoes for a Running Elephant. SOCC 2011 pdf, Project page

    Indexing for MapReduce

  17. [Austin Alexander] Jens Dittrich, Jorge-Arnulfo Quiane-Ruiz, Stefan Richter, Stefan Schuh, Alekh Jindal, Jorg Schad: Only Aggressive Elephants are Fast Elephants. VLDB 2012 pdf, Project page
  18. [Hanxiao Mao] Jens Dittrich, Jorge-Arnulfo Quiane-Ruiz, Alekh Jindal, Yagiz Kargin, Vinay Setty, Jorg Schad: Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing). VLDB 2010 pdf, Project page

    Query Processing

  20. [Harsha Ravi] H. Herodotou and S. Babu. Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs. VLDB 2011 pdf, Project page
  21. [Le Qi] Spyros Blanas, Jignesh M. Patel, Vuk Ercegovac, Jun Rao, Eugene J. Shekita, Yuanyuan Tian. A comparison of join algorithms for log processing in MapReduce. SIGMOD 2010 pdf
  22. [Yao Rong] Query Optimization for Massively Parallel Data Processing. Sai Wu, Feng Li, Sharad Mehrotra, Beng Chin Ooi. SOCC 2011 pdf, Project page
  23. [Jiawei Shi] ReStore: Reusing Results of MapReduce Jobs. Iman Elghandour, Ashraf Aboulnaga, VLDB 2012 HTML
  24. [Yuvraj Singh] Iterative processing extensions to MapReduce.
  25. [Shiyuan Wang] YSmart: An SQL-to-MapReduce Translator Project page
  26. [Tianxu Wang] Statistics for data stored in parallel systems

    Data Co-location, Compression, and Serialization

  28. [Yinan Xie] Data serialization formats
  29. Mohamed Y. Eltabakh, Yuanyuan Tian, Fatma Ozcan, Rainer Gemulla, Aljoscha Krettek, John McPherson: CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop. VLDB 2011 pdf
  30. [Hang Yin] Data compression

    Adaptive Execution

  32. [Ran Zhang]
    • Rares Vernica, Andrey Balmin, Kevin S. Beyer, Vuk Ercegovac: Adaptive MapReduce using situation-aware mappers. EDBT 2012 pdf
    • Reoptimizing Data Parallel Computing. Sameer Agarwal, Srikanth Kandula, Nico Bruno, Ming-Chuan Wu, Ion Stoica, Jingren Zhou. NSDI 2012 pdf, slides and video

    Hadoop Schedulers

  34. [Donghe Zhao] Overview, FIFO scheduler, Fair share scheduler, Capacity Scheduler, Dynamic proportional sharing

    MapReduce on Multicore and GPU

  36. [Yifei Ding] Phoenix and extensions
  37. [Yifei Ding] Metis
  38. [Xi He] Mars: A MapReduce Framework on Graphics Processors

    Hadoop-Database Connectors

  40. [Chenbo Zhu]
    • Oracle Big Data Connectors
    • Fatma Ozcan, David Hoa, Kevin S. Beyer, Andrey Balmin, Chuan Jie Liu, Yu Li: Emerging trends in the enterprise data analytics: connecting Hadoop and DB2 warehouse. SIGMOD 2011 HTML
    • Sqoop
    • HiHo

    Improving on HDFS

    Real-time Processing

  43. [Yuzhang Han] Twitter's Storm
  44. [Yuzhang Han] WalmartLabs' Muppet