Data-intensive Computing Systems
Written Assignment 4
Second milestone of Project 1

The deadline for this assignment is Oct 7, 5.00 PM with a grace period of 72 hours.

Please read the detailed instructions very carefully. If you have questions about Project 1, please ask them in class so that everyone benefits.
Here are the detailed steps for this assignment:
  1. Within each group, 1-2 people should take responsibility for each system other than Hadoop. The responsibility for writing plain old Java MapReduce programs and running them on Hadoop can be (although it need not be) shared among all group members. For example, suppose a group has members A, B, C, and D, and that the group is evaluating a workload on Hadoop, System X, and System Y. Group members A and B will be fully responsible for System X; C and D will be fully responsible for System Y; and all of them will share the responsibility of developing MapReduce programs and running them on Hadoop.
  2. If you are using programs, source code, other software, data, etc., obtained from the web or other sources, make sure to cite the sources appropriately in all your reports. Not doing so is akin to plagiarism and will be treated as such. Do not claim the work of others as your own.
  3. The focus of the week of Oct 3-7 is on automating the installation of the systems that you need for Project 1. As part of this process:
    • Each group will develop an AMI (Amazon Machine Image) that contains the system software needed to run their workloads for evaluation. Rozemary, our TA, can help you in case you have questions. She will post a tutorial on AMI creation. She also plans to cover this topic in her TA sessions. She will send a follow-up email to the class on this topic.
    • Later on, Rozemary will also help you upgrade the harness from Programming Assignment 3 so that you can use the harness to bring up your own EC2 cluster with your own AMI (and hence the software you need).
    (PLEASE NOTE: Amazon will charge us for the creation and storage of AMIs, which are stored on Amazon's S3 cloud storage system. I do not require you to complete the AMIs in one shot, but try to minimize the number of times you create AMIs, and delete all AMIs you create apart from the latest one. Rozemary will contact all groups around October 15 in an attempt to merge the AMIs from all groups into a single AMI for the class.)
  4. By 5.00 PM on Oct 7 you should update your Project 1 report with lessons learned and problems faced during the AMI creation process. A grace period of 72 hours is allowed for this milestone. The hard deadline is 5 PM, Oct 10, which falls during the Fall break.
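
The note in step 3 asks each group to delete every AMI it creates apart from the latest one. The following is a minimal sketch of how a group might pick out the stale AMIs. It is not part of the course harness: the function name `amis_to_delete`, the image IDs, and the dates are hypothetical, and the record layout (`ImageId` plus an ISO 8601 UTC `CreationDate` string) is an assumption modeled on what EC2 image listings typically return.

```python
def amis_to_delete(images):
    """Return the IDs of every AMI except the most recently created one.

    Assumes each record has an "ImageId" and a "CreationDate" given as an
    ISO 8601 UTC string in one fixed format, so that plain string order
    matches chronological order.
    """
    ordered = sorted(images, key=lambda img: img["CreationDate"])
    # Everything before the last (newest) entry is stale.
    return [img["ImageId"] for img in ordered[:-1]]


# Hypothetical records for illustration only; real IDs and dates will differ.
images = [
    {"ImageId": "ami-aaaa1111", "CreationDate": "2000-01-03T09:00:00.000Z"},
    {"ImageId": "ami-bbbb2222", "CreationDate": "2000-01-05T17:30:00.000Z"},
    {"ImageId": "ami-cccc3333", "CreationDate": "2000-01-04T12:15:00.000Z"},
]

stale = amis_to_delete(images)
print(stale)  # the two older AMIs; "ami-bbbb2222" is kept
```

The sketch only decides which AMIs are stale; it does not contact Amazon. Deregistering each stale AMI, and removing whatever it left behind in S3, would still have to be done with the EC2 tools your group is using. Check the list by hand before deleting anything.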