--------------------------------------------------------------
Useful references before you start:
----------------------------------
- http://docs.amazonwebservices.com/AWSEC2/latest/GettingStartedGuide/
- http://wiki.apache.org/hadoop/AmazonEC2
--------------------------------------------------------------

--------------------------------------------------------------
Software to be installed on the local machine:
------------------------------------------
1. Download the EC2 API Tools from http://aws.amazon.com/developertools/351.
   For more information, refer to Create_AMI_Instructions.pdf on the course website:
   http://www.cs.duke.edu/courses/fall11/cps216/Project/Create_AMI_Instructions.pdf

2. Download the new AWS harness from
   http://www.cs.duke.edu/courses/fall11/cps216/Project/harness.zip

   In the harness folder that you extract from the archive you will find:
   -- The Hadoop ec2 contrib sources we need, in: harness/hadoop_ec2_contrib_bin
   -- The AWS harness sources we need, in:        harness/aws_hadoop_harness
---------------------------------------------------------------------------

----------------------------------------------------------------------------
By following the instructions in Create_AMI_Instructions.pdf you should have
already set up the EC2 API Tools.

You should modify the .bash_profile file (${HOME}/.bash_profile or
${HOME}/.my-bash_profile) that you set up for the old harness so that it
points to the new harness. The changes that need to be made are listed below
(a combined example follows the list):

1. The new harness does not contain the EC2 API Tools. Install them as
   described in Create_AMI_Instructions.pdf and make sure that $EC2_HOME
   points to the installation folder:
   export EC2_HOME=/path/to/ec2-api-tools-/directory

2. Export the ec2 contrib directory of the new harness:
   export HADOOP_EC2_HOME=[FILL IN with path to new_harness/hadoop_ec2_contrib_bin]

3. Export the directory containing the new harness:
   export AWS_HADOOP_HARNESS_HOME=[FILL IN with path to new_harness/aws_hadoop_harness]

4. Update the PATH so that all the ec2-api and Hadoop contrib executables can
   be accessed from ${AWS_HADOOP_HARNESS_HOME}:
   export PATH=${PATH}:${JAVA_HOME}/bin:${EC2_HOME}/bin:${HADOOP_EC2_HOME}

5. The rest of the settings stay the same (AWS_USER_ID, AWS_ACCESS_KEY_ID,
   AWS_SECRET_ACCESS_KEY, EC2_PRIVATE_KEY, EC2_CERT).
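
As a reference, here is a minimal sketch of what the harness-related part of
.bash_profile could look like after the changes above. All paths are
placeholders (assumed locations, not part of the harness); substitute your
own installation directories.

   # Sketch only -- every path below is a placeholder
   export EC2_HOME=/home/user/ec2-api-tools                                 # step 1
   export HADOOP_EC2_HOME=/home/user/harness/hadoop_ec2_contrib_bin         # step 2
   export AWS_HADOOP_HARNESS_HOME=/home/user/harness/aws_hadoop_harness     # step 3
   export PATH=${PATH}:${JAVA_HOME}/bin:${EC2_HOME}/bin:${HADOOP_EC2_HOME}  # step 4
   # Step 5: existing credential settings remain unchanged, e.g.
   # export AWS_USER_ID=...
   # export AWS_ACCESS_KEY_ID=...
   # export AWS_SECRET_ACCESS_KEY=...
   # export EC2_PRIVATE_KEY=...
   # export EC2_CERT=...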
--------------------------------------------------------------

----------------------------------------------------------------------------
Note: suppose your keypair is named my-keypair. The Hadoop EC2 configuration
${HADOOP_EC2_HOME}/hadoop-ec2-env.sh assumes that my-keypair.pem is stored in
the same *directory* where you placed your AWS private key (i.e.,
${EC2_PRIVATE_KEY}).

Copy the template file with the local settings:

   cp ${HADOOP_EC2_HOME}/local_ec2_settings.sh.template ${HADOOP_EC2_HOME}/local_ec2_settings.sh

local_ec2_settings.sh is the file that you modify to specify your own
settings and credentials (a filled-in sketch appears at the end of this
document):

   KEY_NAME         - Name of your EC2 keypair
   PRIVATE_KEY_PATH - Full path to the EC2 keypair file
   INSTANCE_TYPE    - Supported types: m1.small, m1.large, m1.xlarge,
                      c1.medium, c1.xlarge, cc1.4xlarge
   HADOOP_VERSION   - Supported versions: less than 0.19.0, 0.20.2, 0.20.203.0
   AMI_IMAGE_32     - Selected if INSTANCE_TYPE is m1.small or c1.medium
   AMI_IMAGE_64     - Selected if INSTANCE_TYPE is m1.large, m1.xlarge, or c1.xlarge

IMPORTANT: Modify the AMI_IMAGE_32 or AMI_IMAGE_64 value so that the image id
is the id of the AMI that you created. If, when you created your own AMI, you
started from a 32-bit image from the table in section 5 of
Create_AMI_Instructions.pdf, then you ended up with a 32-bit AMI and should
modify AMI_IMAGE_32 in local_ec2_settings.sh. If you started from a 64-bit
image, then your AMI is also 64-bit and you should modify AMI_IMAGE_64. It is
highly recommended to use a 32-bit image.
--------------------------------------------------------------

----------------------------------------------------------------------------
CHEATSHEET

----------------------------
Launching the Hadoop Cluster
----------------------------
# NOTE: All commands here will be run from ${AWS_HADOOP_HARNESS_HOME}
cd ${AWS_HADOOP_HARNESS_HOME}

# Launch a Hadoop cluster: 1 Master + N Slaves
${HADOOP_EC2_HOME}/hadoop-ec2 launch-cluster test-hadoop-cluster <N> <instance type>

# Example: Launch a Hadoop cluster with 1 Master + 2 Slaves, all of the m1.small type
${HADOOP_EC2_HOME}/hadoop-ec2 launch-cluster test-hadoop-cluster 2 m1.small

# Describe the instances (used to get the public IP address, etc.)
${EC2_HOME}/bin/ec2-describe-instances

# Access the JobTracker web page at: http://<master public DNS or IP>:50030

# Start the proxy for Ganglia
${HADOOP_EC2_HOME}/hadoop-ec2 proxy test-hadoop-cluster &

# Log in to the EC2 Hadoop Master node
${HADOOP_EC2_HOME}/hadoop-ec2 login test-hadoop-cluster

# Copy a file/dir from the local machine to the Hadoop Master node
${HADOOP_EC2_HOME}/hadoop-ec2 push test-hadoop-cluster /local/path/to/file

# Copy a file/dir from the Hadoop Master node to the local machine
${HADOOP_EC2_HOME}/hadoop-ec2 pull test-hadoop-cluster /master/path/to/file

# Terminate the cluster and release the EC2 nodes (run from the local machine)
# *** Enter yes when the command asks for confirmation ***
${HADOOP_EC2_HOME}/hadoop-ec2 terminate-cluster test-hadoop-cluster
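
For reference, a filled-in local_ec2_settings.sh might look like the sketch
below, assuming the template uses plain shell variable assignments. Every
value is a placeholder (the AMI ids in particular are not real); use your own
keypair name, paths, and the id of the AMI you created.

   # local_ec2_settings.sh -- sketch only; replace every value with your own
   KEY_NAME=my-keypair                              # your EC2 keypair name
   PRIVATE_KEY_PATH=/home/user/aws/my-keypair.pem   # full path to the keypair file
   INSTANCE_TYPE=m1.small                           # one of the supported types
   HADOOP_VERSION=0.20.2                            # one of the supported versions
   AMI_IMAGE_32=ami-xxxxxxxx                        # id of YOUR 32-bit AMI
   AMI_IMAGE_64=ami-yyyyyyyy                        # id of YOUR 64-bit AMI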
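
Putting the cheatsheet together, a typical end-to-end session could look like
the sketch below. The cluster name, slave count, instance type, and all file
paths (including myjob.jar and the results directory) are hypothetical
examples.

   cd ${AWS_HADOOP_HARNESS_HOME}

   # 1. Launch a cluster with 1 Master + 2 Slaves of type m1.small
   ${HADOOP_EC2_HOME}/hadoop-ec2 launch-cluster test-hadoop-cluster 2 m1.small

   # 2. Find the master's public address, then check http://<that address>:50030
   ${EC2_HOME}/bin/ec2-describe-instances

   # 3. Copy the job files to the master node (myjob.jar is a hypothetical jar)
   ${HADOOP_EC2_HOME}/hadoop-ec2 push test-hadoop-cluster /local/path/to/myjob.jar

   # 4. Log in to the master and run the job there; where the hadoop binary
   #    lives depends on how your AMI was set up
   ${HADOOP_EC2_HOME}/hadoop-ec2 login test-hadoop-cluster

   # 5. Back on the local machine, copy the results from the master
   ${HADOOP_EC2_HOME}/hadoop-ec2 pull test-hadoop-cluster /master/path/to/results

   # 6. Terminate the cluster and release the EC2 nodes (answer yes at the prompt)
   ${HADOOP_EC2_HOME}/hadoop-ec2 terminate-cluster test-hadoop-cluster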