--------------------------------------------------------------
Useful references before you start:
----------------------------------
- http://docs.amazonwebservices.com/AWSEC2/latest/GettingStartedGuide/
- http://wiki.apache.org/hadoop/AmazonEC2
--------------------------------------------------------------

--------------------------------------------------------------
Software to be installed on the local machine:
------------------------------------------
1. Download the EC2 API Tools from http://aws.amazon.com/developertools/351.
   For more information, refer to Create_AMI_Instructions.pdf on the course website:
   http://www.cs.duke.edu/courses/fall11/cps216/Project/Create_AMI_Instructions.pdf

2. Download the new AWS harness from
   http://www.cs.duke.edu/courses/fall11/cps216/Project/harness.zip

   In the harness folder that you extract from the archive you will find:
   -- The Hadoop ec2 contrib sources we need, in: harness/hadoop_ec2_contrib_bin
   -- The AWS harness sources we need, in:        harness/aws_hadoop_harness
---------------------------------------------------------------------------

----------------------------------------------------------------------------
By following the instructions in Create_AMI_Instructions.pdf you should have
already set up the EC2 API Tools.

You should modify the .bash_profile file (${HOME}/.bash_profile or
${HOME}/.my-bash_profile) that you set up for the old harness so that it
points to the new harness. The changes that need to be made are listed below
(a combined example follows the list):

1. The new harness does not contain the EC2 API Tools. Install them as
   described in Create_AMI_Instructions.pdf and make sure that $EC2_HOME
   points to the installation folder:
   export EC2_HOME=/path/to/ec2-api-tools-/directory

2. Export the ec2 contrib directory of the new harness:
   export HADOOP_EC2_HOME=[FILL IN with path to new_harness/hadoop_ec2_contrib_bin]

3. Export the directory containing the new harness:
   export AWS_HADOOP_HARNESS_HOME=[FILL IN with path to new_harness/aws_hadoop_harness]

4. Update the PATH so that all the ec2-api and Hadoop contrib executables can
   be accessed from ${AWS_HADOOP_HARNESS_HOME}:
   export PATH=${PATH}:${JAVA_HOME}/bin:${EC2_HOME}/bin:${HADOOP_EC2_HOME}

5. The rest of the settings stay the same (AWS_USER_ID, AWS_ACCESS_KEY_ID,
   AWS_SECRET_ACCESS_KEY, EC2_PRIVATE_KEY, EC2_CERT).
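
As a reference, here is a minimal sketch of what the harness-related part of
.bash_profile could look like after the changes above. All paths are
placeholders (assumed locations, not part of the harness); substitute your
own installation directories.

   # Sketch only -- every path below is a placeholder
   export EC2_HOME=/home/user/ec2-api-tools                                 # step 1
   export HADOOP_EC2_HOME=/home/user/harness/hadoop_ec2_contrib_bin         # step 2
   export AWS_HADOOP_HARNESS_HOME=/home/user/harness/aws_hadoop_harness     # step 3
   export PATH=${PATH}:${JAVA_HOME}/bin:${EC2_HOME}/bin:${HADOOP_EC2_HOME}  # step 4
   # Step 5: existing credential settings remain unchanged, e.g.
   # export AWS_USER_ID=...
   # export AWS_ACCESS_KEY_ID=...
   # export AWS_SECRET_ACCESS_KEY=...
   # export EC2_PRIVATE_KEY=...
   # export EC2_CERT=...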
--------------------------------------------------------------

----------------------------------------------------------------------------
Note: suppose your keypair is named my-keypair. The Hadoop EC2 configuration
${HADOOP_EC2_HOME}/hadoop-ec2-env.sh assumes that my-keypair.pem is stored in
the same *directory* where you placed your AWS private key (i.e.,
${EC2_PRIVATE_KEY}).

Copy the template file with the local settings:

   cp ${HADOOP_EC2_HOME}/local_ec2_settings.sh.template ${HADOOP_EC2_HOME}/local_ec2_settings.sh

local_ec2_settings.sh is the file that you modify to specify your own
settings and credentials (a filled-in sketch appears at the end of this
document):

   KEY_NAME         - Name of your EC2 keypair
   PRIVATE_KEY_PATH - Full path to the EC2 keypair file
   INSTANCE_TYPE    - Supported types: m1.small, m1.large, m1.xlarge,
                      c1.medium, c1.xlarge, cc1.4xlarge
   HADOOP_VERSION   - Supported versions: less than 0.19.0, 0.20.2, 0.20.203.0
   AMI_IMAGE_32     - Selected if INSTANCE_TYPE is m1.small or c1.medium
   AMI_IMAGE_64     - Selected if INSTANCE_TYPE is m1.large, m1.xlarge, or c1.xlarge

IMPORTANT: Modify the AMI_IMAGE_32 or AMI_IMAGE_64 value so that the image id
is the id of the AMI that you created. If, when you created your own AMI, you
started from a 32-bit image from the table in section 5 of
Create_AMI_Instructions.pdf, then you ended up with a 32-bit AMI and should
modify AMI_IMAGE_32 in local_ec2_settings.sh. If you started from a 64-bit
image, then your AMI is also 64-bit and you should modify AMI_IMAGE_64. It is
highly recommended to use a 32-bit image.
--------------------------------------------------------------

----------------------------------------------------------------------------
CHEATSHEET

----------------------------
Launching the Hadoop Cluster
----------------------------
# NOTE: All commands here will be run from ${AWS_HADOOP_HARNESS_HOME}
cd ${AWS_HADOOP_HARNESS_HOME}

# Launch a Hadoop cluster: 1 Master + N Slaves
${HADOOP_EC2_HOME}/hadoop-ec2 launch-cluster test-hadoop-cluster <N> <instance type>

# Example: Launch a Hadoop cluster with 1 Master + 2 Slaves, all of the m1.small type
${HADOOP_EC2_HOME}/hadoop-ec2 launch-cluster test-hadoop-cluster 2 m1.small

# Describe the instances (used to get the public IP address, etc.)
${EC2_HOME}/bin/ec2-describe-instances

# Access the JobTracker web page at: http://<master public DNS or IP>:50030

# Start the proxy for Ganglia
${HADOOP_EC2_HOME}/hadoop-ec2 proxy test-hadoop-cluster &

# Log in to the EC2 Hadoop Master node
${HADOOP_EC2_HOME}/hadoop-ec2 login test-hadoop-cluster

# Copy a file/dir from the local machine to the Hadoop Master node
${HADOOP_EC2_HOME}/hadoop-ec2 push test-hadoop-cluster /local/path/to/file

# Copy a file/dir from the Hadoop Master node to the local machine
${HADOOP_EC2_HOME}/hadoop-ec2 pull test-hadoop-cluster /master/path/to/file

# Terminate the cluster and release the EC2 nodes (run from the local machine)
# *** Enter yes when the command asks for confirmation ***
${HADOOP_EC2_HOME}/hadoop-ec2 terminate-cluster test-hadoop-cluster
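
For reference, a filled-in local_ec2_settings.sh might look like the sketch
below, assuming the template uses plain shell variable assignments. Every
value is a placeholder (the AMI ids in particular are not real); use your own
keypair name, paths, and the id of the AMI you created.

   # local_ec2_settings.sh -- sketch only; replace every value with your own
   KEY_NAME=my-keypair                              # your EC2 keypair name
   PRIVATE_KEY_PATH=/home/user/aws/my-keypair.pem   # full path to the keypair file
   INSTANCE_TYPE=m1.small                           # one of the supported types
   HADOOP_VERSION=0.20.2                            # one of the supported versions
   AMI_IMAGE_32=ami-xxxxxxxx                        # id of YOUR 32-bit AMI
   AMI_IMAGE_64=ami-yyyyyyyy                        # id of YOUR 64-bit AMI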
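
Putting the cheatsheet together, a typical end-to-end session could look like
the sketch below. The cluster name, slave count, instance type, and all file
paths (including myjob.jar and the results directory) are hypothetical
examples.

   cd ${AWS_HADOOP_HARNESS_HOME}

   # 1. Launch a cluster with 1 Master + 2 Slaves of type m1.small
   ${HADOOP_EC2_HOME}/hadoop-ec2 launch-cluster test-hadoop-cluster 2 m1.small

   # 2. Find the master's public address, then check http://<that address>:50030
   ${EC2_HOME}/bin/ec2-describe-instances

   # 3. Copy the job files to the master node (myjob.jar is a hypothetical jar)
   ${HADOOP_EC2_HOME}/hadoop-ec2 push test-hadoop-cluster /local/path/to/myjob.jar

   # 4. Log in to the master and run the job there; where the hadoop binary
   #    lives depends on how your AMI was set up
   ${HADOOP_EC2_HOME}/hadoop-ec2 login test-hadoop-cluster

   # 5. Back on the local machine, copy the results from the master
   ${HADOOP_EC2_HOME}/hadoop-ec2 pull test-hadoop-cluster /master/path/to/results

   # 6. Terminate the cluster and release the EC2 nodes (answer yes at the prompt)
   ${HADOOP_EC2_HOME}/hadoop-ec2 terminate-cluster test-hadoop-cluster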