1. Download the harness archive containing the tools from
   http://www.cs.duke.edu/courses/fall11/cps216/Project/harness.zip
   You can also find the link on the Project page (under Project Resources) and
   on the Extra materials page.

2. Environment configuration

   You need to configure the environment before you start using the tools to
   manage the AWS cluster. This is a one-time process; you will not need to
   repeat it afterwards.

2.1 Open the .bash_profile file in your home directory. You can use any text
    editor to edit this file, for example:

        vi ~/.bash_profile

    [If you are using linux.cs.duke.edu, edit ~/.my-bash_profile instead.]

2.2 Press 'i' to enter editing mode, then copy the following lines to the end
    of the file:

        #--------------------------------------------------------
        export EC2_HOME=[PATH TO ec2-api-tools- directory]
        export JAVA_HOME=[FILL IN with path to Java 1.6+ home]
        export HADOOP_EC2_HOME=[FILL IN with path to harness/hadoop_ec2_contrib_bin]
        export AWS_HADOOP_HARNESS_HOME=[FILL IN with path to harness/aws_hadoop_harness]
        export PATH=${PATH}:${JAVA_HOME}/bin:${EC2_HOME}/bin:${HADOOP_EC2_HOME}
        export AWS_USER_ID=[FILL IN]
        export AWS_ACCESS_KEY_ID=[FILL IN]
        export AWS_SECRET_ACCESS_KEY=[FILL IN]
        export EC2_PRIVATE_KEY=[FILL IN with private key file path]
        export EC2_CERT=[FILL IN certificate file]
        #--------------------------------------------------------

2.3 Each of the statements above has the form "export NAME=[VALUE]". Replace
    each [VALUE] with the appropriate setting. Here is what each NAME should be
    set to:

    EC2_HOME: the path to the ec2-api-tools- directory, which is inside the
        harness directory you downloaded in step 1.

    JAVA_HOME: the path to your Java installation. On linux.cs.duke.edu the
        value can be "/usr".

    HADOOP_EC2_HOME: the path to the hadoop_ec2_contrib_bin directory, which is
        inside the harness directory you downloaded in step 1.

    AWS_HADOOP_HARNESS_HOME: the path to the aws_hadoop_harness directory, which
        is inside the harness directory you downloaded in step 1.

    PATH: includes all the paths you specified above. You do not need to change
        this statement.

    AWS_USER_ID: log in to aws.amazon.com and go to the "Security Credentials"
        page (see below for how to get to this page). In the upper-right part
        of the page, under your name, you will see your 12-digit account
        number. This is the value for AWS_USER_ID.

    AWS_ACCESS_KEY_ID: still on the "Security Credentials" page, in the
        "Access Credentials" section, click "Access Keys"; you should see your
        "Access Key ID". This is the value for AWS_ACCESS_KEY_ID. If you don't
        have a key, click "Create a new Access Key" to generate one.

    AWS_SECRET_ACCESS_KEY: still on the "Security Credentials" page, in the
        "Access Credentials" section under the "Access Keys" tab, click "show";
        the value displayed is the value for AWS_SECRET_ACCESS_KEY.

    EC2_PRIVATE_KEY: the path to your private key file. To get this file: on
        the "Security Credentials" page, in the "Access Credentials" section,
        click "X.509 Certificates". Create a new certificate by clicking
        "Create a new Certificate", after which you will be asked to save a
        "pk-XXX.pem" file. The path to this file is the value for
        EC2_PRIVATE_KEY.

    EC2_CERT: in the same place ("X.509 Certificates"), click download to get
        the certificate file. Its path is the value for EC2_CERT.

    [How to get to the "Security Credentials" page?
     1. Go to aws.amazon.com, click the "Sign in to the AWS Management Console"
        link at the top, and log in with your email/password.
     2. After you log in, click the link at the top right (your name).
     3. Click the "Security Credentials" link.]
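    For concreteness, here is what a completed block might look like. This is
    only a sketch: the tools version number, home directory, account number,
    keys, and certificate file names below are made-up placeholders, so
    substitute your own values everywhere.

        # Hypothetical example of a filled-in ~/.bash_profile block.
        # Every path, ID, and key below is a made-up placeholder.
        export EC2_HOME=/home/alice/harness/ec2-api-tools-1.3.57419
        export JAVA_HOME=/usr
        export HADOOP_EC2_HOME=/home/alice/harness/hadoop_ec2_contrib_bin
        export AWS_HADOOP_HARNESS_HOME=/home/alice/harness/aws_hadoop_harness
        export PATH=${PATH}:${JAVA_HOME}/bin:${EC2_HOME}/bin:${HADOOP_EC2_HOME}
        export AWS_USER_ID=123456789012
        export AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEYID
        export AWS_SECRET_ACCESS_KEY=ExampleSecretAccessKeyValue000000000000
        export EC2_PRIVATE_KEY=/home/alice/aws-keys/pk-EXAMPLE.pem
        export EC2_CERT=/home/alice/aws-keys/cert-EXAMPLE.pem

    After saving the file, reload it in your current shell (e.g., run
    "source ~/.bash_profile", or open a new shell). A quick sanity check is to
    run ${EC2_HOME}/bin/ec2-describe-instances (the same command used in the
    cheat sheet below); if the environment is set up correctly, it should list
    your instances (possibly none) rather than fail with a missing-credentials
    error.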
2.4 Last step in the environment setup! Go to the "Security Credentials" page,
    "Access Credentials" section, and click "Key Pairs". You should see two
    sections, "Amazon CloudFront Key Pairs" and "Amazon EC2 Key Pairs". Follow
    the "Access your Amazon EC2 Key Pairs using the AWS Management Console"
    link. In the console, go to "Key Pairs" (under the "Network & Security"
    section in the panel on the left side) and click the "Create Key Pair"
    button. You can name the key as you please, then download it. (Note: you
    can only download it when you generate it; if you misplace this file, you
    will have to create a new key pair.) The file name should be something
    like "rsa-xxxx.pem". Place it in a secure spot on your computer and make
    sure that group and others have no permission to access this private key;
    that is, run:

        chmod 600 /path/to/my/key/pair/file

    Go to harness/hadoop_ec2_contrib_bin and open the file hadoop-ec2-env.sh:

        vi hadoop-ec2-env.sh

    Press 'i' to enter editing mode. Find the statement "KEY_NAME=my-keypair"
    and replace "my-keypair" with the name of the file you just downloaded
    (omit the ".pem" extension; use only "rsa-xxxx"). Find the statement
    "PRIVATE_KEY_PATH=path_to_your_keypair_file" and replace
    "path_to_your_keypair_file" with the path to the file you just downloaded
    (this time, include the ".pem" extension).

    Congratulations! You are completely done with the environment setup.

3. Start using the tools to manage the AWS cluster.

   You should see a file named CHEATSHEET in the harness directory. All the
   commands you will be using, and their meanings, are listed there.

CHEATSHEET: Launching the Hadoop Cluster
----------------------------------------

# NOTE: All commands here are run from ${AWS_HADOOP_HARNESS_HOME}
cd ${AWS_HADOOP_HARNESS_HOME}

# Launch a Hadoop cluster: 1 Master + N Slaves (here N = 2)
${HADOOP_EC2_HOME}/hadoop-ec2 launch-cluster test-hadoop-cluster 2

# Describe the instances (used to get public IP addresses, etc.)
${EC2_HOME}/bin/ec2-describe-instances

# Access the JobTracker web page at: http://<master public DNS or IP>:50030

# Start the proxy for Ganglia
${HADOOP_EC2_HOME}/hadoop-ec2 proxy test-hadoop-cluster &

# Log in to the EC2 Hadoop Master node
${HADOOP_EC2_HOME}/hadoop-ec2 login test-hadoop-cluster

# Copy a file/dir from the local machine to the Hadoop Master node
${HADOOP_EC2_HOME}/hadoop-ec2 push test-hadoop-cluster /local/path/to/file

# Copy a file/dir from the Hadoop Master node to the local machine
${HADOOP_EC2_HOME}/hadoop-ec2 pull test-hadoop-cluster /master/path/to/file

# Terminate the cluster and release the EC2 nodes (run from the local machine)
${HADOOP_EC2_HOME}/hadoop-ec2 terminate-cluster test-hadoop-cluster
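To make the workflow concrete, here is a sketch of one complete session built
from the cheat-sheet commands above. The cluster name "my-cluster", the slave
count, the jar "wordcount.jar", its main class, and the input/output paths are
all hypothetical placeholders, and the commands run on the master are standard
Hadoop commands rather than part of the harness; adapt everything to your own
job.

# Sketch of a complete session (hypothetical names throughout).
cd ${AWS_HADOOP_HARNESS_HOME}

# 1. Launch a cluster with 1 master + 4 slaves
${HADOOP_EC2_HOME}/hadoop-ec2 launch-cluster my-cluster 4

# 2. Copy a job jar from the local machine to the master
#    (wordcount.jar is a made-up example)
${HADOOP_EC2_HOME}/hadoop-ec2 push my-cluster /home/alice/wordcount.jar

# 3. Log in to the master and run the job there
${HADOOP_EC2_HOME}/hadoop-ec2 login my-cluster
#    ... on the master, using ordinary Hadoop commands:
#    hadoop jar wordcount.jar WordCount input_dir output_dir
#    hadoop fs -get output_dir /mnt/output
#    exit

# 4. Back on the local machine, pull the results from the master
${HADOOP_EC2_HOME}/hadoop-ec2 pull my-cluster /mnt/output

# 5. Terminate the cluster so you stop paying for the EC2 nodes
${HADOOP_EC2_HOME}/hadoop-ec2 terminate-cluster my-cluster

Remember that billing continues as long as the instances are running, so
terminate-cluster should always be the last step of a session.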