Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Connect to the Cluster

Often when you run an Amazon Elastic MapReduce (Amazon EMR) cluster, all you need to do is launch the analysis and then collect the output from an Amazon S3 bucket. At other times, however, you'll want to interact with the master node while the cluster is running, for example to run interactive queries, check log files, monitor performance with an application such as Ganglia that runs on the master node, or debug a problem with the cluster. The following sections describe techniques you can use to connect to the master node.

In an Amazon EMR cluster, the master node is an EC2 instance that coordinates the EC2 instances that are running as task and core nodes. The master node exposes a public DNS name that you can use to connect to it.

Note

You can connect to the master node only while the cluster is running. After the cluster terminates, the EC2 instance acting as the master node is terminated and is no longer available. You must also specify an Amazon EC2 key pair when you launch the cluster, because the key pair serves as the credentials for the SSH connection. If you launch the cluster from the console, you specify the Amazon EC2 key pair on the ADVANCED OPTIONS pane of the Create a New Job Flow wizard.
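To illustrate how the public DNS name and the key pair fit together, the following is a minimal Python sketch that assembles an SSH command line for the master node. The function name build_ssh_command, the example DNS name, and the key file path are hypothetical placeholders. The login name hadoop is an assumption that this page does not state; substitute whatever user your cluster expects.

```python
import shlex


def build_ssh_command(master_public_dns, key_file, user="hadoop"):
    """Return the argument list for an SSH connection to the master node.

    master_public_dns: public DNS name exposed by the master node.
    key_file: path to the private key file of the EC2 key pair that was
        specified when the cluster was launched.
    user: login name on the master node (assumed, not taken from this page).
    """
    if not master_public_dns:
        raise ValueError("master_public_dns must not be empty")
    if not key_file:
        raise ValueError("key_file must not be empty")
    # "ssh -i <key> <user>@<host>" is standard OpenSSH client syntax.
    return ["ssh", "-i", key_file, f"{user}@{master_public_dns}"]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    command = build_ssh_command(
        "ec2-203-0-113-10.compute-1.amazonaws.com",
        "/home/me/mykeypair.pem",
    )
    # Print a shell-safe version of the command instead of running it.
    print(shlex.join(command))
```

The sketch only builds and prints the command. To open the connection, you could pass the returned list to subprocess.run while the cluster is still running.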

By default, Amazon EMR creates security group rules for master and slave nodes. For example, TCP port 22 is open by default to allow you to connect to the master node using SSH. Also, port 8443 and certain IP ranges are opened to allow the cluster control plane to operate.
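Because the default rules open TCP port 22, one quick way to see whether the master node is reachable for SSH is to attempt a TCP connection to that port. The following is a minimal sketch that uses only the Python standard library. The function name is_port_open and the host name in the usage comment are hypothetical, and a successful connection shows only that the port accepts connections, not that your key pair will be accepted.

```python
import socket


def is_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Example usage (hypothetical master node DNS name):
#   is_port_open("ec2-203-0-113-10.compute-1.amazonaws.com")
```

A False result can mean that the cluster has terminated, that the security group rules were changed, or that a network between you and the master node blocks the port.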

Note

You should not modify the default control plane security group rules, because doing so may interfere with the operation of your cluster.