Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)
Did this page help you?  Yes | No |  Tell us about it...
« PreviousNext »
View the PDF for this guide.Go to the AWS Discussion Forum for this product.Go to the Kindle Store to download this guide in Kindle format.

Using the AWS SDK for Java to Create an Amazon EMR Cluster

The AWS SDK for Java provides three packages with Amazon Elastic MapReduce (Amazon EMR) functionality:

For more information about these packages, go to the AWS SDK for Java API Reference.

The following example illustrates how the SDKs can simplify programming with Amazon EMR The code sample below uses the StepFactory object, a helper class for creating common Amazon EMR step types, to create an interactive Hive cluster with debugging enabled.

Note

If you are adding IAM user visibility to a new cluster, call RunJobFlow and set VisibleToAllUsers=true, otherwise IAM users cannot view the cluster.

   AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
   AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials);

   StepFactory stepFactory = new StepFactory();

   StepConfig enabledebugging = new StepConfig()
       .withName("Enable debugging")
       .withActionOnFailure("TERMINATE_JOB_FLOW")
       .withHadoopJarStep(stepFactory.newEnableDebuggingStep());

   StepConfig installHive = new StepConfig()
       .withName("Install Hive")
       .withActionOnFailure("TERMINATE_JOB_FLOW")
       .withHadoopJarStep(stepFactory.newInstallHiveStep());

   RunJobFlowRequest request = new RunJobFlowRequest()
       .withName("Hive Interactive")
       .withAmiVersion("3.3.1)
       .withSteps(enabledebugging, installHive)
       .withLogUri("s3://myawsbucket/")
       .withServiceRole("service_role")
       .withJobFlowRole("jobflow_role")
       .withInstances(new JobFlowInstancesConfig()
           .withEc2KeyName("keypair")
           .withInstanceCount(5)
           .withKeepJobFlowAliveWhenNoSteps(true)
           .withMasterInstanceType("m3.xlarge")
           .withSlaveInstanceType("m1.large"));

   RunJobFlowResult result = emr.runJobFlow(request);
 
			

If you want to use the default Amazon EMR roles when launching a cluster, you do not need to specify withJobFlowRole and withServiceRole. However, you will need to ensure that the default roles have been created. You can do this by invoking this AWS CLI command for the same account:

aws iam list-roles | grep EMR

Both the instance profile (EMR_EC2_DefaultRole) and the service role (EMR_DefaultRole) will be displayed if they exist:

"RoleName": "EMR_DefaultRole", 
            "Arn": "arn:aws:iam::AccountID:role/EMR_DefaultRole"
            "RoleName": "EMR_EC2_DefaultRole", 
            "Arn": "arn:aws:iam::AccountID:role/EMR_EC2_DefaultRole"

If the default roles do not exist, you can use the following AWS CLI command to create them:

aws emr create-default-roles