Amazon EMR
Management Guide

Using the AWS SDK for Java to Create an Amazon EMR Cluster

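The AWS SDK for Java provides three packages with Amazon EMR functionality: com.amazonaws.services.elasticmapreduce, com.amazonaws.services.elasticmapreduce.model, and com.amazonaws.services.elasticmapreduce.util.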

For more information about these packages, see the AWS SDK for Java API Reference.

The following example illustrates how the SDKs can simplify programming with Amazon EMR. The code sample below creates an interactive Hive cluster with debugging enabled. It instantiates the StepFactory object, a helper class for creating common Amazon EMR step types, and defines the debugging step explicitly as a command-runner.jar step.


If you are adding IAM user visibility to a new cluster, call RunJobFlow and set VisibleToAllUsers=true; otherwise, IAM users cannot view the cluster.

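import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.ActionOnFailure;
import com.amazonaws.services.elasticmapreduce.model.Application;
import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

// accessKey and secretKey are assumed to be defined elsewhere.
AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials);

String COMMAND_RUNNER = "command-runner.jar";
String DEBUGGING_COMMAND = "state-pusher-script";
String DEBUGGING_NAME = "Setup Hadoop Debugging";

StepFactory stepFactory = new StepFactory();

// Debugging step: runs state-pusher-script through command-runner.jar.
StepConfig enabledebugging = new StepConfig()
    .withName(DEBUGGING_NAME)
    .withActionOnFailure(ActionOnFailure.TERMINATE_CLUSTER)
    .withHadoopJarStep(new HadoopJarStepConfig()
        .withJar(COMMAND_RUNNER)
        .withArgs(DEBUGGING_COMMAND));

// Assumption: the original sample does not define myApp; Hive is used here to match the cluster name.
Application myApp = new Application().withName("Hive");

RunJobFlowRequest request = new RunJobFlowRequest()
    .withName("Hive Interactive")
    .withReleaseLabel("emr-4.1.0")
    .withSteps(enabledebugging)
    .withApplications(myApp)
    .withLogUri("s3://myawsbucket/")
    .withServiceRole("service_role")
    .withJobFlowRole("jobflow_role")
    .withInstances(new JobFlowInstancesConfig()
        .withEc2KeyName("keypair")
        .withInstanceCount(5)
        .withKeepJobFlowAliveWhenNoSteps(true)
        .withMasterInstanceType("m3.xlarge")
        .withSlaveInstanceType("m1.large"));

// Launch the cluster.
RunJobFlowResult result = emr.runJobFlow(request);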

At minimum, you must pass a service role and a jobflow role corresponding to EMR_DefaultRole and EMR_EC2_DefaultRole, respectively. You can check for these roles, and create them if needed, with the AWS CLI under the same account. First, check whether the roles already exist:

aws iam list-roles | grep EMR

Both the instance profile (EMR_EC2_DefaultRole) and the service role (EMR_DefaultRole) will be displayed if they exist:

"RoleName": "EMR_DefaultRole",
"Arn": "arn:aws:iam::AccountID:role/EMR_DefaultRole"
"RoleName": "EMR_EC2_DefaultRole",
"Arn": "arn:aws:iam::AccountID:role/EMR_EC2_DefaultRole"

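If you want to perform the same check from Java rather than by reading the grep output, one possible approach is to scan the text printed by aws iam list-roles for the two role names. The following is a minimal sketch only: the class DefaultRoleCheck and its missingRoles method are hypothetical helpers, not part of the AWS SDK, and the sketch assumes the output contains "RoleName" entries in the form shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the AWS SDK): reports which EMR default
// roles are absent from the text printed by `aws iam list-roles`.
class DefaultRoleCheck {
    static final String[] REQUIRED_ROLES = {"EMR_DefaultRole", "EMR_EC2_DefaultRole"};

    // Returns the required role names that do not appear as a "RoleName" value.
    static List<String> missingRoles(String listRolesOutput) {
        List<String> missing = new ArrayList<>();
        for (String role : REQUIRED_ROLES) {
            // Match the whole quoted name so that one role name cannot
            // be satisfied by a longer name that merely contains it.
            Pattern entry = Pattern.compile(
                    "\"RoleName\"\\s*:\\s*\"" + Pattern.quote(role) + "\"");
            if (!entry.matcher(listRolesOutput).find()) {
                missing.add(role);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample text in which only the service role is present.
        String sample = "\"RoleName\": \"EMR_DefaultRole\",\n"
                + "\"Arn\": \"arn:aws:iam::AccountID:role/EMR_DefaultRole\"\n";
        System.out.println(missingRoles(sample)); // prints [EMR_EC2_DefaultRole]
    }
}
```

An empty result means both default roles were found in the output; any name in the list still needs to be created.
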
If the default roles do not exist, you can use the following AWS CLI command to create them:

aws emr create-default-roles
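
If you prefer to create the roles from the same Java program that launches the cluster, one option is to invoke the CLI command above as a child process. The sketch below is illustrative only: the class CreateDefaultRoles is a hypothetical wrapper, not part of the AWS SDK, and it assumes that the aws executable is on the PATH and is configured with credentials for the same account.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper (not part of the AWS SDK) around the CLI command above.
class CreateDefaultRoles {
    // The CLI invocation, one list element per argument.
    static List<String> command() {
        return Arrays.asList("aws", "emr", "create-default-roles");
    }

    // Runs the command, forwarding its output to this process, and returns the exit code.
    // Assumes the aws executable is on the PATH and is configured with credentials.
    static int run() throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command()).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call run() to actually create the roles.
        System.out.println(String.join(" ", command())); // prints aws emr create-default-roles
    }
}
```

A nonzero value returned by run() indicates that the CLI reported an error, in which case the roles should not be assumed to exist.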