Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Configure IAM Roles for Amazon EMR

An AWS Identity and Access Management (IAM) role is a way to delegate access so IAM users or services in AWS can act on your AWS resources. You create an IAM role and assign permissions to it, such as the ability to read and write data in one of your Amazon S3 buckets. When an IAM user or a service in AWS assumes that IAM role, they gain the specified permissions to access your AWS resources.

Amazon EMR uses IAM roles so that applications running on the EC2 instances of your cluster can access your AWS resources without the need to distribute your AWS account or IAM user credentials to those EC2 instances. With IAM roles, not only is your account information more secure, but you can refine the IAM roles to limit the actions these applications take on your behalf. For example, with IAM roles, you can grant an application the ability to read from the S3 bucket that contains your input data, but restrict its ability to launch new EC2 instances.

Note

Roles are required in the AWS GovCloud (US) Region. If you are launching a cluster into the AWS GovCloud (US) Region, for security purposes, you must launch the cluster in an Amazon Virtual Private Cloud (Amazon VPC) and specify an IAM role. If you are not launching the cluster in the AWS GovCloud (US) Region, roles are optional.

For more information about IAM roles, go to Delegating API Access by Using Roles.

Launch an Amazon EMR Cluster with an IAM Role

The following versions of Amazon EMR components are required to use IAM roles:

  • AMI version 2.3.0 or later.

  • If you are using Hive, version 0.8.1.6 or later.

  • If you are using the CLI, version 2012-12-17 or later.

  • If you are using s3DistCP, use the version at s3://elasticmapreduce/libs/s3distcp/role/s3distcp.jar.

The IAM user creating Amazon EMR clusters needs permissions to retrieve roles and assign them to the Amazon EC2 instances. If the user lacks these IAM permissions, the launch fails with the error "User account is not authorized to call EC2." The following IAM user policy allows the IAM user to create an EMR cluster:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:*",
        "ec2:*",
        "cloudwatch:*",
        "s3:*",
        "sdb:*",
        "iam:AddRoleToInstanceProfile",
        "iam:PassRole",
        "iam:GetInstanceProfile",
        "iam:GetRole"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
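As a minimal sketch of how you might check a user policy before launching, the following Python snippet reports which of the four IAM actions from the policy above are not listed by name in any Allow statement. The function name missing_iam_actions and the shortened EXAMPLE_POLICY are hypothetical, and the check assumes actions are listed explicitly; it does not expand wildcards such as iam:*.

```python
import json

# IAM actions the launching user needs, taken from the policy above.
REQUIRED_IAM_ACTIONS = {
    "iam:AddRoleToInstanceProfile",
    "iam:PassRole",
    "iam:GetInstanceProfile",
    "iam:GetRole",
}


def missing_iam_actions(policy_json):
    """Return the required IAM actions that no Allow statement lists by name."""
    policy = json.loads(policy_json)
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    allowed = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        allowed.update(actions)
    # Wildcards such as "iam:*" are deliberately not expanded in this sketch.
    return sorted(REQUIRED_IAM_ACTIONS - allowed)


# Hypothetical, shortened policy that omits two of the required actions.
EXAMPLE_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["elasticmapreduce:*", "ec2:*", "iam:PassRole", "iam:GetRole"],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
"""

print(missing_iam_actions(EXAMPLE_POLICY))
```

An empty list means every required IAM action is named in the policy.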

To launch a cluster with an IAM role using the CLI

  • Add the --jobflow-role parameter to the command that creates the cluster and specify the name of the IAM role to apply to the EC2 instances in the cluster. The following example shows how to create an interactive Hive cluster that uses the default IAM role provided by Amazon EMR.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --num-instances 3 \
      --instance-type m1.small \
      --name "myJobFlowName" \
      --hive-interactive --hive-versions 0.8.1.6 \
      --ami-version 2.3.0 \
      --jobflow-role EMRJobflowDefault

    • Windows users:

      ruby elastic-mapreduce --create --alive --num-instances 3 --instance-type m1.small --name "myJobFlowName" --hive-interactive --hive-versions 0.8.1.6 --ami-version 2.3.0 --jobflow-role EMRJobflowDefault
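If you launch clusters from a script, the command above can be assembled programmatically. The following Python sketch builds the same argument list as the Linux example; the function name build_create_command and its parameter names are hypothetical, and only the options shown in the example above are used. The sketch prints the command rather than running it.

```python
import shlex


def build_create_command(name, role, num_instances=3, instance_type="m1.small",
                         hive_version="0.8.1.6", ami_version="2.3.0"):
    """Return the argument list for the interactive Hive example above."""
    return [
        "./elastic-mapreduce", "--create", "--alive",
        "--num-instances", str(num_instances),
        "--instance-type", instance_type,
        "--name", name,
        "--hive-interactive", "--hive-versions", hive_version,
        "--ami-version", ami_version,
        # The role applied to the EC2 instances in the cluster.
        "--jobflow-role", role,
    ]


args = build_create_command("myJobFlowName", "EMRJobflowDefault")
# Quote each argument so the printed line is safe to paste into a shell.
print(" ".join(shlex.quote(arg) for arg in args))
```

Keeping the arguments in a list lets you pass them to a process launcher without shell quoting issues.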

To set a default IAM role for the CLI

  • If you launch most or all of your clusters with a specific IAM role, you can set that IAM role as the default for the CLI, so you don't need to specify it at the command line. Add a jobflow-role field in the credentials.json file you created when you installed the CLI.

    For more information about credentials.json, see Configuring Credentials.

    The following example shows the contents of a credentials.json file that causes the CLI to always launch clusters with a user-defined IAM role, MyCustomRole.

    {
      "access-id": "AccessKeyID",
      "private-key": "PrivateKey",
      "key-pair": "KeyName",
      "jobflow-role": "MyCustomRole",
      "key-pair-file": "location of key pair file",
      "region": "Region",
      "log-uri": "location of bucket on Amazon S3"
    }
                    

    You can override the IAM role specified in credentials.json at any time by specifying a different IAM role at the command line as shown in the preceding procedure.
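The precedence described above (a role given at the command line wins over the jobflow-role field in credentials.json) can be sketched in Python as follows. This illustrates the rule only and is not the CLI's actual implementation; the function name effective_jobflow_role is hypothetical.

```python
import json


def effective_jobflow_role(credentials_json, command_line_role=None):
    """Return the role a launch would use: command line first, then the file."""
    if command_line_role:
        return command_line_role
    credentials = json.loads(credentials_json)
    # None means no role is configured in either place.
    return credentials.get("jobflow-role")


# Shortened credentials.json content, using the placeholder values from above.
CREDENTIALS = '{"access-id": "AccessKeyID", "jobflow-role": "MyCustomRole"}'

print(effective_jobflow_role(CREDENTIALS))                       # MyCustomRole
print(effective_jobflow_role(CREDENTIALS, "EMRJobflowDefault"))  # EMRJobflowDefault
```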

To launch a cluster with an IAM role using the API

  • Add a JobFlowRole argument to the call to the RunJobFlow action that specifies the name of the IAM role. This is shown in the following example, which sets the IAM role for the cluster to EMRJobflowDefault.

    https://elasticmapreduce.amazonaws.com?Action=RunJobFlow
    &Name=MyJobFlowName 
    &LogUri=s3n%3A%2F%2Fmybucket%2Fsubdir
    &Instances.MasterInstanceType=m1.small 
    &Instances.SlaveInstanceType=m1.small
    &Instances.InstanceCount=4 
    &Instances.Ec2KeyName=myec2keyname
    &Instances.Placement.AvailabilityZone=us-east-1a
    &Instances.KeepJobFlowAliveWhenNoSteps=true 
    &Instances.TerminationProtected=true
    &Steps.member.1.Name=MyStepName
    &Steps.member.1.ActionOnFailure=CONTINUE
    &Steps.member.1.HadoopJarStep.Jar=MyJarFile
    &Steps.member.1.HadoopJarStep.MainClass=MyMainClass
    &Steps.member.1.HadoopJarStep.Args.member.1=arg1
    &Steps.member.1.HadoopJarStep.Args.member.2=arg2 
    &JobFlowRole=EMRJobflowDefault
    &AuthParams
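As a rough illustration, the following Python sketch assembles a shortened version of the query string above, including the JobFlowRole argument. The function name run_job_flow_query is hypothetical, the step parameters are omitted for brevity, and the authentication parameters (AuthParams) are not generated, so the resulting URL is unsigned and would not be accepted as is.

```python
from urllib.parse import urlencode

ENDPOINT = "https://elasticmapreduce.amazonaws.com"


def run_job_flow_query(name, log_uri, job_flow_role, instance_count=4,
                       instance_type="m1.small"):
    """Build the unsigned RunJobFlow query string, including JobFlowRole."""
    params = [
        ("Action", "RunJobFlow"),
        ("Name", name),
        ("LogUri", log_uri),
        ("Instances.MasterInstanceType", instance_type),
        ("Instances.SlaveInstanceType", instance_type),
        ("Instances.InstanceCount", str(instance_count)),
        ("JobFlowRole", job_flow_role),
    ]
    # Authentication parameters (AuthParams above) are not added here.
    return ENDPOINT + "?" + urlencode(params)


url = run_job_flow_query("MyJobFlowName", "s3n://mybucket/subdir",
                         "EMRJobflowDefault")
print(url)
```

urlencode percent-encodes the log URI, which yields the same LogUri value as in the example request.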
                    

If you do not specify the name of a role when you launch the cluster, the cluster is launched without roles enabled, and any applications on the cluster that need to access AWS resources must authenticate with the methods that were used before IAM roles were supported.

EMRJobflowDefault IAM Role

To simplify using IAM roles, Amazon EMR provides a default IAM role called EMRJobflowDefault. If you launch a cluster using the CLI and specify the IAM role as EMRJobflowDefault, the CLI checks whether an IAM role with that name already exists for your account. If not, it creates the IAM role on your behalf.

If you are using an IAM user with the CLI, your IAM user must have iam:CreateRole, iam:PutRolePolicy, iam:CreateInstanceProfile, iam:AddRoleToInstanceProfile, iam:PassRole, and iam:ListInstanceProfiles permissions for the CLI to succeed in creating the default IAM role and launching the cluster with that IAM role. However, to use a pre-existing IAM role, a user only needs iam:GetInstanceProfile, iam:GetRole, iam:PassRole, and iam:ListInstanceProfiles permissions.

The permissions set in the automatically generated EMRJobflowDefault IAM role are as follows.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudwatch:*",
        "dynamodb:*",
        "ec2:Describe*",
        "elasticmapreduce:Describe*",
        "rds:Describe*",
        "s3:*",
        "sdb:*",
        "sns:*",
        "sqs:*"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ]
}

This set of permissions gives applications running on your cluster access to the full functionality of CloudWatch, DynamoDB, Amazon S3, Amazon SimpleDB, Amazon SNS, and Amazon SQS. It grants only the Describe actions of Amazon EC2, Amazon EMR, and Amazon RDS; for Amazon EC2, this is the subset of actions required by Hadoop to process clusters.
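To see at a glance which services the EMRJobflowDefault policy opens fully and which it restricts, you can group its actions by service prefix. The following Python sketch does this for the action list shown in the policy above; the function name access_by_service is hypothetical.

```python
from collections import defaultdict

# Actions copied from the EMRJobflowDefault policy above.
DEFAULT_ROLE_ACTIONS = [
    "cloudwatch:*", "dynamodb:*", "ec2:Describe*",
    "elasticmapreduce:Describe*", "rds:Describe*",
    "s3:*", "sdb:*", "sns:*", "sqs:*",
]


def access_by_service(actions):
    """Map each service prefix to the action patterns granted for it."""
    granted = defaultdict(list)
    for action in actions:
        service, _, pattern = action.partition(":")
        granted[service].append(pattern)
    return dict(granted)


summary = access_by_service(DEFAULT_ROLE_ACTIONS)
# A bare "*" pattern means every action of that service is allowed.
full = sorted(s for s, patterns in summary.items() if "*" in patterns)
restricted = sorted(s for s in summary if s not in full)
print("all actions:", full)
print("restricted:", restricted)
```

For this policy, the restricted services are the ones limited to Describe* actions.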

If your application doesn't require access to all of the services listed earlier, you can create a custom IAM role to use when launching clusters that is limited to just the access your application requires. For information on how to do that, see Custom IAM Roles.

Custom IAM Roles

If the default IAM role provided by Amazon EMR, EMRJobflowDefault, does not meet your needs, you can create a custom IAM role and use that instead. For example, if your application does not access DynamoDB, you should omit the DynamoDB permissions from your custom IAM role. Creating and managing IAM roles is described in the AWS Identity and Access Management documentation.

We recommend that you use the permissions in EMRJobflowDefault as a starting place when developing a custom IAM role to use with Amazon EMR. To ensure that you always have access to the original version of this IAM role, we recommend that you generate EMRJobflowDefault using the Amazon EMR CLI, copy the contents of EMRJobflowDefault, create a new IAM role, paste in the permissions, and modify those.

The following is an example of a custom IAM role for use with Amazon EMR. This example is for a cluster that does not use Amazon RDS or DynamoDB.

The access to Amazon SimpleDB is included to permit debugging from the console. Access to CloudWatch is included so the cluster can report metrics. Amazon SNS and Amazon SQS permissions are included for messaging.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudwatch:*",
        "ec2:Describe*",
        "elasticmapreduce:Describe*",
        "s3:*",
        "sdb:*",
        "sns:*",
        "sqs:*"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ]
}
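The custom policy above is the default policy with the Amazon RDS and DynamoDB entries removed. As a minimal sketch of that derivation, the following Python snippet filters the default action list by service prefix and prints the resulting policy document. The function name without_services is hypothetical, and the snippet only produces the policy text; it does not create the IAM role.

```python
import json

# Actions from the automatically generated EMRJobflowDefault role.
DEFAULT_ROLE_ACTIONS = [
    "cloudwatch:*", "dynamodb:*", "ec2:Describe*",
    "elasticmapreduce:Describe*", "rds:Describe*",
    "s3:*", "sdb:*", "sns:*", "sqs:*",
]


def without_services(actions, unused_services):
    """Drop every action whose service prefix is in unused_services."""
    return [a for a in actions if a.split(":", 1)[0] not in unused_services]


custom_actions = without_services(DEFAULT_ROLE_ACTIONS, {"rds", "dynamodb"})
custom_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Action": custom_actions, "Effect": "Allow", "Resource": ["*"]}
    ],
}
print(json.dumps(custom_policy, indent=2))
```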
    

Important

If you use the IAM CLI or API to create an IAM role and its associated instance profile, and give the instance profile a different name than the IAM role, you should use the name of the instance profile, not the name of the IAM role, when specifying an IAM role to use in an Amazon EMR cluster. For simplicity, we recommend you give a new IAM role the same name as its associated instance profile. For more information about instance profiles, go to Instance Profiles.

Access AWS Resources Using IAM Roles

If you've launched your cluster with an IAM role, applications running on the EC2 instances of that cluster can use the IAM role to obtain temporary account credentials to use when calling services in AWS.

The version of Hadoop available on AMI 2.3.0 and later has already been updated to make use of IAM roles. If your application runs strictly on top of the Hadoop architecture, and does not directly call any service in AWS, it should work with IAM roles with no modification.

If your application calls services in AWS directly, you'll need to update it to take advantage of IAM roles. This means that instead of obtaining account credentials from /home/hadoop/conf/core-site.xml on the EC2 instances in the cluster, your application will now either use an SDK to access the resources using IAM roles, or call the EC2 instance metadata to obtain the temporary credentials.

To access AWS resources with IAM roles using an SDK

To obtain temporary credentials from EC2 instance metadata

  • Call the following URL from an EC2 instance that is running with the specified IAM role. In the example that follows, we've used the default IAM role, EMRJobflowDefault. This URL returns the temporary security credentials (AccessKeyId, SecretAccessKey, SessionToken, and Expiration) associated with the IAM role.

    GET http://169.254.169.254/latest/meta-data/iam/security-credentials/EMRJobflowDefault
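The following Python sketch shows one way an application might issue this request and read the result. It assumes the response body is a JSON document and that the metadata service answers a plain GET as described above. The function names, and the assumption that the session token may be keyed either Token or SessionToken, are illustrative; SAMPLE is a made-up placeholder response, not real output.

```python
import json
from urllib.request import urlopen

METADATA_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def parse_credentials(body):
    """Extract the temporary credential fields from the metadata response."""
    data = json.loads(body)
    return {
        "AccessKeyId": data["AccessKeyId"],
        "SecretAccessKey": data["SecretAccessKey"],
        # Assumption: the token may be keyed "Token" or "SessionToken".
        "SessionToken": data.get("Token") or data.get("SessionToken"),
        "Expiration": data["Expiration"],
    }


def fetch_credentials(role_name, timeout=2):
    """GET the credentials for role_name; only works on an EC2 instance."""
    with urlopen(METADATA_URL + role_name, timeout=timeout) as response:
        return parse_credentials(response.read().decode("utf-8"))


# Placeholder response for illustration only; these are not real credentials.
SAMPLE = json.dumps({
    "AccessKeyId": "EXAMPLEKEYID",
    "SecretAccessKey": "EXAMPLESECRET",
    "Token": "EXAMPLETOKEN",
    "Expiration": "2013-01-01T00:00:00Z",
})
print(parse_credentials(SAMPLE)["Expiration"])
```

On a cluster instance, you would call fetch_credentials("EMRJobflowDefault") and request fresh credentials again before the Expiration time passes.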
                    

For more information about writing applications that use IAM roles, go to Granting Applications that Run on Amazon EC2 Instances Access to AWS Resources.

For more information about how to use temporary security credentials, go to Using Temporary Security Credentials to Access AWS.