Amazon EMR
Management Guide

Configure IAM Roles for EMRFS Requests to Amazon S3

If you have clusters with multiple users who need different levels of access to data in Amazon S3 through EMRFS, you can set up a security configuration to have EMRFS assume different IAM roles based on the user or group making the request, or based on the location of the data in Amazon S3. Each IAM role can have different permissions for data access in Amazon S3.

This feature is available with Amazon EMR release version 5.10.0 and later. If you use an earlier release version or have requirements beyond what IAM roles for EMRFS provide, you can create a custom credentials provider instead. For more information, see Authorizing Access to EMRFS Data in Amazon S3. For more information about EMRFS, see Use EMR File System (EMRFS).

How IAM Roles for EMRFS Work

By default, when a cluster application makes a request to Amazon S3 through EMRFS, EMRFS uses the permissions policies attached to the EMR role for EC2 that the cluster uses, regardless of the user or group using the application or the location of the data in Amazon S3.

When you use a security configuration to specify IAM roles for EMRFS, you set up role mappings. Each role mapping specifies an IAM role that corresponds to identifiers, which determine the basis for access to Amazon S3 through EMRFS. The identifiers can be users, groups, or Amazon S3 prefixes that indicate a data location. When EMRFS makes a request to Amazon S3 from a cluster that uses the security configuration, if the request matches the basis for access, EMRFS has cluster EC2 instances assume the corresponding IAM role for the request, and the IAM permissions attached to that role apply instead of the IAM permissions attached to the EMR role for EC2.

The users and groups in a role mapping are Hadoop users and groups that are defined on the cluster. Users and groups are passed to EMRFS in the context of the application using it (for example, YARN user impersonation). The Amazon S3 prefix can be a bucket specifier of any depth (for example, s3://mybucket or s3://mybucket/myproject/mydata). You can specify multiple identifiers within a single role mapping, but they all must be of the same type.

When a cluster application makes a request to Amazon S3 through EMRFS, EMRFS evaluates role mappings in the top-down order that they appear in the security configuration. If a request made through EMRFS doesn’t match any identifier, EMRFS falls back to using the EMR role for EC2. For this reason, we recommend that the policies attached to this role limit permissions to Amazon S3.
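The evaluation order described above can be modeled in a few lines of Python. This is only an illustrative sketch, not EMRFS code: the RoleMapping class, the resolve_role function, and the sample role names are hypothetical, and comparing prefixes with a plain string startswith is an assumption about how prefix matching behaves.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RoleMapping:
    """Hypothetical in-memory form of one role mapping."""
    role: str
    identifier_type: str  # "User", "Group", or "Prefix"
    identifiers: List[str]


def resolve_role(mappings: Sequence[RoleMapping], user: str,
                 groups: Sequence[str], s3_path: str,
                 fallback_role: str) -> str:
    """Return the role of the first mapping that matches, top-down.

    If no mapping matches, return the fallback role, which stands in
    for the EMR role for EC2.
    """
    for mapping in mappings:
        if mapping.identifier_type == "User":
            matched = user in mapping.identifiers
        elif mapping.identifier_type == "Group":
            matched = any(g in mapping.identifiers for g in groups)
        elif mapping.identifier_type == "Prefix":
            # Assumption: a simple string-prefix comparison.
            matched = any(s3_path.startswith(p) for p in mapping.identifiers)
        else:
            raise ValueError(f"Unknown identifier type: {mapping.identifier_type}")
        if matched:
            return mapping.role  # first match wins; later mappings are skipped
    return fallback_role


# Sample mappings (hypothetical names), listed in evaluation order.
MAPPINGS = [
    RoleMapping("role_for_user1", "User", ["user1"]),
    RoleMapping("role_for_buckets", "Prefix", ["s3://MyBucket/", "s3://MyOtherBucket/"]),
    RoleMapping("role_for_admins", "Group", ["AdminGroup"]),
]
FALLBACK_ROLE = "emr_role_for_ec2"

print(resolve_role(MAPPINGS, "user2", ["AdminGroup"], "s3://MyBucket/data", FALLBACK_ROLE))
```

In this model, a member of AdminGroup who reads from s3://MyBucket/ gets the prefix-based role rather than the group-based role, because the prefix mapping appears first. Ordering the mappings from most specific to least specific is one way to keep such outcomes predictable.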

Set Up a Security Configuration with IAM Roles for EMRFS

Before you set up a security configuration with IAM roles for EMRFS, plan and create the roles and permission policies to attach to the roles. For more information, see How Do Roles for EC2 Instances Work? in the IAM User Guide. When creating permissions policies, we recommend that you start with the managed policy attached to the default EMR role for EC2, which is AmazonElasticMapReduceforEC2Role, and edit this policy according to your requirements. For more information, see Use Default IAM Roles and Managed Policies. If a role allows access to a location in Amazon S3 that is encrypted using an AWS Key Management Service customer master key (CMK), make sure that the role is specified as a key user. This gives the role permission to use the CMK. For more information, see Using Key Policies in the AWS Key Management Service Developer Guide.

Important

If none of the IAM roles for EMRFS that you specify apply, EMRFS falls back to the EMR role for EC2. Consider customizing this role to restrict permissions to Amazon S3 as appropriate for your application and then specifying this custom role instead of EMR_EC2_DefaultRole when you create a cluster. For more information, see Customize IAM Roles and Specify Custom IAM Roles When You Create a Cluster.

To specify IAM roles for EMRFS requests to Amazon S3 using the console

  1. Create a security configuration that specifies role mappings:

    1. In the Amazon EMR console, select Security configurations, Create.

    2. Type a Name for the security configuration. You use this name to specify the security configuration when you create a cluster.

    3. Choose Use IAM roles for EMRFS requests to Amazon S3.

    4. Select an IAM role to apply, and under Basis for access select an identifier type (Users, Groups, or S3 prefixes) from the list and enter corresponding identifiers. If you use multiple identifiers, separate them with a comma and no space. For more information about each identifier type, see the JSON configuration reference below.

    5. Choose Add role to set up additional role mappings as described in the previous step.

    6. Set up other security configuration options as appropriate and choose Create. For more information, see Create a Security Configuration.

  2. Specify the security configuration you created above when you create a cluster. For more information, see Specify a Security Configuration for a Cluster.

To specify IAM roles for EMRFS requests to Amazon S3 using the AWS CLI

  1. Use the aws emr create-security-configuration command, specifying a name for the security configuration, and the security configuration details in JSON format.

    The example command shown below creates a security configuration with the name EMRFS_Roles_Security_Configuration. It is based on a JSON structure in the file MyEmrFsSecConfig.json, which is saved in the same directory where the command is executed.

    aws emr create-security-configuration --name EMRFS_Roles_Security_Configuration --security-configuration file://MyEmrFsSecConfig.json

    Use the following guidelines for the structure of the MyEmrFsSecConfig.json file. You can specify this structure along with structures for other security configuration options. For more information, see Create a Security Configuration.

    The following is an example JSON snippet for specifying custom IAM roles for EMRFS within a security configuration. It demonstrates role mappings for the three different identifier types, followed by a parameter reference.

    {
      "AuthorizationConfiguration": {
        "EmrFsConfiguration": {
          "RoleMappings": [{
            "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_user1",
            "IdentifierType": "User",
            "Identifiers": [ "user1" ]
          },{
            "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_to_MyBuckets",
            "IdentifierType": "Prefix",
            "Identifiers": [ "s3://MyBucket/","s3://MyOtherBucket/" ]
          },{
            "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_AdminGroup",
            "IdentifierType": "Group",
            "Identifiers": [ "AdminGroup" ]
          }]
        }
      }
    }

    Parameter Description

    "AuthorizationConfiguration":

    Required.

    "EmrFsConfiguration":

    Required. Contains role mappings.

      "RoleMappings":

    Required. Contains one or more role mapping definitions. Role mappings are evaluated in the top-down order that they appear. If a role mapping evaluates as true for an EMRFS call for data in Amazon S3, no further role mappings are evaluated and EMRFS uses the specified IAM role for the request. Role mappings consist of the following required parameters:

       "Role":

    Specifies the ARN identifier of an IAM role in the format arn:aws:iam::account-id:role/role-name. This is the IAM role that Amazon EMR assumes if the EMRFS request to Amazon S3 matches any of the Identifiers specified.

       "IdentifierType":

    Can be one of the following:

    • "User" specifies that the identifiers are one or more Hadoop users, which can be Linux account users or Kerberos principals. When the EMRFS request originates with the user or users specified, the IAM role is assumed.

    • "Prefix" specifies that the identifier is an Amazon S3 location. The IAM role is assumed for calls to the location or locations with the specified prefixes. For example, the prefix s3://mybucket/ matches s3://mybucket/mydir and s3://mybucket/yetanotherdir.

    • "Group" specifies that the identifiers are one or more Hadoop groups. The IAM role is assumed if the request originates from a user in the specified group or groups.

       "Identifiers":

    Specifies one or more identifiers of the appropriate identifier type. Separate multiple identifiers by commas with no spaces.

  2. Use the aws emr create-cluster command to create a cluster and specify the security configuration you created in the previous step.

    The following example creates a cluster with default core Hadoop applications installed. The cluster uses the security configuration created in the previous step, EMRFS_Roles_Security_Configuration, and also uses a custom EMR role for EC2, EC2_Role_EMR_Restrict_S3, which is specified using the InstanceProfile argument of the --ec2-attributes parameter.

    Note

    Linux line continuation characters (\) are included for readability. On Linux, you can keep them or remove them. On Windows, remove them or replace them with a caret (^).

    aws emr create-cluster --name MyEmrFsS3RolesCluster \
    --release-label emr-5.17.0 --ec2-attributes InstanceProfile=EC2_Role_EMR_Restrict_S3,KeyName=MyKey \
    --instance-type m4.large --instance-count 3 \
    --security-configuration EMRFS_Roles_Security_Configuration
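
If you maintain many role mappings, it may be easier to generate the JSON structure from step 1 than to edit it by hand. The following Python sketch builds the same structure as the example snippet using only the standard library. The function name build_emrfs_security_configuration and the validation rules it applies (known identifier types, non-empty identifier lists, prefixes that begin with s3://) are assumptions made for illustration and are not part of Amazon EMR.

```python
import json

VALID_IDENTIFIER_TYPES = {"User", "Group", "Prefix"}


def build_emrfs_security_configuration(role_mappings):
    """Build the AuthorizationConfiguration structure for EMRFS role mappings.

    role_mappings is a list of (role_arn, identifier_type, identifiers)
    tuples in the order that they should be evaluated.
    """
    entries = []
    for role_arn, identifier_type, identifiers in role_mappings:
        if identifier_type not in VALID_IDENTIFIER_TYPES:
            raise ValueError(f"Unsupported IdentifierType: {identifier_type}")
        if not identifiers:
            raise ValueError(f"No identifiers given for role {role_arn}")
        # Assumed sanity check: prefixes are written as s3:// locations.
        if identifier_type == "Prefix" and not all(
                i.startswith("s3://") for i in identifiers):
            raise ValueError(f"Prefix identifiers must start with s3://: {identifiers}")
        entries.append({
            "Role": role_arn,
            "IdentifierType": identifier_type,
            "Identifiers": list(identifiers),
        })
    return {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {"RoleMappings": entries}
        }
    }


# The same three mappings as the example JSON snippet in step 1.
config = build_emrfs_security_configuration([
    ("arn:aws:iam::123456789101:role/allow_EMRFS_access_for_user1",
     "User", ["user1"]),
    ("arn:aws:iam::123456789101:role/allow_EMRFS_access_to_MyBuckets",
     "Prefix", ["s3://MyBucket/", "s3://MyOtherBucket/"]),
    ("arn:aws:iam::123456789101:role/allow_EMRFS_access_for_AdminGroup",
     "Group", ["AdminGroup"]),
])
config_json = json.dumps(config, indent=2)
print(config_json)
```

Saving the printed output as MyEmrFsSecConfig.json produces a file that can be passed to the aws emr create-security-configuration command shown in step 1. Because each tuple carries a single identifier type, the sketch also reflects the rule that all identifiers within one role mapping must be of the same type.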