Configure IAM roles for EMRFS requests to Amazon S3 - Amazon EMR

Note

Amazon EMR 6.15.0 introduced Amazon S3 Access Grants, which improves on the EMRFS role mapping capability described on this page. For a scalable access control solution for your data in Amazon S3, we recommend that you use S3 Access Grants with Amazon EMR.

When an application running on a cluster references data using the s3://mydata format, Amazon EMR uses EMRFS to make the request. To interact with Amazon S3, EMRFS assumes the permissions policies that are attached to your Amazon EC2 instance profile. The same Amazon EC2 instance profile is used regardless of the user or group running the application or the location of the data in Amazon S3.

If you have a cluster with multiple users who need different levels of access to data in Amazon S3 through EMRFS, you can set up a security configuration with IAM roles for EMRFS. EMRFS can assume a different service role for cluster EC2 instances based on the user or group making the request, or based on the location of data in Amazon S3. Each IAM role for EMRFS can have different permissions for data access in Amazon S3. For more information about the service role for cluster EC2 instances, see Service role for cluster EC2 instances (EC2 instance profile).

Using custom IAM roles for EMRFS is supported in Amazon EMR versions 5.10.0 and later. If you use an earlier version or have requirements beyond what IAM roles for EMRFS provide, you can create a custom credentials provider instead. For more information, see Authorizing access to EMRFS data in Amazon S3.

When you use a security configuration to specify IAM roles for EMRFS, you set up role mappings. Each role mapping specifies an IAM role that corresponds to identifiers. These identifiers determine the basis for access to Amazon S3 through EMRFS. The identifiers can be users, groups, or Amazon S3 prefixes that indicate a data location. When EMRFS makes a request to Amazon S3, if the request matches the basis for access, EMRFS has cluster EC2 instances assume the corresponding IAM role for the request. The IAM permissions attached to that role apply instead of the IAM permissions attached to the service role for cluster EC2 instances.

The users and groups in a role mapping are Hadoop users and groups that are defined on the cluster. Users and groups are passed to EMRFS in the context of the application using it (for example, YARN user impersonation). The Amazon S3 prefix can be a bucket specifier of any depth (for example, s3://mybucket or s3://mybucket/myproject/mydata). You can specify multiple identifiers within a single role mapping, but they all must be of the same type.

Important

IAM roles for EMRFS provide application-level isolation between users of the application. They do not provide host-level isolation between users on the host. Any user with access to the cluster can bypass the isolation to assume any of the roles.

When a cluster application makes a request to Amazon S3 through EMRFS, EMRFS evaluates role mappings in the top-down order that they appear in the security configuration. If a request made through EMRFS doesn't match any identifier, EMRFS falls back to using the service role for cluster EC2 instances. For this reason, we recommend that the policies attached to this role limit permissions to Amazon S3. For more information, see Service role for cluster EC2 instances (EC2 instance profile).
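The evaluation order described above can be modeled with a short Python sketch. This is an illustrative model only, not the EMRFS implementation: the RoleMapping class, the resolve_role function, and the plain string-prefix comparison for Amazon S3 locations are assumptions made for this example, and the role ARNs are placeholders.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RoleMapping:
    """Hypothetical in-memory form of one role mapping."""
    role: str                   # ARN of the IAM role to assume on a match
    identifier_type: str        # "User", "Group", or "Prefix"
    identifiers: Sequence[str]  # all identifiers share the same type


def matches(mapping: RoleMapping, user: str, groups: Sequence[str], s3_path: str) -> bool:
    """Return True if the request matches any identifier in the mapping."""
    if mapping.identifier_type == "User":
        return user in mapping.identifiers
    if mapping.identifier_type == "Group":
        return any(group in mapping.identifiers for group in groups)
    if mapping.identifier_type == "Prefix":
        # Assumption: a prefix matches by simple string-prefix comparison.
        return any(s3_path.startswith(prefix) for prefix in mapping.identifiers)
    raise ValueError(f"Unknown identifier type: {mapping.identifier_type}")


def resolve_role(mappings: Sequence[RoleMapping], user: str, groups: Sequence[str],
                 s3_path: str, fallback_role: str) -> str:
    """Evaluate mappings top-down; the first match wins, otherwise fall back."""
    for mapping in mappings:
        if matches(mapping, user, groups, s3_path):
            return mapping.role
    return fallback_role


# Placeholder ARNs for illustration only.
USER_ROLE = "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_user1"
BUCKET_ROLE = "arn:aws:iam::123456789101:role/allow_EMRFS_access_to_MyBuckets"
FALLBACK_ROLE = "arn:aws:iam::123456789101:role/EC2_Role_EMR_Restrict_S3"

MAPPINGS = [
    RoleMapping(USER_ROLE, "User", ["user1"]),
    RoleMapping(BUCKET_ROLE, "Prefix", ["s3://MyBucket/", "s3://MyOtherBucket/"]),
]

print(resolve_role(MAPPINGS, "user1", [], "s3://MyBucket/data", FALLBACK_ROLE))
print(resolve_role(MAPPINGS, "user2", [], "s3://Elsewhere/data", FALLBACK_ROLE))
```

In this model, a request from user1 for data under s3://MyBucket/ resolves to the user mapping because it appears first, and a request that matches no mapping resolves to the fallback role, which is why that role's Amazon S3 permissions should be limited.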

Configure roles

Before you set up a security configuration with IAM roles for EMRFS, plan and create the roles and permission policies to attach to the roles. For more information, see How do roles for EC2 instances work? in the IAM User Guide. When creating permissions policies, we recommend that you start with the managed policy attached to the default Amazon EMR role for EC2, and then edit this policy according to your requirements. The default role name is EMR_EC2_DefaultRole, and the default managed policy to edit is AmazonElasticMapReduceforEC2Role. For more information, see Service role for cluster EC2 instances (EC2 instance profile).

Updating trust policies to assume role permissions

Each role that EMRFS uses must have a trust policy that allows the cluster's Amazon EMR role for EC2 to assume it. Similarly, the cluster's Amazon EMR role for EC2 must have a trust policy that allows EMRFS roles to assume it.

The following example trust policy is attached to roles for EMRFS. The statement allows the default Amazon EMR role for EC2 to assume the role. For example, if you have two fictitious EMRFS roles, EMRFSRole_First and EMRFSRole_Second, this policy statement is added to the trust policies for each of them.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::AWSAcctID:role/EMR_EC2_DefaultRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

In addition, the following example trust policy statement is added to the EMR_EC2_DefaultRole to allow the two fictitious EMRFS roles to assume it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::AWSAcctID:role/EMRFSRole_First",
          "arn:aws:iam::AWSAcctID:role/EMRFSRole_Second"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To update the trust policy of an IAM role

Open the IAM console at https://console.aws.amazon.com/iam/.

  1. Choose Roles, enter the name of the role in Search, and then select its Role name.

  2. Choose Trust relationships, Edit trust relationship.

  3. Add a trust statement to the Policy document according to the guidelines above, and then choose Update trust policy.
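As an alternative to the console steps, the same change might be scripted. The following Python sketch is a minimal illustration under stated assumptions: the helper names add_trust_statement and update_role_trust_policy are hypothetical, and the second helper assumes that the boto3 library is installed and that credentials with IAM permissions are configured. Because updating a trust policy replaces the entire document, the sketch appends the new statement to the existing policy rather than overwriting it.

```python
import copy
import json
import urllib.parse
from typing import Any, Dict, List


def add_trust_statement(trust_policy: Dict[str, Any], principal_arns: List[str]) -> Dict[str, Any]:
    """Return a copy of trust_policy with an sts:AssumeRole statement appended."""
    if not principal_arns:
        raise ValueError("At least one principal ARN is required")
    updated = copy.deepcopy(trust_policy)
    updated.setdefault("Version", "2012-10-17")
    statements = updated.get("Statement", [])
    if isinstance(statements, dict):
        # IAM accepts a single statement object; normalize it to a list.
        statements = [statements]
    principal = principal_arns[0] if len(principal_arns) == 1 else list(principal_arns)
    statements.append({
        "Effect": "Allow",
        "Principal": {"AWS": principal},
        "Action": "sts:AssumeRole",
    })
    updated["Statement"] = statements
    return updated


def update_role_trust_policy(role_name: str, principal_arns: List[str]) -> None:
    """Read a role's trust policy, append the statement, and write it back."""
    import boto3  # Assumption: boto3 is installed and credentials are configured.

    iam = boto3.client("iam")
    current = iam.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    if isinstance(current, str):
        # Defensive: decode the document if it is returned as URL-encoded JSON.
        current = json.loads(urllib.parse.unquote(current))
    updated = add_trust_statement(current, principal_arns)
    iam.update_assume_role_policy(RoleName=role_name, PolicyDocument=json.dumps(updated))
```

For the fictitious roles above, update_role_trust_policy("EMRFSRole_First", ["arn:aws:iam::AWSAcctID:role/EMR_EC2_DefaultRole"]) would add the first example statement, with AWSAcctID replaced by a real account ID.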

Specifying a role as a key user

If a role allows access to a location in Amazon S3 that is encrypted using an AWS KMS key, make sure that the role is specified as a key user. This gives the role permission to use the KMS key. For more information, see Key policies in AWS KMS in the AWS Key Management Service Developer Guide.
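As a rough illustration of what being a key user means, the following Python sketch builds a key policy statement modeled on the key users statement that appears in a default KMS key policy. The function name key_user_statement is hypothetical and the role ARNs are placeholders. Treat the action list as an assumption to verify against the AWS Key Management Service Developer Guide before using it in a real key policy.

```python
import json
from typing import Any, Dict, List


def key_user_statement(role_arns: List[str]) -> Dict[str, Any]:
    """Build a key policy statement that lets the given roles use a KMS key."""
    if not role_arns:
        raise ValueError("At least one role ARN is required")
    return {
        "Sid": "Allow use of the key",
        "Effect": "Allow",
        "Principal": {"AWS": list(role_arns)},
        # Cryptographic operations needed to read and write encrypted objects.
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        "Resource": "*",
    }


# Placeholder ARNs for the two fictitious EMRFS roles.
statement = key_user_statement([
    "arn:aws:iam::AWSAcctID:role/EMRFSRole_First",
    "arn:aws:iam::AWSAcctID:role/EMRFSRole_Second",
])
print(json.dumps(statement, indent=2))
```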

Set up a security configuration with IAM roles for EMRFS

Important

If none of the IAM roles for EMRFS that you specify apply, EMRFS falls back to the Amazon EMR role for EC2. Consider customizing this role to restrict permissions to Amazon S3 as appropriate for your application and then specifying this custom role instead of EMR_EC2_DefaultRole when you create a cluster. For more information, see Customize IAM roles and Specify custom IAM roles when you create a cluster.

To specify IAM roles for EMRFS requests to Amazon S3 using the console
  1. Create a security configuration that specifies role mappings:

    1. In the Amazon EMR console, select Security configurations, Create.

    2. Type a Name for the security configuration. You use this name to specify the security configuration when you create a cluster.

    3. Choose Use IAM roles for EMRFS requests to Amazon S3.

    4. Select an IAM role to apply, and under Basis for access select an identifier type (Users, Groups, or S3 prefixes) from the list and enter corresponding identifiers. If you use multiple identifiers, separate them with a comma and no space. For more information about each identifier type, see the JSON configuration reference below.

    5. Choose Add role to set up additional role mappings as described in the previous step.

    6. Set up other security configuration options as appropriate and choose Create. For more information, see Create a security configuration.

  2. Specify the security configuration you created above when you create a cluster. For more information, see Specify a security configuration for a cluster.

To specify IAM roles for EMRFS requests to Amazon S3 using the AWS CLI
  1. Use the aws emr create-security-configuration command, specifying a name for the security configuration, and the security configuration details in JSON format.

    The example command shown below creates a security configuration with the name EMRFS_Roles_Security_Configuration. It is based on a JSON structure in the file MyEmrFsSecConfig.json, which is saved in the same directory where the command is executed.

    aws emr create-security-configuration --name EMRFS_Roles_Security_Configuration --security-configuration file://MyEmrFsSecConfig.json

    Use the following guidelines for the structure of the MyEmrFsSecConfig.json file. You can specify this structure along with structures for other security configuration options. For more information, see Create a security configuration.

    The following is an example JSON snippet for specifying custom IAM roles for EMRFS within a security configuration. It demonstrates role mappings for the three different identifier types, followed by a parameter reference.

    {
      "AuthorizationConfiguration": {
        "EmrFsConfiguration": {
          "RoleMappings": [
            {
              "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_user1",
              "IdentifierType": "User",
              "Identifiers": [ "user1" ]
            },
            {
              "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_to_MyBuckets",
              "IdentifierType": "Prefix",
              "Identifiers": [ "s3://MyBucket/", "s3://MyOtherBucket/" ]
            },
            {
              "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_AdminGroup",
              "IdentifierType": "Group",
              "Identifiers": [ "AdminGroup" ]
            }
          ]
        }
      }
    }

    Parameter and description:

    "AuthorizationConfiguration": Required.

    "EmrFsConfiguration": Required. Contains role mappings.

    "RoleMappings": Required. Contains one or more role mapping definitions. Role mappings are evaluated in the top-down order that they appear. If a role mapping evaluates as true for an EMRFS call for data in Amazon S3, no further role mappings are evaluated and EMRFS uses the specified IAM role for the request. Role mappings consist of the following required parameters:

      "Role": Specifies the ARN identifier of an IAM role in the format arn:aws:iam::account-id:role/role-name. This is the IAM role that Amazon EMR assumes if the EMRFS request to Amazon S3 matches any of the Identifiers specified.

      "IdentifierType": Can be one of the following:

        • "User" specifies that the identifiers are one or more Hadoop users, which can be Linux account users or Kerberos principals. When the EMRFS request originates with the user or users specified, the IAM role is assumed.

        • "Prefix" specifies that the identifier is an Amazon S3 location. The IAM role is assumed for calls to the location or locations with the specified prefixes. For example, the prefix s3://mybucket/ matches s3://mybucket/mydir and s3://mybucket/yetanotherdir.

        • "Group" specifies that the identifiers are one or more Hadoop groups. The IAM role is assumed if the request originates from a user in the specified group or groups.

      "Identifiers": Specifies one or more identifiers of the appropriate identifier type. Separate multiple identifiers by commas with no spaces.

  2. Use the aws emr create-cluster command to create a cluster and specify the security configuration you created in the previous step.

    The following example creates a cluster with default core Hadoop applications installed. The cluster uses the security configuration created above as EMRFS_Roles_Security_Configuration and also uses a custom Amazon EMR role for EC2, EC2_Role_EMR_Restrict_S3, which is specified using the InstanceProfile argument of the --ec2-attributes parameter.

    Note

    Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace with a caret (^).

    aws emr create-cluster --name MyEmrFsS3RolesCluster \
    --release-label emr-7.0.0 --ec2-attributes InstanceProfile=EC2_Role_EMR_Restrict_S3,KeyName=MyKey \
    --instance-type m5.xlarge --instance-count 3 \
    --security-configuration EMRFS_Roles_Security_Configuration
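The JSON structure used in step 1 can also be generated rather than written by hand, which makes it easier to check each role mapping before you create the security configuration. The following Python sketch is one possible approach under stated assumptions: the function name build_emrfs_security_configuration and its validation rules (a known identifier type, at least one identifier, and an s3:// scheme for prefixes) are choices made for this example, not checks that Amazon EMR is documented to perform.

```python
import json
from typing import Any, Dict, Iterable, Sequence, Tuple

VALID_IDENTIFIER_TYPES = ("User", "Group", "Prefix")


def build_emrfs_security_configuration(
    role_mappings: Iterable[Tuple[str, str, Sequence[str]]]
) -> Dict[str, Any]:
    """Build the EMRFS role-mapping structure from (role, type, identifiers) tuples.

    The order of the tuples is preserved because mappings are evaluated top-down.
    """
    mappings = []
    for role_arn, identifier_type, identifiers in role_mappings:
        if identifier_type not in VALID_IDENTIFIER_TYPES:
            raise ValueError(f"Unsupported IdentifierType: {identifier_type}")
        if not identifiers:
            raise ValueError(f"No identifiers given for role {role_arn}")
        if identifier_type == "Prefix" and not all(i.startswith("s3://") for i in identifiers):
            raise ValueError("Prefix identifiers must start with s3://")
        mappings.append({
            "Role": role_arn,
            "IdentifierType": identifier_type,
            "Identifiers": list(identifiers),
        })
    return {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {"RoleMappings": mappings}
        }
    }


# Same mappings as the example JSON in step 1.
config = build_emrfs_security_configuration([
    ("arn:aws:iam::123456789101:role/allow_EMRFS_access_for_user1", "User", ["user1"]),
    ("arn:aws:iam::123456789101:role/allow_EMRFS_access_to_MyBuckets", "Prefix",
     ["s3://MyBucket/", "s3://MyOtherBucket/"]),
    ("arn:aws:iam::123456789101:role/allow_EMRFS_access_for_AdminGroup", "Group", ["AdminGroup"]),
])
print(json.dumps(config, indent=2))
```

Saving the printed output as MyEmrFsSecConfig.json produces a file that can be passed to the aws emr create-security-configuration command shown in step 1.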