EMRFS Authorization for Data in Amazon S3

EMRFS authorization for data in Amazon S3, available in Amazon EMR release version 5.10.0 and later, gives you a way to provide fine-grained access control by dynamically providing credentials when users access data in Amazon S3 with EMRFS. This is useful for limiting Amazon S3 access for clusters that have multiple users. By default, the policy attached to the EC2 instance profile for Amazon EMR determines the data that can be accessed in Amazon S3, and is used for any EMRFS request to Amazon S3.

EMRFS authorization allows you to specify an AWS Identity and Access Management role to use when certain users access Amazon S3 using EMRFS. In addition, you can specify an IAM role to use when different Amazon S3 prefixes are accessed, which can help provide cross-account and cross-bucket access.

To enable EMRFS authorization, you create a security configuration and specify role mappings. Each role mapping specifies an IAM role along with one or more identifiers. The identifiers can be users, groups, or Amazon S3 prefixes. Whenever EMRFS makes a request to Amazon S3 from a cluster that uses the security configuration, and the request matches an identifier in a role mapping, EMRFS assumes the IAM role specified in that mapping. Amazon EMR uses the Amazon EC2 instance profile on your cluster to assume this role. For more information about the Amazon EC2 instance profile, see Configure IAM Roles for Amazon EMR Permissions to AWS Services.
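Because the instance profile is what assumes the mapped role, that role generally needs a trust policy that lets the instance profile's role call sts:AssumeRole. The following is a minimal sketch of such a trust policy built in Python. The helper name build_trust_policy and the role name EMR_EC2_DefaultRole are assumptions for illustration; substitute the role that backs your own instance profile.

```python
import json


def build_trust_policy(instance_profile_role_arn: str) -> dict:
    """Return an IAM trust policy that allows the given role to assume the mapped role.

    The ARN passed in is assumed to be the role behind the cluster's
    EC2 instance profile (hypothetical value in the example below).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": instance_profile_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and role name, matching the style of the examples in this topic.
    policy = build_trust_policy("arn:aws:iam::123456789101:role/EMR_EC2_DefaultRole")
    print(json.dumps(policy, indent=2))
```

The printed document is what you would attach as the trust relationship of each role that appears in a role mapping.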

Role mappings are evaluated in the order that they are presented in the security configuration. If an identifier isn't found in a role mapping, EMRFS uses the IAM role attached to the Amazon EC2 instance profile to access Amazon S3. If possible, we recommend that you limit the Amazon S3 access that this role allows because Amazon EMR uses it for any EMRFS access to Amazon S3 for which an identifier isn't found.
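The ordered, first-match behavior described above can be modeled as follows. This is an illustrative sketch only, not the EMRFS implementation: the RoleMapping class, the resolve_role function, and the use of a plain string prefix comparison are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RoleMapping:
    role: str                    # ARN of the IAM role to assume
    identifier_type: str         # "User", "Group", or "Prefix"
    identifiers: Sequence[str]   # all of the same type


def matches(mapping: RoleMapping, user: str, groups: Sequence[str], s3_path: str) -> bool:
    """Return True if the request matches any identifier in the mapping."""
    if mapping.identifier_type == "User":
        return user in mapping.identifiers
    if mapping.identifier_type == "Group":
        return any(group in mapping.identifiers for group in groups)
    if mapping.identifier_type == "Prefix":
        # Assumption: a prefix matches when the requested path starts with it.
        return any(s3_path.startswith(prefix) for prefix in mapping.identifiers)
    return False


def resolve_role(mappings: Sequence[RoleMapping], user: str,
                 groups: Sequence[str], s3_path: str) -> Optional[str]:
    """Return the role of the first matching mapping, in configuration order.

    None means no identifier matched, so the role attached to the
    EC2 instance profile would be used instead.
    """
    for mapping in mappings:
        if matches(mapping, user, groups, s3_path):
            return mapping.role
    return None


# Same ordering as the JSON example in this topic: User, then Prefix, then Group.
EXAMPLE_MAPPINGS = [
    RoleMapping("arn:aws:iam::123456789101:role/allow_user1_S3Access", "User", ("user1",)),
    RoleMapping("arn:aws:iam::123456789101:role/allow_EMRFS_access_to_MyBuckets", "Prefix",
                ("s3://MyBucket/", "s3://MyOtherBucket/")),
    RoleMapping("arn:aws:iam::123456789101:role/allow_AdminGroup_S3Access", "Group", ("AdminGroup",)),
]
```

In this model, a member of AdminGroup who reads from s3://MyBucket/ resolves to the prefix role rather than the group role, because the prefix mapping appears first. Ordering the mappings is therefore part of the access design.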

You can create multiple roles, specify multiple identifiers, and specify multiple role mappings. The users and groups in a role mapping are Hadoop users and groups defined on the cluster, which are passed in the context of the application using EMRFS (for example, YARN user impersonation). You can specify multiple identifiers within a single role mapping, but they must all be of the same type.

For more information about creating and specifying a security configuration for a cluster, see Use Security Configurations to Set Up Cluster Security.

The following is an example JSON snippet for EMRFS authorization within a security configuration. It demonstrates role mappings for the three different identifier types, followed by a parameter reference.

{
  "AuthorizationConfiguration": {
    "EmrfsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::123456789101:role/allow_user1_S3Access",
          "IdentifierType": "User",
          "Identifiers": [ "user1" ]
        },
        {
          "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_to_MyBuckets",
          "IdentifierType": "Prefix",
          "Identifiers": [ "s3://MyBucket/", "s3://MyOtherBucket/" ]
        },
        {
          "Role": "arn:aws:iam::123456789101:role/allow_AdminGroup_S3Access",
          "IdentifierType": "Group",
          "Identifiers": [ "AdminGroup" ]
        }
      ]
    }
  }
}
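A document like this can also be generated and registered programmatically. The following sketch builds the JSON in Python and passes it to the boto3 EMR client's create_security_configuration call. The helper names, the configuration name, and the default Region are assumptions for illustration, and the registration call requires AWS credentials with permission to create security configurations.

```python
import json


def build_security_configuration(role_mappings: list) -> str:
    """Serialize role mappings into the EMRFS authorization JSON structure."""
    return json.dumps(
        {
            "AuthorizationConfiguration": {
                "EmrfsConfiguration": {"RoleMappings": role_mappings}
            }
        }
    )


def register_security_configuration(name: str, role_mappings: list,
                                    region: str = "us-east-1") -> dict:
    """Create the security configuration in Amazon EMR (needs AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    emr = boto3.client("emr", region_name=region)
    return emr.create_security_configuration(
        Name=name,
        SecurityConfiguration=build_security_configuration(role_mappings),
    )


# One mapping taken from the example above.
ROLE_MAPPINGS = [
    {
        "Role": "arn:aws:iam::123456789101:role/allow_user1_S3Access",
        "IdentifierType": "User",
        "Identifiers": ["user1"],
    }
]

if __name__ == "__main__":
    print(build_security_configuration(ROLE_MAPPINGS))
    # Uncomment to register; "emrfs-authz-example" is a hypothetical name.
    # register_security_configuration("emrfs-authz-example", ROLE_MAPPINGS)
```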
Parameter descriptions:

"AuthorizationConfiguration":

Required for EMRFS authorization. Contains authorization configurations.

"EmrfsConfiguration":

Required for EMRFS authorization. Contains EMRFS authorization configurations.

"RoleMappings":

Required for EMRFS authorization. Contains one or more role mapping definitions. Role mappings are evaluated in the order that they appear, and if a role mapping evaluates as true for an EMRFS call to Amazon S3, no further role mappings are evaluated. Role mappings consist of the following required parameters:

"Role":

Specifies the Amazon Resource Name (ARN) of an IAM role in the format arn:aws:iam::account-id:role/role-name. This is the IAM role that Amazon EMR assumes if the EMRFS request to Amazon S3 matches any of the Identifiers specified.

"IdentifierType":

Can be one of the following:

  • "User" specifies that the identifiers are one or more Hadoop users, which can be Linux account users or Kerberos principals. When the EMRFS request originates with the user or users specified, the IAM role is assumed.

  • "Prefix" specifies that the identifier is an Amazon S3 location. The IAM role is assumed for calls to the location or locations with the specified prefixes. For example, the prefix s3://mybucket/ matches s3://mybucket/mydir and s3://mybucket/yetanotherdir.

  • "Group" specifies that the identifiers are one or more Hadoop groups. The IAM role is assumed if the request originates from a user in the specified group or groups.

"Identifiers":

Specify identifiers of the appropriate identifier type. Separate multiple identifiers by commas with no spaces.
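Before creating a security configuration, it can help to check each role mapping against the parameter descriptions above. The following is a minimal sketch of such a check. The function name validate_role_mapping, the ARN regular expression, and the requirement that prefixes start with s3:// are assumptions drawn from the examples in this topic, not rules enforced by Amazon EMR.

```python
import re

VALID_IDENTIFIER_TYPES = {"User", "Prefix", "Group"}

# Assumption: account IDs are 12 digits; other partitions such as aws-cn are allowed.
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[\w-]*:iam::\d{12}:role/.+$")


def validate_role_mapping(mapping: dict) -> list:
    """Return a list of problems found in one role mapping (empty if none)."""
    errors = []
    for key in ("Role", "IdentifierType", "Identifiers"):
        if key not in mapping:
            errors.append(f"missing required parameter: {key}")
    if errors:
        return errors

    if not ROLE_ARN_PATTERN.match(mapping["Role"]):
        errors.append("Role is not in the format arn:aws:iam::account-id:role/role-name")

    identifier_type = mapping["IdentifierType"]
    if identifier_type not in VALID_IDENTIFIER_TYPES:
        errors.append(f"IdentifierType must be one of {sorted(VALID_IDENTIFIER_TYPES)}")

    identifiers = mapping["Identifiers"]
    if not isinstance(identifiers, list) or not identifiers:
        errors.append("Identifiers must be a non-empty list")
    elif identifier_type == "Prefix":
        # Assumption: prefixes are written as s3:// locations, as in the example.
        for identifier in identifiers:
            if not str(identifier).startswith("s3://"):
                errors.append(f"prefix does not start with s3://: {identifier}")
    return errors
```

Because a mapping carries a single IdentifierType, all of its identifiers are necessarily of the same type. To cover both a user and a prefix, define two separate mappings.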