
Create an AWSCredentialsProvider for EMRFS

In some cases, you might want to allow users to create an Amazon EMR cluster that accesses and analyzes data stored in Amazon S3, but the data is restricted in a way that makes access through EMRFS difficult. This difficulty arises because the credentials that Amazon EMR provides through the EC2 instance profile are not sufficient to access the data. To provide the necessary credentials, you can define a custom credentials provider class that implements both the AWSCredentialsProvider and the Hadoop Configurable interfaces. This approach is more restrictive than expanding access by modifying Amazon S3 bucket policies or other IAM policies, and it helps ensure that only the EMR clusters configured to use the custom provider can access the data in Amazon S3.

For a detailed explanation of this solution, see Securely Analyze Data from Another AWS Account with EMRFS in the AWS Big Data blog. The blog post includes a tutorial that walks you through the process end to end, from creating IAM roles to launching the cluster, and it provides a Java code sample that implements the custom credentials provider class.
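The following is a minimal sketch of what such a class might look like; it is not the implementation from the blog post. The class name MyAWSCredentialsProviderWithUri matches the value used in the example create-cluster command later in this topic, while the my.custom.provider.role.arn property and the assume-role logic are illustrative assumptions only.

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;

// Minimal sketch of a custom EMRFS credentials provider. EMRFS loads the class
// named by fs.s3.customAWSCredentialsProvider, and because the class implements
// Configurable, the cluster configuration is passed in through setConf().
public class MyAWSCredentialsProviderWithUri implements AWSCredentialsProvider, Configurable {

    private Configuration conf;
    // Fall back to the normal credentials chain (including the EC2 instance profile).
    private AWSCredentialsProvider delegate = new DefaultAWSCredentialsProviderChain();

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
        // Illustrative property name: the ARN of a role that can access the
        // restricted data would be supplied through the emrfs-site classification.
        String roleArn = conf.get("my.custom.provider.role.arn");
        if (roleArn != null) {
            delegate = new STSAssumeRoleSessionCredentialsProvider.Builder(
                    roleArn, "emrfs-custom-session").build();
        }
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public AWSCredentials getCredentials() {
        return delegate.getCredentials();
    }

    @Override
    public void refresh() {
        delegate.refresh();
    }
}

You would package a class like this, along with its dependencies, into the JAR file that the bootstrap action in the following procedure copies to the cluster.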

The basic steps are as follows:

To specify a custom credentials provider

  1. Create a custom credentials provider class compiled as a JAR file.

  2. Run a script as a bootstrap action to copy the custom credentials provider JAR file to the /usr/share/aws/emr/emrfs/auxlib location on the cluster's master node. For more information about bootstrap actions, see (Optional) Create Bootstrap Actions to Install Additional Software.

  3. Customize the emrfs-site classification to specify the class that you implement in the JAR file. For more information about specifying configuration objects to customize applications, see Configuring Applications in the Amazon EMR Release Guide.

    The following example demonstrates a create-cluster command that launches a Hive cluster with common configuration parameters, and also includes:

    • A bootstrap action that runs the script, copy_jar_file.sh, which is saved to mybucket in Amazon S3.

    • An emrfs-site classification that specifies the custom credentials provider defined in the JAR file, MyAWSCredentialsProviderWithUri.

    Note

    Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace with a caret (^).

    aws emr create-cluster --applications Name=Hive \
    --bootstrap-actions '[{"Path":"s3://mybucket/copy_jar_file.sh","Name":"Custom action"}]' \
    --ec2-attributes '{"KeyName":"MyKeyPair","InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"subnet-xxxxxxxx","EmrManagedSlaveSecurityGroup":"sg-xxxxxxxx","EmrManagedMasterSecurityGroup":"sg-xxxxxxxx"}' \
    --service-role EMR_DefaultRole --enable-debugging --release-label emr-5.14.0 \
    --log-uri 's3n://my-emr-log-bucket/' --name 'test-awscredentialsprovider-emrfs' \
    --instance-type=m3.xlarge --instance-count 3 \
    --configurations '[{"Classification":"emrfs-site","Properties":{"fs.s3.customAWSCredentialsProvider":"MyAWSCredentialsProviderWithUri"},"Configurations":[]}]'