
Specifying Amazon S3 Encryption with EMRFS Using a Cluster Configuration

When you create a cluster, you can specify Amazon S3 server-side encryption (SSE) or client-side encryption (CSE) using the emrfs-site classification. Amazon S3 SSE and CSE are mutually exclusive; you can choose either but not both. For more information about Amazon S3 encryption options, see Amazon S3 Server-Side Encryption. Beginning with Amazon EMR release 4.8.0, you can use security configurations to apply encryption settings more easily and with more options.
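Because SSE and CSE cannot be combined, it can help to check a set of emrfs-site properties before passing them to a cluster. The following is a minimal sketch of such a check, not part of Amazon EMR or the AWS SDK: the class name EmrfsEncryptionCheck and its method are hypothetical, and the two property names are the ones listed in the emrfs-site.xml property tables later in this topic.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not an Amazon EMR or AWS SDK class. */
final class EmrfsEncryptionCheck {

    // Property names as listed in the emrfs-site.xml property tables in this topic.
    static final String SSE_ENABLED = "fs.s3.enableServerSideEncryption";
    static final String CSE_ENABLED = "fs.s3.cse.enabled";

    /** Returns true unless the properties enable both SSE and CSE. */
    static boolean isValid(Map<String, String> emrfsSite) {
        // Both properties default to false when they are absent.
        boolean sse = Boolean.parseBoolean(emrfsSite.getOrDefault(SSE_ENABLED, "false"));
        boolean cse = Boolean.parseBoolean(emrfsSite.getOrDefault(CSE_ENABLED, "false"));
        return !(sse && cse);
    }
}
```

A caller might run this check on the property map before building the emrfs-site configuration and reject the request when it returns false.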

Important

Although you can still use cluster configurations to apply encryption with current versions of Amazon EMR, it is not recommended. If you configure Amazon S3 encryption in the cluster configuration and in a security configuration, the security configuration overrides the cluster configuration.

For information about how to create security configurations, see Amazon EMR Data Encryption with Security Configurations.

Specifying Amazon S3 Server-Side Encryption

Amazon EMR supports server-side encryption with Amazon S3-provided keys (SSE-S3) and with AWS KMS-managed encryption keys (SSE-KMS). Amazon EMR does not support the Amazon S3 option to use SSE with customer-provided encryption keys (SSE-C). For more information about these options, see At-Rest Encryption for Amazon S3 with EMRFS.

Creating a Cluster with Amazon S3 SSE-S3 Enabled

To configure SSE-S3 as part of a cluster configuration, you can use the AWS Management Console or the AWS CLI. You can also use the configure-hadoop bootstrap action to set fs.s3.enableServerSideEncryption to true.

Note

The following AWS Management Console procedure is not available beginning with Amazon EMR version 4.8.0. Use a security configuration to specify encryption options. For more information, see Specifying a Security Configuration Using the Console.

To create a cluster with SSE-S3 enabled using the console

  1. Choose Create Cluster.

  2. Navigate to the File System Configuration section.

  3. To use Server-side encryption, choose Enabled.

  4. Choose Create cluster.

To create a cluster with SSE-S3 enabled using the AWS CLI

  • Type the following command:

    aws emr create-cluster --release-label \
      --instance-count 3 --instance-type m1.large --emrfs Encryption=ServerSide

Creating a Cluster with Amazon S3 SSE-KMS Enabled

To configure SSE-KMS as part of a cluster configuration, you must use the AWS CLI or the AWS SDKs. There is no SSE-KMS configuration for Amazon EMR using the AWS Management Console. You enable SSE-KMS much the same as you do for SSE-S3, but you also provide an AWS KMS CMK ID or ARN (Amazon Resource Name) using the fs.s3.serverSideEncryption.kms.keyId setting in the emrfs-site configuration classification.

To create a cluster with SSE-KMS enabled using the AWS CLI

  • Type the following AWS CLI command to create a cluster with SSE-KMS, where keyId is the ID or ARN of an AWS KMS customer master key (CMK):

    aws emr create-cluster --release-label emr-4.5.0 --instance-count 3 \
      --instance-type m1.xlarge --use-default-roles \
      --emrfs Encryption=ServerSide,Args=[fs.s3.serverSideEncryption.kms.keyId=keyId]

    --OR--

    Type the following AWS CLI command using the configuration API and providing a configuration JSON file with contents as shown (myConfig.json in the example):

    aws emr create-cluster --release-label emr-4.5.0 --instance-count 3 \
      --instance-type m1.xlarge --applications Name=Hadoop \
      --configurations file://./myConfig.json --use-default-roles

    Note

    Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace with a caret (^).

    myConfig.json

    [
      {
        "Classification": "emrfs-site",
        "Properties": {
          "fs.s3.enableServerSideEncryption": "true",
          "fs.s3.serverSideEncryption.kms.keyId": "a4567b8-9900-12ab-1234-123a45678901"
        }
      }
    ]
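If you generate the configuration file from code rather than writing it by hand, a small helper can render the same structure. The sketch below uses only the Java standard library. The class EmrfsSiteJson is a hypothetical name, and the quoting is deliberately minimal (it escapes only backslashes and double quotes), so treat it as an illustration rather than a general JSON writer.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration; renders an emrfs-site configurations array. */
final class EmrfsSiteJson {

    /** Renders the given properties as a one-element configurations array. */
    static String render(Map<String, String> properties) {
        StringJoiner props = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            props.add(quote(e.getKey()) + ":" + quote(e.getValue()));
        }
        return "[{\"Classification\":\"emrfs-site\",\"Properties\":" + props + "}]";
    }

    /** Minimal JSON string quoting: escapes backslashes and double quotes only. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

The returned string can be written to a file such as myConfig.json and passed to the --configurations option as shown in the command above.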

emrfs-site.xml Properties for SSE-S3 and SSE-KMS

Property | Default value | Description
fs.s3.enableServerSideEncryption | false | When set to true, objects stored in Amazon S3 are encrypted using server-side encryption. If no key is specified, SSE-S3 is used.
fs.s3.serverSideEncryption.kms.keyId | N/A | Specifies an AWS KMS key ID or ARN. If a key is specified, SSE-KMS is used.
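The way these two properties interact can be sketched as a small decision function. This is an illustration of the rules in the table, not EMRFS code: the class SseModeResolver and its enum are hypothetical, and the sketch assumes that the key ID is only considered when server-side encryption is enabled.

```java
import java.util.Map;

/** Hypothetical helper for illustration; mirrors the rules in the property table. */
final class SseModeResolver {

    enum SseMode { NONE, SSE_S3, SSE_KMS }

    static SseMode resolve(Map<String, String> emrfsSite) {
        boolean enabled = Boolean.parseBoolean(
                emrfsSite.getOrDefault("fs.s3.enableServerSideEncryption", "false"));
        if (!enabled) {
            // Assumption: without the enable flag, the key ID has no effect.
            return SseMode.NONE;
        }
        String keyId = emrfsSite.get("fs.s3.serverSideEncryption.kms.keyId");
        boolean hasKey = keyId != null && !keyId.trim().isEmpty();
        // No key specified: SSE-S3. Key specified: SSE-KMS.
        return hasKey ? SseMode.SSE_KMS : SseMode.SSE_S3;
    }
}
```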

Specifying Amazon S3 Client-Side Encryption

Amazon EMR supports Amazon S3 client-side encryption (CSE) using an AWS KMS-managed CMK or using a custom client-side master key you provide in a Java class implementation. For more information about Amazon S3 CSE, see Protecting Data Using Client-Side Encryption in the Amazon Simple Storage Service Developer Guide.

Enabling Amazon S3 Client-Side Encryption in the Console

To configure client-side encryption using the console

  1. Choose Create Cluster.

  2. Fill in the fields as appropriate for Cluster Configuration and Tags.

  3. For the Software Configuration field, choose AMI 3.6.0 or later.

  4. In the File System Configuration section, select one of the following client-side encryption types for the Encryption field: S3 client-side encryption with AWS Key Management Service (KMS) or S3 client-side encryption with custom encryption materials provider.

    1. If you chose S3 client-side encryption with AWS Key Management Service (KMS), select the master key alias from the list of master keys that you have previously configured. Alternatively, you can choose Enter a Key ARN and enter the ARN of an AWS KMS master key that belongs to a different account, provided that you have permissions to use that key. If you have assigned an instance profile to your EMR cluster, make sure that the role in that profile has permissions to use the key.

    2. If you chose S3 client-side encryption with custom encryption materials provider, provide the full class name and Amazon S3 location of your EncryptionMaterialsProvider class. Amazon EMR automatically downloads your provider to each node in your cluster when it is created.

  5. Fill in the fields as appropriate for Hardware Configuration, Security and Access, Bootstrap Actions, and Steps.

  6. Choose Create cluster.

Selecting a Master Key Stored in AWS KMS Using an SDK or CLI

When you enable Amazon S3 client-side encryption and specify keys stored in AWS KMS, you provide the KeyId value, key alias, or ARN of the key that Amazon EMR uses to encrypt objects written to Amazon S3. For decryption, EMRFS tries to access whichever key encrypted the object. You create the key using the IAM console, AWS CLI, or the AWS SDKs.

If you have assigned an instance profile to your EMR cluster, make sure that the role in that profile has permission to use the key. AWS KMS charges apply for API calls during each encryption or decryption activity, and for storing your key. For more information, see the AWS KMS pricing page.

To use an AWS KMS master key for Amazon S3 encryption, provide the master key by reference using any of three possible identifiers:

  • KeyId (a GUID made up of 32 hexadecimal digits)

  • Alias mapped to the KeyId value (you must include the alias/ prefix in this value)

  • Full ARN of the key, which includes the region, account ID, and KeyId value

MyKMSKeyId in the example below can be any of the three values:

aws emr create-cluster --release-label \
  --emrfs Encryption=ClientSide,ProviderType=KMS,KMSKeyId=MyKMSKeyId

Note

Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace with a caret (^).

Note

You must use the ARN of the AWS KMS master key to use a key owned by an account different from the one you are using to configure Amazon EMR.
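The three identifier forms listed above can be told apart by their prefixes, which is useful if you want to validate a value before passing it as KMSKeyId. The following sketch is an assumption-laden illustration: the class KmsKeyReference is hypothetical, it relies only on the alias/ prefix mentioned above and on ARNs beginning with arn:, and it treats anything else as a KeyId without checking the GUID format.

```java
/** Hypothetical helper for illustration; classifies a KMS key reference by its prefix. */
final class KmsKeyReference {

    enum Kind { KEY_ID, ALIAS, ARN }

    static Kind classify(String value) {
        if (value.startsWith("arn:")) {
            return Kind.ARN;
        }
        if (value.startsWith("alias/")) {
            return Kind.ALIAS;
        }
        // Anything else is treated as a KeyId; the GUID format is not verified here.
        return Kind.KEY_ID;
    }

    /** A key owned by a different account must be referenced by its ARN. */
    static boolean usableAcrossAccounts(String value) {
        return classify(value) == Kind.ARN;
    }
}
```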

Configuring Amazon S3 Client-Side Encryption Using a Custom Provider

To use the AWS CLI, pass the Encryption, ProviderType, CustomProviderClass, and CustomProviderLocation arguments to the emrfs option.

aws emr create-cluster --instance-type m3.xlarge --release-label \
  --emrfs Encryption=ClientSide,ProviderType=Custom,CustomProviderLocation=s3://mybucket/myfolder/provider.jar,CustomProviderClass=classname

Note

Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace with a caret (^).

Setting Encryption to ClientSide enables client-side encryption, and setting ProviderType to Custom selects a custom provider. CustomProviderClass is the full class name of your EncryptionMaterialsProvider implementation, and CustomProviderLocation is the local or Amazon S3 location from which Amazon EMR copies the provider to each node in the cluster and places it in the classpath.

The following example uses AMI version 3.10.0 and passes arguments to the provider with the configure-hadoop bootstrap action:

aws emr create-cluster --ami-version 3.10.0 --instance-type m3.xlarge --instance-count 2 \
  --emrfs Encryption=ClientSide,CustomProviderLocation=s3://mybucket/myfolder/myprovider.jar,CustomProviderClass=classname \
  --bootstrap-action Path=s3://elasticmapreduce/bootstrap-actions/configure-hadoop,Args=[-e,myProvider.arg1=value1,-e,myProvider.arg2=value2]

Custom EncryptionMaterialsProvider with Arguments

If you need to pass arguments directly to the provider, you can supply them through the emrfs-site configuration classification. Here is the configuration, saved as myConfig.json in this example:

[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "myProvider.arg1": "value1",
      "myProvider.arg2": "value2"
    }
  }
]

Then, use the configuration with the AWS CLI:

aws emr create-cluster --release-label \
  --instance-type m3.xlarge --instance-count 2 --configurations file://./myConfig.json \
  --emrfs Encryption=ClientSide,CustomProviderLocation=s3://mybucket/myfolder/myprovider.jar,CustomProviderClass=classname

To use an SDK, set the property fs.s3.cse.encryptionMaterialsProvider.uri so that Amazon EMR downloads the custom EncryptionMaterialsProvider class that you store in Amazon S3 to each node in your cluster. You configure this in the emrfs-site.xml file, along with enabling CSE and specifying the class name of the custom provider.

For example, in the AWS SDK for Java using RunJobFlowRequest, your code might look like the following:

<snip>
// Excerpt: emr (an AmazonElasticMapReduce client) and myApp (an Application) are defined elsewhere.
Map<String,String> emrfsProperties = new HashMap<String,String>();
emrfsProperties.put("fs.s3.cse.encryptionMaterialsProvider.uri","s3://mybucket/MyCustomEncryptionMaterialsProvider.jar");
emrfsProperties.put("fs.s3.cse.enabled","true");
emrfsProperties.put("fs.s3.consistent","true");
emrfsProperties.put("fs.s3.cse.encryptionMaterialsProvider","full.class.name.of.EncryptionMaterialsProvider");

// Wrap the properties in an emrfs-site configuration classification.
Configuration myEmrfsConfig = new Configuration()
    .withClassification("emrfs-site")
    .withProperties(emrfsProperties);

RunJobFlowRequest request = new RunJobFlowRequest()
    .withName("Custom EncryptionMaterialsProvider")
    .withReleaseLabel("")
    .withApplications(myApp)
    .withConfigurations(myEmrfsConfig)
    .withServiceRole("EMR_DefaultRole")
    .withJobFlowRole("EMR_EC2_DefaultRole")
    .withLogUri("s3://myLogUri/")
    .withInstances(new JobFlowInstancesConfig()
        .withEc2KeyName("myEc2Key")
        .withInstanceCount(2)
        .withKeepJobFlowAliveWhenNoSteps(true)
        .withMasterInstanceType("m3.xlarge")
        .withSlaveInstanceType("m3.xlarge")
    );

RunJobFlowResult result = emr.runJobFlow(request);
</snip>

For a list of configuration key values that you can use to configure emrfs-site.xml, see emrfs-site.xml Properties for SSE-S3 and SSE-KMS.

Reference Implementation of Amazon S3 EncryptionMaterialsProvider

When fetching the encryption materials from the EncryptionMaterialsProvider class to perform encryption, EMRFS optionally populates the materialsDescription argument with two fields: the Amazon S3 URI for the object and the JobFlowId of the cluster, which can be used by the EncryptionMaterialsProvider class to return encryption materials selectively. You can enable this behavior by setting fs.s3.cse.materialsDescription.enabled to true in emrfs-site.xml. For example, the provider may return different keys for different Amazon S3 URI prefixes. It is the description of the returned encryption materials that is eventually stored with the Amazon S3 object rather than the materialsDescription value that is generated by EMRFS and passed to the provider. While decrypting an Amazon S3 object, the encryption materials description is passed to the EncryptionMaterialsProvider class, so that it can, again, selectively return the matching key to decrypt the object.
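The idea of returning different keys for different Amazon S3 URI prefixes can be sketched independently of the provider interface. The class below is a hypothetical illustration that uses only the Java standard library: it maps URI prefixes to key IDs and picks the longest matching prefix. The text does not state the field names EMRFS uses inside materialsDescription, so the sketch takes the object URI as a plain string. A real provider would read the URI from the map it receives and wrap the chosen key in EncryptionMaterials.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; selects a key ID by Amazon S3 URI prefix. */
final class PrefixKeySelector {

    private final Map<String, String> keyIdByPrefix = new LinkedHashMap<>();
    private final String defaultKeyId;

    PrefixKeySelector(String defaultKeyId) {
        this.defaultKeyId = defaultKeyId;
    }

    /** Registers a key ID for all object URIs that start with the given prefix. */
    PrefixKeySelector register(String uriPrefix, String keyId) {
        keyIdByPrefix.put(uriPrefix, keyId);
        return this;
    }

    /** Returns the key ID for the longest matching prefix, or the default key ID. */
    String keyIdFor(String objectUri) {
        String best = null;
        for (String prefix : keyIdByPrefix.keySet()) {
            if (objectUri.startsWith(prefix)
                    && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? defaultKeyId : keyIdByPrefix.get(best);
    }
}
```

Because the description of the returned materials is what gets stored with the object, a provider built this way would also need to return the matching key when it is handed that stored description at decryption time.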

An EncryptionMaterialsProvider reference implementation is provided below. Another custom provider, EMRFSRSAEncryptionMaterialsProvider, is available from GitHub.

import com.amazonaws.services.s3.model.EncryptionMaterials;
import com.amazonaws.services.s3.model.EncryptionMaterialsProvider;
import com.amazonaws.services.s3.model.KMSEncryptionMaterials;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;

import java.util.Map;

/**
 * Provides KMSEncryptionMaterials according to Configuration
 */
public class MyEncryptionMaterialsProviders implements EncryptionMaterialsProvider, Configurable{
  private Configuration conf;
  private String kmsKeyId;
  private EncryptionMaterials encryptionMaterials;

  private void init() {
    this.kmsKeyId = conf.get("my.kms.key.id");
    this.encryptionMaterials = new KMSEncryptionMaterials(kmsKeyId);
  }

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    init();
  }

  @Override
  public Configuration getConf() {
    return this.conf;
  }

  @Override
  public void refresh() {

  }

  @Override
  public EncryptionMaterials getEncryptionMaterials(Map<String, String> materialsDescription) {
    return this.encryptionMaterials;
  }

  @Override
  public EncryptionMaterials getEncryptionMaterials() {
    return this.encryptionMaterials;
  }
}

emrfs-site.xml Properties for Amazon S3 Client-Side Encryption

Property | Default value | Description
fs.s3.cse.enabled | false | When set to true, objects stored in Amazon S3 are encrypted using client-side encryption.
fs.s3.cse.encryptionMaterialsProvider.uri | N/A | The Amazon S3 URI where the JAR with the EncryptionMaterialsProvider is located. When you provide this URI, Amazon EMR automatically downloads the JAR to all nodes in the cluster.
fs.s3.cse.encryptionMaterialsProvider | N/A | The EncryptionMaterialsProvider class path used with client-side encryption. Note: For AWS KMS, use com.amazon.ws.emr.hadoop.fs.cse.KMSEncryptionMaterialsProvider.
fs.s3.cse.materialsDescription.enabled | false | When set to true, EMRFS populates the materialsDescription of encrypted objects with the Amazon S3 URI for the object and the JobFlowId.
fs.s3.cse.kms.keyId | N/A | The value of the KeyId field for the AWS KMS encryption key that you are using with EMRFS encryption. Note: This property also accepts the ARN and key alias associated with the key.
fs.s3.cse.cryptoStorageMode | ObjectMetadata | The Amazon S3 storage mode. By default, the description of the encryption information is stored in the object metadata. You can also store the description in an instruction file. Valid values are ObjectMetadata and InstructionFile. For more information, see Client-Side Data Encryption with the AWS SDK for Java and Amazon S3.
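A set of client-side encryption properties can be sanity-checked against this table before a cluster is created. The sketch below is hypothetical and not part of EMRFS. It checks the documented values for fs.s3.cse.cryptoStorageMode, and it also assumes that a provider class must be named whenever CSE is enabled, which is a reading of the examples in this topic rather than a documented rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; reports problems in CSE-related properties. */
final class CseSettingsCheck {

    static List<String> problems(Map<String, String> emrfsSite) {
        List<String> problems = new ArrayList<>();
        if (!Boolean.parseBoolean(emrfsSite.getOrDefault("fs.s3.cse.enabled", "false"))) {
            return problems; // CSE is off, so there is nothing to check.
        }
        // Assumption: enabling CSE requires naming a provider class.
        if (!emrfsSite.containsKey("fs.s3.cse.encryptionMaterialsProvider")) {
            problems.add("fs.s3.cse.encryptionMaterialsProvider is not set");
        }
        // The default and the valid values come from the table above.
        String mode = emrfsSite.getOrDefault("fs.s3.cse.cryptoStorageMode", "ObjectMetadata");
        if (!mode.equals("ObjectMetadata") && !mode.equals("InstructionFile")) {
            problems.add("fs.s3.cse.cryptoStorageMode must be ObjectMetadata or InstructionFile");
        }
        return problems;
    }
}
```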