Amazon S3 client-side encryption - Amazon EMR

Amazon S3 client-side encryption

With Amazon S3 client-side encryption, the Amazon S3 encryption and decryption takes place in the EMRFS client on your cluster. Objects are encrypted before being uploaded to Amazon S3 and decrypted after they are downloaded. The provider you specify supplies the encryption key that the client uses. The client can use keys provided by AWS KMS (CSE-KMS) or a custom Java class that provides the client-side root key (CSE-C). The encryption specifics are slightly different between CSE-KMS and CSE-C, depending on the specified provider and the metadata of the object being decrypted or encrypted. For more information about these differences, see Protecting data using client-side encryption in the Amazon Simple Storage Service User Guide.

Note

Amazon S3 CSE only ensures that EMRFS data exchanged with Amazon S3 is encrypted; not all data on cluster instance volumes is encrypted. Furthermore, because Hue does not use EMRFS, objects that the Hue S3 File Browser writes to Amazon S3 are not encrypted.

To specify CSE-KMS for EMRFS data in Amazon S3 using the AWS CLI
  • Type the following command and replace MyKMSKeyID with the Key ID or ARN of the KMS key to use:

    aws emr create-cluster --release-label emr-4.7.2 or earlier --emrfs Encryption=ClientSide,ProviderType=KMS,KMSKeyId=MyKMSKeyId

Creating a custom key provider

Depending on the type of encryption you use when creating a custom key provider, the application must also implement different EncryptionMaterialsProvider interfaces. Both interfaces are available in the AWS SDK for Java version 1.11.0 and later.

You can use any strategy to provide encryption materials for the implementation. For example, you might choose to provide static encryption materials or integrate with a more complex key management system.

If you’re using Amazon S3 encryption, you must use the encryption algorithms AES/GCM/NoPadding for custom encryption materials.

If you’re using local disk encryption, the encryption algorithm to use for custom encryption materials varies by EMR release. For Amazon EMR 7.0.0 and lower, you must use AES/GCM/NoPadding. For Amazon EMR 7.1.0 and higher, you must use AES.

The EncryptionMaterialsProvider class gets encryption materials by encryption context. Amazon EMR populates encryption context information at runtime to help the caller determine the correct encryption materials to return.

Example: Using a custom key provider for Amazon S3 encryption with EMRFS

When Amazon EMR fetches the encryption materials from the EncryptionMaterialsProvider class to perform encryption, EMRFS optionally populates the materialsDescription argument with two fields: the Amazon S3 URI for the object and the JobFlowId of the cluster, which can be used by the EncryptionMaterialsProvider class to return encryption materials selectively.

For example, the provider may return different keys for different Amazon S3 URI prefixes. It is the description of the returned encryption materials that is eventually stored with the Amazon S3 object rather than the materialsDescription value that is generated by EMRFS and passed to the provider. While decrypting an Amazon S3 object, the encryption materials description is passed to the EncryptionMaterialsProvider class, so that it can, again, selectively return the matching key to decrypt the object.

An EncryptionMaterialsProvider reference implementation is provided below. Another custom provider, EMRFSRSAEncryptionMaterialsProvider, is available from GitHub.

import com.amazonaws.services.s3.model.EncryptionMaterials; import com.amazonaws.services.s3.model.EncryptionMaterialsProvider; import com.amazonaws.services.s3.model.KMSEncryptionMaterials; import org.apache.hadoop.conf.Configurable; import org.apache.hadoop.conf.Configuration; import java.util.Map; /** * Provides KMSEncryptionMaterials according to Configuration */ public class MyEncryptionMaterialsProviders implements EncryptionMaterialsProvider, Configurable{ private Configuration conf; private String kmsKeyId; private EncryptionMaterials encryptionMaterials; private void init() { this.kmsKeyId = conf.get("my.kms.key.id"); this.encryptionMaterials = new KMSEncryptionMaterials(kmsKeyId); } @Override public void setConf(Configuration conf) { this.conf = conf; init(); } @Override public Configuration getConf() { return this.conf; } @Override public void refresh() { } @Override public EncryptionMaterials getEncryptionMaterials(Map<String, String> materialsDescription) { return this.encryptionMaterials; } @Override public EncryptionMaterials getEncryptionMaterials() { return this.encryptionMaterials; } }

Specifying a custom materials provider using the AWS CLI

To use the AWS CLI, pass the Encryption, ProviderType, CustomProviderClass, and CustomProviderLocation arguments to the emrfs option.

aws emr create-cluster --instance-type m5.xlarge --release-label emr-4.7.2 or earlier --emrfs Encryption=ClientSide,ProviderType=Custom,CustomProviderLocation=s3://mybucket/myfolder/provider.jar,CustomProviderClass=classname

Setting Encryption to ClientSide enables client-side encryption, CustomProviderClass is the name of your EncryptionMaterialsProvider object, and CustomProviderLocation is the local or Amazon S3 location from which Amazon EMR copies CustomProviderClass to each node in the cluster and places it in the classpath.

Specifying a custom materials provider using an SDK

To use an SDK, you can set the property fs.s3.cse.encryptionMaterialsProvider.uri to download the custom EncryptionMaterialsProvider class that you store in Amazon S3 to each node in your cluster. You configure this in emrfs-site.xml file along with CSE enabled and the proper location of the custom provider.

For example, in the AWS SDK for Java using RunJobFlowRequest, your code might look like the following:

<snip> Map<String,String> emrfsProperties = new HashMap<String,String>(); emrfsProperties.put("fs.s3.cse.encryptionMaterialsProvider.uri","s3://mybucket/MyCustomEncryptionMaterialsProvider.jar"); emrfsProperties.put("fs.s3.cse.enabled","true"); emrfsProperties.put("fs.s3.consistent","true"); emrfsProperties.put("fs.s3.cse.encryptionMaterialsProvider","full.class.name.of.EncryptionMaterialsProvider"); Configuration myEmrfsConfig = new Configuration() .withClassification("emrfs-site") .withProperties(emrfsProperties); RunJobFlowRequest request = new RunJobFlowRequest() .withName("Custom EncryptionMaterialsProvider") .withReleaseLabel("emr-7.2.0") .withApplications(myApp) .withConfigurations(myEmrfsConfig) .withServiceRole("EMR_DefaultRole_V2") .withJobFlowRole("EMR_EC2_DefaultRole") .withLogUri("s3://myLogUri/") .withInstances(new JobFlowInstancesConfig() .withEc2KeyName("myEc2Key") .withInstanceCount(2) .withKeepJobFlowAliveWhenNoSteps(true) .withMasterInstanceType("m5.xlarge") .withSlaveInstanceType("m5.xlarge") ); RunJobFlowResult result = emr.runJobFlow(request); </snip>

Custom EncryptionMaterialsProvider with arguments

You may need to pass arguments directly to the provider. To do this, you can use the emrfs-site configuration classification with custom arguments defined as properties. An example configuration is shown below, which is saved as a file, myConfig.json:

[ { "Classification": "emrfs-site", "Properties": { "myProvider.arg1":"value1", "myProvider.arg2":"value2" } } ]

Using the create-cluster command from the AWS CLI, you can use the --configurations option to specify the file as shown below:

aws emr create-cluster --release-label emr-7.2.0 --instance-type m5.xlarge --instance-count 2 --configurations file://myConfig.json --emrfs Encryption=ClientSide,CustomProviderLocation=s3://mybucket/myfolder/myprovider.jar,CustomProviderClass=classname

Configuring EMRFS S3EC V2 support

S3 Java SDK releases (1.11.837 and later) support encryption client Version 2 (S3EC V2) with various security enhancements. For more information, see the S3 blog post Updates to the Amazon S3 encryption client. Also, refer to Amazon S3 encryption client migration in the AWS SDK for Java Developer Guide.

Encryption client V1 is still available in the SDK for backward compatibility. By default EMRFS will use S3EC V1 to encrypt and decrypt S3 objects if CSE is enabled.

S3 objects encrypted with S3EC V2 cannot be decrypted by EMRFS on an EMR cluster whose release version is earlier than emr-5.31.0 (emr-5.30.1 and earlier, emr-6.1.0 and earlier).

Example Configure EMRFS to use S3EC V2

To configure EMRFS to use S3EC V2, add the following configuration:

{ "Classification": "emrfs-site", "Properties": { "fs.s3.cse.encryptionV2.enabled": "true" } }

emrfs-site.xml Properties for Amazon S3 client-side encryption

Property Default value Description
fs.s3.cse.enabled false

When set to true, EMRFS objects stored in Amazon S3 are encrypted using client-side encryption.

fs.s3.cse.encryptionV2.enabled false

When set to true, EMRFS uses S3 encryption client Version 2 to encrypt and decrypt objects on S3. Available for EMR version 5.31.0 and later.

fs.s3.cse.encryptionMaterialsProvider.uri N/A Applies when using custom encryption materials. The Amazon S3 URI where the JAR with the EncryptionMaterialsProvider is located. When you provide this URI, Amazon EMR automatically downloads the JAR to all nodes in the cluster.
fs.s3.cse.encryptionMaterialsProvider N/A

The EncryptionMaterialsProvider class path used with client-side encryption. When using CSE-KMS, specify com.amazon.ws.emr.hadoop.fs.cse.KMSEncryptionMaterialsProvider.

fs.s3.cse.materialsDescription.enabled false

When set to true, populates the materialsDescription of encrypted objects with the Amazon S3 URI for the object and the JobFlowId. Set to true when using custom encryption materials.

fs.s3.cse.kms.keyId N/A

Applies when using CSE-KMS. The value of the KeyId, ARN, or alias of the KMS key used for encryption.

fs.s3.cse.cryptoStorageMode ObjectMetadata

The Amazon S3 storage mode. By default, the description of the encryption information is stored in the object metadata. You can also store the description in an instruction file. Valid values are ObjectMetadata and InstructionFile. For more information, see Client-side data encryption with the AWS SDK for Java and Amazon S3.