Amazon S3 client-side encryption
With Amazon S3 client-side encryption, the Amazon S3 encryption and decryption takes place in the EMRFS client on your cluster. Objects are encrypted before being uploaded to Amazon S3 and decrypted after they are downloaded. The provider you specify supplies the encryption key that the client uses. The client can use keys provided by AWS KMS (CSE-KMS) or a custom Java class that provides the client-side root key (CSE-C). The encryption specifics are slightly different between CSE-KMS and CSE-C, depending on the specified provider and the metadata of the object being decrypted or encrypted. For more information about these differences, see Protecting data using client-side encryption in the Amazon Simple Storage Service User Guide.
Note
Amazon S3 CSE only ensures that EMRFS data exchanged with Amazon S3 is encrypted; not all data on cluster instance volumes is encrypted. Furthermore, because Hue does not use EMRFS, objects that the Hue S3 File Browser writes to Amazon S3 are not encrypted.
To specify CSE-KMS for EMRFS data in Amazon S3 using the AWS CLI
- Type the following command and replace MyKMSKeyId with the Key ID or ARN of the KMS key to use:

    aws emr create-cluster --release-label emr-4.7.2 or earlier --emrfs Encryption=ClientSide,ProviderType=KMS,KMSKeyId=MyKMSKeyId
Creating a custom key provider
When you create a custom key provider, the EncryptionMaterialsProvider interface that the application must implement depends on the type of encryption you use. Both interfaces are available in the AWS SDK for Java version 1.11.0 and later.
- To implement Amazon S3 encryption, use the com.amazonaws.services.s3.model.EncryptionMaterialsProvider interface.
- To implement local disk encryption, use the com.amazonaws.services.elasticmapreduce.spi.security.EncryptionMaterialsProvider interface.
You can use any strategy to provide encryption materials for the implementation. For example, you might choose to provide static encryption materials or integrate with a more complex key management system.
If you’re using Amazon S3 encryption, you must use the encryption algorithm AES/GCM/NoPadding for custom encryption materials.
If you’re using local disk encryption, the encryption algorithm to use for custom encryption materials varies by EMR release. For Amazon EMR 7.0.0 and lower, you must use AES/GCM/NoPadding. For Amazon EMR 7.1.0 and higher, you must use AES.
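For the AES/GCM/NoPadding case mentioned above, a quick local sanity check can confirm that a candidate symmetric key works with that transformation before it is packaged into a provider. The following is a minimal sketch using only the JDK's javax.crypto classes. The class name AesGcmCheck, the 256-bit key size, the 12-byte IV, and the 128-bit tag length are illustrative assumptions. The sketch does not show how EMRFS or the AWS SDK encrypts objects internally.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

/** Hypothetical helper: checks that an AES key round-trips under AES/GCM/NoPadding. */
final class AesGcmCheck {
    static final String TRANSFORMATION = "AES/GCM/NoPadding";

    /** Generates a random AES key (256 bits is an assumption, not an EMR requirement). */
    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES key generation failed", e);
        }
    }

    /** Encrypts and then decrypts the message, returning true if the text is recovered. */
    static boolean roundTrips(SecretKey key, String message) {
        try {
            byte[] iv = new byte[12];                 // fresh random IV for this single encryption
            new SecureRandom().nextBytes(iv);
            GCMParameterSpec spec = new GCMParameterSpec(128, iv);

            Cipher encryptor = Cipher.getInstance(TRANSFORMATION);
            encryptor.init(Cipher.ENCRYPT_MODE, key, spec);
            byte[] cipherText = encryptor.doFinal(message.getBytes(StandardCharsets.UTF_8));

            Cipher decryptor = Cipher.getInstance(TRANSFORMATION);
            decryptor.init(Cipher.DECRYPT_MODE, key, spec);
            String recovered = new String(decryptor.doFinal(cipherText), StandardCharsets.UTF_8);
            return message.equals(recovered);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES/GCM round trip failed", e);
        }
    }
}
```

A key that passes this check is only shown to be usable with the transformation. How the key is stored, rotated, and returned by the provider remains a design decision for your key management strategy.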
The EncryptionMaterialsProvider class gets encryption materials by encryption context. Amazon EMR populates encryption context information at runtime to help the caller determine the correct encryption materials to return.
Example: Using a custom key provider for Amazon S3 encryption with EMRFS
When Amazon EMR fetches the encryption materials from the EncryptionMaterialsProvider class to perform encryption, EMRFS optionally populates the materialsDescription argument with two fields: the Amazon S3 URI for the object and the JobFlowId of the cluster, which can be used by the EncryptionMaterialsProvider class to return encryption materials selectively.
For example, the provider may return different keys for different Amazon S3 URI prefixes. It is the description of the returned encryption materials that is eventually stored with the Amazon S3 object rather than the materialsDescription value that is generated by EMRFS and passed to the provider. While decrypting an Amazon S3 object, the encryption materials description is passed to the EncryptionMaterialsProvider class, so that it can, again, selectively return the matching key to decrypt the object.
An EncryptionMaterialsProvider reference implementation is provided below. Another custom provider, EMRFSRSAEncryptionMaterialsProvider, is available from GitHub.
    import com.amazonaws.services.s3.model.EncryptionMaterials;
    import com.amazonaws.services.s3.model.EncryptionMaterialsProvider;
    import com.amazonaws.services.s3.model.KMSEncryptionMaterials;
    import org.apache.hadoop.conf.Configurable;
    import org.apache.hadoop.conf.Configuration;

    import java.util.Map;

    /**
     * Provides KMSEncryptionMaterials according to Configuration
     */
    public class MyEncryptionMaterialsProviders implements EncryptionMaterialsProvider, Configurable {
      private Configuration conf;
      private String kmsKeyId;
      private EncryptionMaterials encryptionMaterials;

      private void init() {
        this.kmsKeyId = conf.get("my.kms.key.id");
        this.encryptionMaterials = new KMSEncryptionMaterials(kmsKeyId);
      }

      @Override
      public void setConf(Configuration conf) {
        this.conf = conf;
        init();
      }

      @Override
      public Configuration getConf() {
        return this.conf;
      }

      @Override
      public void refresh() {
      }

      @Override
      public EncryptionMaterials getEncryptionMaterials(Map<String, String> materialsDescription) {
        return this.encryptionMaterials;
      }

      @Override
      public EncryptionMaterials getEncryptionMaterials() {
        return this.encryptionMaterials;
      }
    }
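The reference implementation returns the same materials for every request. To illustrate the selective case described earlier, where a provider returns different keys for different Amazon S3 URI prefixes, the following sketch isolates just the selection logic in plain Java so that it can be tested without the AWS SDK. Everything in it is hypothetical: the class name PrefixKeySelector, the description field name "s3Uri", and the longest-prefix rule are assumptions for illustration. Check the field names that EMRFS actually populates in materialsDescription before relying on them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: picks a key identifier from a materialsDescription map by the
 * longest matching Amazon S3 URI prefix. All names here are hypothetical.
 */
final class PrefixKeySelector {
    // Assumed name of the description field that carries the object's S3 URI.
    static final String URI_FIELD = "s3Uri";

    private final Map<String, String> keyIdByPrefix = new LinkedHashMap<>();
    private final String defaultKeyId;

    PrefixKeySelector(String defaultKeyId) {
        this.defaultKeyId = defaultKeyId;
    }

    /** Associates a URI prefix with a key ID and returns this selector for chaining. */
    PrefixKeySelector register(String uriPrefix, String keyId) {
        keyIdByPrefix.put(uriPrefix, keyId);
        return this;
    }

    /** Returns the key ID for the longest registered prefix, or the default key ID. */
    String selectKeyId(Map<String, String> materialsDescription) {
        String uri = materialsDescription == null ? null : materialsDescription.get(URI_FIELD);
        if (uri == null) {
            // The description is populated only optionally, so fall back to the default.
            return defaultKeyId;
        }
        String bestPrefix = null;
        for (String prefix : keyIdByPrefix.keySet()) {
            boolean longer = bestPrefix == null || prefix.length() > bestPrefix.length();
            if (uri.startsWith(prefix) && longer) {
                bestPrefix = prefix;
            }
        }
        return bestPrefix == null ? defaultKeyId : keyIdByPrefix.get(bestPrefix);
    }
}
```

In a real provider, getEncryptionMaterials(Map<String, String> materialsDescription) could call a helper like this and then construct the materials for the selected key. Because the description of the returned materials is what is stored with the object, the returned materials should carry enough information for the provider to find the same key again at decryption time.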
Specifying a custom materials provider using the AWS CLI
To use the AWS CLI, pass the Encryption, ProviderType, CustomProviderClass, and CustomProviderLocation arguments to the emrfs option.
    aws emr create-cluster --instance-type m5.xlarge --release-label emr-4.7.2 or earlier --emrfs Encryption=ClientSide,ProviderType=Custom,CustomProviderLocation=s3://mybucket/myfolder/provider.jar,CustomProviderClass=classname
Setting Encryption to ClientSide enables client-side encryption, CustomProviderClass is the name of your EncryptionMaterialsProvider object, and CustomProviderLocation is the local or Amazon S3 location from which Amazon EMR copies CustomProviderClass to each node in the cluster and places it in the classpath.
Specifying a custom materials provider using an SDK
To use an SDK, you can set the property fs.s3.cse.encryptionMaterialsProvider.uri to download the custom EncryptionMaterialsProvider class that you store in Amazon S3 to each node in your cluster. You configure this in the emrfs-site.xml file, along with enabling CSE and specifying the class name of the custom provider.
For example, in the AWS SDK for Java using RunJobFlowRequest, your code might look like the following:
    <snip>
    Map<String,String> emrfsProperties = new HashMap<String,String>();
    emrfsProperties.put("fs.s3.cse.encryptionMaterialsProvider.uri","s3://mybucket/MyCustomEncryptionMaterialsProvider.jar");
    emrfsProperties.put("fs.s3.cse.enabled","true");
    emrfsProperties.put("fs.s3.consistent","true");
    emrfsProperties.put("fs.s3.cse.encryptionMaterialsProvider","full.class.name.of.EncryptionMaterialsProvider");

    Configuration myEmrfsConfig = new Configuration()
        .withClassification("emrfs-site")
        .withProperties(emrfsProperties);

    RunJobFlowRequest request = new RunJobFlowRequest()
        .withName("Custom EncryptionMaterialsProvider")
        .withReleaseLabel("emr-7.2.0")
        .withApplications(myApp)
        .withConfigurations(myEmrfsConfig)
        .withServiceRole("EMR_DefaultRole_V2")
        .withJobFlowRole("EMR_EC2_DefaultRole")
        .withLogUri("s3://myLogUri/")
        .withInstances(new JobFlowInstancesConfig()
            .withEc2KeyName("myEc2Key")
            .withInstanceCount(2)
            .withKeepJobFlowAliveWhenNoSteps(true)
            .withMasterInstanceType("m5.xlarge")
            .withSlaveInstanceType("m5.xlarge")
        );

    RunJobFlowResult result = emr.runJobFlow(request);
    </snip>
Custom EncryptionMaterialsProvider with arguments
You may need to pass arguments directly to the provider. To do this, you can use the emrfs-site configuration classification with custom arguments defined as properties. An example configuration, saved as the file myConfig.json, is shown below:

    [
      {
        "Classification": "emrfs-site",
        "Properties": {
          "myProvider.arg1":"value1",
          "myProvider.arg2":"value2"
        }
      }
    ]
Using the create-cluster command from the AWS CLI, you can use the --configurations option to specify the file as shown below:

    aws emr create-cluster --release-label emr-7.2.0 --instance-type m5.xlarge --instance-count 2 --configurations file://myConfig.json --emrfs Encryption=ClientSide,CustomProviderLocation=s3://mybucket/myfolder/myprovider.jar,CustomProviderClass=classname
Configuring EMRFS S3EC V2 support
S3 Java SDK releases (1.11.837 and later) support encryption client Version 2 (S3EC V2) with various security enhancements.
For more information, see the S3 blog post Updates to the Amazon S3 encryption client.
Encryption client V1 is still available in the SDK for backward compatibility. By default, EMRFS uses S3EC V1 to encrypt and decrypt S3 objects if CSE is enabled.
S3 objects encrypted with S3EC V2 cannot be decrypted by EMRFS on an EMR cluster whose release version is earlier than emr-5.31.0 (emr-5.30.1 and earlier, emr-6.1.0 and earlier).
Example Configure EMRFS to use S3EC V2
To configure EMRFS to use S3EC V2, add the following configuration:
    {
      "Classification": "emrfs-site",
      "Properties": {
        "fs.s3.cse.encryptionV2.enabled": "true"
      }
    }
emrfs-site.xml properties for Amazon S3 client-side encryption
Property | Default value | Description |
---|---|---|
fs.s3.cse.enabled | false | When set to true, EMRFS objects stored in Amazon S3 are encrypted using client-side encryption. |
fs.s3.cse.encryptionV2.enabled | false | When set to true, EMRFS uses S3EC V2 to encrypt and decrypt objects in Amazon S3. |
fs.s3.cse.encryptionMaterialsProvider.uri | N/A | Applies when using custom encryption materials. The Amazon S3 URI where the JAR with the EncryptionMaterialsProvider is located. When you provide this URI, Amazon EMR automatically downloads the JAR to all nodes in the cluster. |
fs.s3.cse.encryptionMaterialsProvider | N/A | The fully qualified class name of the EncryptionMaterialsProvider used with client-side encryption. |
fs.s3.cse.materialsDescription.enabled | false | When set to true, EMRFS populates the materialsDescription of encrypted objects with the Amazon S3 URI for the object and the JobFlowId. |
fs.s3.cse.kms.keyId | N/A | Applies when using CSE-KMS. The value of the KeyId, ARN, or alias of the KMS key used for encryption. |
fs.s3.cse.cryptoStorageMode | ObjectMetadata | The Amazon S3 storage mode. By default, the description of the encryption information is stored in the object metadata. You can also store the description in an instruction file. Valid values are ObjectMetadata and InstructionFile. For more information, see Client-side data encryption with the AWS SDK for Java and Amazon S3. |