Amazon EMR
Amazon EMR Release Guide

Understanding Encryption Options with Amazon EMR

Amazon EMR enables you to use a security configuration to specify settings for Amazon S3 encryption with EMR File System (EMRFS), local disk encryption, and in-transit encryption. You create a security configuration that specifies encryption settings and then use the security configuration when you create a cluster.

The following diagram shows the different data encryption options available with security configurations.

You can use a security configuration to encrypt data at rest, data in transit, or both. Each security configuration is stored in Amazon EMR rather than in the cluster configuration, so you can easily reuse a configuration to specify data encryption settings whenever a cluster is created.

Data encryption requires keys and certificates. A security configuration gives you the flexibility to choose from several options, including keys managed by AWS Key Management Service (AWS KMS), keys managed by Amazon S3, and keys and certificates from custom providers that you supply.

When using AWS KMS as your key provider, charges apply for the storage and use of encryption keys. For more information, see AWS KMS Pricing.

You can use the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDKs to create security configurations and to enable encryption options when a cluster is created. Before you specify encryption options, decide which key and certificate management systems you want to use, so that you can first create the keys and certificates, or the custom providers, that your encryption settings reference.
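As a rough sketch of the SDK route, the hypothetical Python function below registers a security configuration once and then references it by name when launching a cluster. It assumes a client that behaves like `boto3.client("emr")`, whose `create_security_configuration` call takes a `Name` and a JSON string `SecurityConfiguration`, and whose `run_job_flow` call accepts a `SecurityConfiguration` name and returns a `JobFlowId`. The configuration name and cluster parameters are placeholders. The client is passed in so the logic can be exercised without AWS credentials.

```python
import json
from typing import Any, Dict


def create_and_use_security_configuration(
    emr_client: Any,
    config_name: str,
    security_config: Dict[str, Any],
    cluster_params: Dict[str, Any],
) -> str:
    """Hypothetical helper: register a security configuration, then launch a cluster with it.

    `emr_client` is assumed to behave like boto3.client("emr"); `cluster_params`
    holds the remaining run_job_flow arguments (release label, instances, roles...).
    """
    # The configuration is stored in Amazon EMR under its name, not in the cluster.
    emr_client.create_security_configuration(
        Name=config_name,
        SecurityConfiguration=json.dumps(security_config),
    )
    # Any cluster created later can reuse the same settings by referencing the name.
    response = emr_client.run_job_flow(
        SecurityConfiguration=config_name,
        **cluster_params,
    )
    return response["JobFlowId"]
```

In practice you would pass `boto3.client("emr")` and real `run_job_flow` arguments; subsequent clusters only need the `SecurityConfiguration=config_name` argument, which is what makes the configuration reusable.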

Amazon S3 encryption and local disk encryption options are specified together when you configure at-rest encryption. You can choose to enable only at-rest encryption, only in-transit encryption, or both.
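The choice between at-rest only, in-transit only, or both can be pictured as two flags in the security configuration document. The sketch below assumes the JSON layout as we understand it (`EncryptionConfiguration` with `EnableAtRestEncryption`, `EnableInTransitEncryption`, `AtRestEncryptionConfiguration`, and `InTransitEncryptionConfiguration`); the helper name is hypothetical, and the field names should be checked against the Amazon EMR security configuration reference.

```python
import json
from typing import Any, Dict, Optional


def build_security_configuration(
    at_rest: Optional[Dict[str, Any]] = None,
    in_transit: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper that assembles a security configuration document.

    Pass a dict for each kind of encryption you want; None leaves it disabled,
    so callers can enable at-rest only, in-transit only, or both.
    """
    encryption: Dict[str, Any] = {
        "EnableAtRestEncryption": at_rest is not None,
        "EnableInTransitEncryption": in_transit is not None,
    }
    if at_rest is not None:
        # Amazon S3 (EMRFS) and local disk settings live together in this one block.
        encryption["AtRestEncryptionConfiguration"] = at_rest
    if in_transit is not None:
        encryption["InTransitEncryptionConfiguration"] = in_transit
    return {"EncryptionConfiguration": encryption}


# Example: at-rest encryption only, using SSE-S3 for EMRFS objects.
doc = build_security_configuration(
    at_rest={"S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"}}
)
print(json.dumps(doc, indent=2))
```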

At-Rest Encryption for Amazon S3 with EMRFS

Amazon S3 encryption works with EMR File System (EMRFS) objects read from and written to Amazon S3. You specify Amazon S3 server-side encryption (SSE) or client-side encryption (CSE) when you enable at-rest encryption. Amazon S3 SSE and CSE with EMRFS are mutually exclusive; you can choose one but not both. Regardless of whether Amazon S3 encryption is enabled, Transport Layer Security (TLS) encrypts EMRFS objects in transit between Amazon EMR cluster nodes and Amazon S3. For in-depth information about Amazon S3 encryption, see Protecting Data Using Encryption in the Amazon Simple Storage Service Developer Guide.

Amazon S3 Server-Side Encryption

When you set up Amazon S3 SSE, Amazon S3 encrypts data at the object level as it writes the data to disk and decrypts the data when it is accessed. For more information about SSE, see Protecting Data Using Server-Side Encryption in the Amazon Simple Storage Service Developer Guide.

You can choose between two different key management systems when you specify SSE in Amazon EMR:

  • SSE-S3: Amazon S3 manages keys for you.

  • SSE-KMS: You use an AWS KMS customer master key (CMK) set up with policies suitable for Amazon EMR. When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see AWS KMS Pricing.

SSE with customer-provided keys (SSE-C) is not available for use with Amazon EMR.
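A minimal sketch of how the SSE choice might be expressed, assuming the `S3EncryptionConfiguration` block uses an `EncryptionMode` of `SSE-S3` or `SSE-KMS` plus an `AwsKmsKey` for the KMS case (our understanding of the JSON schema, to be verified). The helper name and key ARN are hypothetical. It rejects SSE-C, matching the restriction stated above.

```python
from typing import Any, Dict, Optional

# Server-side modes available with Amazon EMR: SSE-S3 and SSE-KMS.
SUPPORTED_SSE_MODES = {"SSE-S3", "SSE-KMS"}


def s3_sse_configuration(mode: str, kms_key_arn: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: the Amazon S3 part of an at-rest configuration using SSE."""
    if mode == "SSE-C":
        # Customer-provided keys are not available with Amazon EMR.
        raise ValueError("SSE-C is not supported by Amazon EMR")
    if mode not in SUPPORTED_SSE_MODES:
        raise ValueError(f"unknown SSE mode: {mode}")
    config: Dict[str, Any] = {"EncryptionMode": mode}
    if mode == "SSE-KMS":
        if not kms_key_arn:
            raise ValueError("SSE-KMS needs an AWS KMS key")
        config["AwsKmsKey"] = kms_key_arn
    elif kms_key_arn:
        # With SSE-S3, Amazon S3 manages the keys, so a KMS key makes no sense here.
        raise ValueError("SSE-S3 uses keys managed by Amazon S3; no KMS key expected")
    return {"S3EncryptionConfiguration": config}
```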

Amazon S3 Client-Side Encryption

With Amazon S3 CSE, Amazon S3 encryption and decryption take place in the EMRFS client on your cluster. Objects are encrypted before being uploaded to Amazon S3 and decrypted after they are downloaded. The provider you specify supplies the encryption key that the client uses. The client can use keys provided by AWS KMS (CSE-KMS) or a custom Java class that provides the client-side master key (CSE-C). The encryption specifics differ slightly between CSE-KMS and CSE-C, depending on the specified provider and the metadata of the object being encrypted or decrypted. For more information about these differences, see Protecting Data Using Client-Side Encryption in the Amazon Simple Storage Service Developer Guide.

Note

Amazon S3 CSE only ensures that EMRFS data exchanged with Amazon S3 is encrypted; not all data on cluster instance volumes is encrypted. Furthermore, because Hue does not use EMRFS, objects that the Hue S3 File Browser writes to Amazon S3 are not encrypted. These are important considerations if you use an Amazon EMR version earlier than 4.8.0. In later versions, Amazon S3 encryption is enabled as part of at-rest encryption, which also includes local disk encryption. For more information, see At-Rest Encryption for Local Disks.
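The two client-side key providers could be expressed as follows. This sketch assumes the JSON spells CSE-C as `EncryptionMode: "CSE-Custom"` with an `S3Object` (the JAR location) and an `EncryptionKeyProviderClass`, and CSE-KMS as `EncryptionMode: "CSE-KMS"` with an `AwsKmsKey`; verify these names before use. The helper, bucket, and class names are hypothetical.

```python
from typing import Any, Dict, Optional


def s3_cse_configuration(
    kms_key_arn: Optional[str] = None,
    provider_jar_s3_uri: Optional[str] = None,
    provider_class: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: the Amazon S3 part of an at-rest configuration using CSE.

    Supply either an AWS KMS key (CSE-KMS) or a custom key-provider class packaged
    in a JAR on Amazon S3 (CSE-C, assumed to be written "CSE-Custom"), never both.
    """
    use_kms = kms_key_arn is not None
    use_custom = provider_jar_s3_uri is not None or provider_class is not None
    if use_kms == use_custom:
        raise ValueError("choose exactly one key provider: AWS KMS or a custom class")
    if use_kms:
        return {"S3EncryptionConfiguration": {"EncryptionMode": "CSE-KMS", "AwsKmsKey": kms_key_arn}}
    if not (provider_jar_s3_uri and provider_class):
        raise ValueError("a custom provider needs both the JAR location and the class name")
    return {
        "S3EncryptionConfiguration": {
            "EncryptionMode": "CSE-Custom",
            "S3Object": provider_jar_s3_uri,
            "EncryptionKeyProviderClass": provider_class,
        }
    }
```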

At-Rest Encryption for Local Disks

Two mechanisms work together to encrypt cluster instance volumes when you enable at-rest data encryption:

  • Open-source HDFS Encryption: HDFS exchanges data between cluster instances during distributed processing, and it also reads data from and writes data to instance store volumes and the Amazon Elastic Block Store (Amazon EBS) volumes attached to instances. The following open-source Hadoop encryption options are activated when you enable local disk encryption:

    Note

    You can activate additional Apache Hadoop encryption by enabling in-transit encryption (see In-Transit Data Encryption). These encryption settings do not activate HDFS transparent encryption, which you can configure manually. For more information, see Transparent Encryption in HDFS on Amazon EMR.

  • LUKS: In addition to HDFS encryption, the Amazon EC2 instance store volumes (except boot volumes) and the attached Amazon EBS volumes of cluster instances are encrypted using Linux Unified Key Setup (LUKS). For more information about LUKS encryption, see the LUKS on-disk specification.

    For your key provider, you can use an AWS KMS CMK set up with policies suitable for Amazon EMR, or a custom Java class that provides the encryption artifacts. When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see AWS KMS Pricing.
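Only the key provider is chosen here; Amazon EMR applies the HDFS options and LUKS itself. A sketch of the two provider choices, assuming a `LocalDiskEncryptionConfiguration` block with `EncryptionKeyProviderType` set to `AwsKms` (plus `AwsKmsKey`) or `Custom` (plus `S3Object` and `EncryptionKeyProviderClass`). These field names reflect our understanding of the schema; the helper, JAR path, and class name are hypothetical.

```python
from typing import Any, Dict, Optional


def local_disk_configuration(
    kms_key_arn: Optional[str] = None,
    provider_jar_s3_uri: Optional[str] = None,
    provider_class: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: key provider for local disk encryption (HDFS options plus LUKS).

    Either an AWS KMS key or a custom Java class packaged in a JAR on Amazon S3.
    """
    custom_given = provider_jar_s3_uri is not None or provider_class is not None
    if kms_key_arn is not None and custom_given:
        raise ValueError("use an AWS KMS key or a custom provider, not both")
    if kms_key_arn is not None:
        disk: Dict[str, Any] = {"EncryptionKeyProviderType": "AwsKms", "AwsKmsKey": kms_key_arn}
    elif provider_jar_s3_uri and provider_class:
        disk = {
            "EncryptionKeyProviderType": "Custom",
            "S3Object": provider_jar_s3_uri,
            "EncryptionKeyProviderClass": provider_class,
        }
    else:
        raise ValueError("give a KMS key, or both a provider JAR and class name")
    return {"LocalDiskEncryptionConfiguration": disk}
```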

In-Transit Data Encryption

Several encryption mechanisms are enabled with in-transit encryption. These features are open source and application-specific, and they may vary by Amazon EMR release. In this release, the following application-specific encryption features can be enabled using security configurations:

You specify the encryption artifacts used for in-transit encryption in one of two ways: either by providing a zipped file of certificates that you upload to Amazon S3, or by referencing a custom Java class that provides encryption artifacts. For more information, see Providing Certificates for In-Transit Data Encryption with Amazon EMR Encryption.
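The two ways of supplying in-transit artifacts might map onto a `TLSCertificateConfiguration` block roughly as follows. The sketch assumes `CertificateProviderType` is `PEM` (with `S3Object` pointing at the zipped certificates) or `Custom` (with `S3Object` pointing at a JAR and a `CertificateProviderClass`). These names are our reading of the schema, not confirmed by this page. The `.zip` suffix check is our own sanity check, and the helper and paths are hypothetical.

```python
from typing import Any, Dict, Optional


def tls_certificate_configuration(
    pem_zip_s3_uri: Optional[str] = None,
    provider_jar_s3_uri: Optional[str] = None,
    provider_class: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: in-transit encryption artifacts, either a zip of PEM
    certificates on Amazon S3 or a custom Java certificate-provider class."""
    custom_given = provider_jar_s3_uri is not None or provider_class is not None
    if pem_zip_s3_uri is not None and custom_given:
        raise ValueError("use a PEM zip file or a custom provider, not both")
    if pem_zip_s3_uri is not None:
        # Our own sanity check: the artifact should be a .zip object in Amazon S3.
        if not (pem_zip_s3_uri.startswith("s3://") and pem_zip_s3_uri.endswith(".zip")):
            raise ValueError("expected an s3:// URI to a .zip file of certificates")
        tls: Dict[str, Any] = {"CertificateProviderType": "PEM", "S3Object": pem_zip_s3_uri}
    elif provider_jar_s3_uri and provider_class:
        tls = {
            "CertificateProviderType": "Custom",
            "S3Object": provider_jar_s3_uri,
            "CertificateProviderClass": provider_class,
        }
    else:
        raise ValueError("give a PEM zip, or both a provider JAR and class name")
    return {"TLSCertificateConfiguration": tls}
```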