Encryption options
With Amazon EMR versions 4.8.0 and later, you can use a security configuration to specify settings for encrypting data at rest, data in transit, or both. When you enable at-rest data encryption, you can choose to encrypt EMRFS data in Amazon S3, data in local disks, or both. Each security configuration that you create is stored in Amazon EMR rather than in the cluster configuration, so you can easily reuse a configuration to specify data encryption settings whenever you create a cluster. For more information, see Create a security configuration.
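As a rough illustration of defining a security configuration once and reusing it, the following Python sketch builds a minimal configuration document that enables at-rest encryption of EMRFS data only. The name `MySecConfig` is a placeholder, and the commented boto3 calls assume that credentials and a Region are already configured; treat this as a sketch under those assumptions, not a complete setup.

```python
import json

# Minimal security configuration: at-rest encryption of EMRFS data with SSE-S3,
# in-transit encryption disabled.
security_configuration = {
    "EncryptionConfiguration": {
        "EnableInTransitEncryption": False,
        "EnableAtRestEncryption": True,
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"}
        },
    }
}

# Amazon EMR stores the configuration as a JSON string.
security_configuration_json = json.dumps(security_configuration)
print(security_configuration_json)

# Hypothetical usage with boto3 (not executed here):
# import boto3
# emr = boto3.client("emr")
# emr.create_security_configuration(
#     Name="MySecConfig",  # placeholder name
#     SecurityConfiguration=security_configuration_json,
# )
# The same name can then be passed as SecurityConfiguration="MySecConfig"
# to emr.run_job_flow(...) for each cluster that should use these settings.
```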
The following diagram shows the different data encryption options available with security configurations.

The following encryption options are also available and are not configured using a security configuration:
- Optionally, with Amazon EMR versions 4.1.0 and later, you can choose to configure transparent encryption in HDFS. For more information, see Transparent encryption in HDFS on Amazon EMR in the Amazon EMR Release Guide.
- If you are using a release version of Amazon EMR that does not support security configurations, you can configure encryption for EMRFS data in Amazon S3 manually. For more information, see Specifying Amazon S3 encryption using EMRFS properties.
- If you are using an Amazon EMR version earlier than 5.24.0, an encrypted EBS root device volume is supported only when using a custom AMI. For more information, see Creating a custom AMI with an encrypted Amazon EBS root device volume in the Amazon EMR Management Guide.
Note
Beginning with Amazon EMR version 5.24.0, you can use a security configuration option to encrypt EBS root device and storage volumes when you specify AWS KMS as your key provider. For more information, see Local disk encryption.
Data encryption requires keys and certificates. A security configuration gives you the flexibility to choose from several options, including keys managed by AWS Key Management Service, keys managed by Amazon S3, and keys and certificates from custom providers that you supply. When using AWS KMS as your key provider, charges apply for the storage and use of encryption keys. For more information, see AWS KMS pricing.
Before you specify encryption options, decide on the key and certificate management systems you want to use, so you can first create the keys and certificates or the custom providers that you specify as part of encryption settings.
Encryption at rest for EMRFS data in Amazon S3
Amazon S3 encryption works with EMR File System (EMRFS) objects read from and written to Amazon S3. You specify Amazon S3 server-side encryption (SSE) or client-side encryption (CSE) as the Default encryption mode when you enable encryption at rest. Optionally, you can specify different encryption methods for individual buckets using Per bucket encryption overrides. Regardless of whether Amazon S3 encryption is enabled, Transport Layer Security (TLS) encrypts the EMRFS objects in transit between EMR cluster nodes and Amazon S3. For in-depth information about Amazon S3 encryption, see Protecting data using encryption in the Amazon Simple Storage Service User Guide.
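The relationship between the default encryption mode and per-bucket overrides can be sketched as follows. The helper name `s3_encryption_configuration`, the bucket name, and the key ARN are illustrative placeholders, and the JSON key names reflect the security configuration format as we understand it; verify them against the current reference before relying on them.

```python
import json

def s3_encryption_configuration(default_mode, kms_key=None, overrides=()):
    """Build the Amazon S3 part of an at-rest encryption configuration.

    default_mode applies to every bucket unless an entry in overrides
    names a different mode for a specific bucket.
    """
    config = {"EncryptionMode": default_mode}
    if kms_key is not None:
        config["AwsKmsKey"] = kms_key
    if overrides:
        config["Overrides"] = [dict(entry) for entry in overrides]
    return config

# Default to SSE-S3, but use SSE-KMS for one bucket (placeholder values).
s3_config = s3_encryption_configuration(
    "SSE-S3",
    overrides=[
        {
            "BucketName": "amzn-s3-demo-bucket1",
            "EncryptionMode": "SSE-KMS",
            "AwsKmsKey": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
        }
    ],
)
print(json.dumps(s3_config, indent=2))
```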
Note
When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see AWS KMS pricing.
Amazon S3 server-side encryption
When you set up Amazon S3 server-side encryption, Amazon S3 encrypts data at the object level as it writes the data to disk and decrypts the data when it is accessed. For more information about SSE, see Protecting data using server-side encryption in the Amazon Simple Storage Service User Guide.
You can choose between two different key management systems when you specify SSE in Amazon EMR:
- SSE-S3 – Amazon S3 manages keys for you.
- SSE-KMS – You use an AWS KMS key that is set up with policies suitable for Amazon EMR. For more information about key requirements for Amazon EMR, see Using AWS KMS keys for encryption.
SSE with customer-provided keys (SSE-C) is not available for use with Amazon EMR.
Amazon S3 client-side encryption
With Amazon S3 client-side encryption, encryption and decryption take place in the EMRFS client on your cluster. Objects are encrypted before being uploaded to Amazon S3 and decrypted after they are downloaded. The provider you specify supplies the encryption key that the client uses. The client can use keys provided by AWS KMS (CSE-KMS) or a custom Java class that provides the client-side root key (CSE-C). The encryption specifics differ slightly between CSE-KMS and CSE-C, depending on the specified provider and the metadata of the object being decrypted or encrypted. For more information about these differences, see Protecting data using client-side encryption in the Amazon Simple Storage Service User Guide.
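The two client-side key providers could be expressed in a security configuration roughly as shown below. This is a sketch, not a definitive reference: the helper `cse_configuration` is hypothetical, the ARN, S3 URI, and class name are placeholders, and we assume the configuration format names the custom-provider mode `CSE-Custom`.

```python
def cse_configuration(mode, *, kms_key=None, jar_s3_uri=None, provider_class=None):
    """Build an S3 encryption block for client-side encryption.

    mode is "CSE-KMS" (keys from AWS KMS) or "CSE-Custom" (a custom Java
    class, packaged in a JAR in Amazon S3, supplies the root key).
    """
    if mode == "CSE-KMS":
        if not kms_key:
            raise ValueError("CSE-KMS requires a KMS key")
        return {"EncryptionMode": "CSE-KMS", "AwsKmsKey": kms_key}
    if mode == "CSE-Custom":
        if not (jar_s3_uri and provider_class):
            raise ValueError("CSE-Custom requires a JAR location and a class name")
        return {
            "EncryptionMode": "CSE-Custom",
            "S3Object": jar_s3_uri,
            "EncryptionKeyProviderClass": provider_class,
        }
    raise ValueError(f"unsupported client-side encryption mode: {mode}")

# Placeholder values for illustration only.
kms_based = cse_configuration(
    "CSE-KMS",
    kms_key="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
)
custom_based = cse_configuration(
    "CSE-Custom",
    jar_s3_uri="s3://amzn-s3-demo-bucket/artifacts/MyKeyProvider.jar",
    provider_class="com.example.MyKeyProvider",
)
print(kms_based)
print(custom_based)
```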
Note
Amazon S3 CSE only ensures that EMRFS data exchanged with Amazon S3 is encrypted; not all data on cluster instance volumes is encrypted. Furthermore, because Hue does not use EMRFS, objects that the Hue S3 File Browser writes to Amazon S3 are not encrypted.
Local disk encryption
The following mechanisms work together to encrypt local disks when you enable local disk encryption using an Amazon EMR security configuration.
Open-source HDFS encryption
HDFS exchanges data between cluster instances during distributed processing. It also reads from and writes data to instance store volumes and the EBS volumes attached to instances. The following open-source Hadoop encryption options are activated when you enable local disk encryption:
- Secure Hadoop RPC is set to Privacy, which uses Simple Authentication Security Layer (SASL).
- Data encryption on HDFS block data transfer is set to true and is configured to use AES 256 encryption.
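One way to confirm that these settings are in effect is to compare a cluster's effective Hadoop properties against the expected values. The sketch below assumes that the two options correspond to the Apache Hadoop properties `hadoop.rpc.protection` and `dfs.encrypt.data.transfer`, and that AES 256 is reflected in `dfs.encrypt.data.transfer.cipher.key.bitlength`; this mapping is our assumption, and the `effective` dictionary stands in for values you would read from the cluster's configuration files.

```python
# Assumed mapping from the options above to Apache Hadoop property names.
EXPECTED_HADOOP_SETTINGS = {
    "hadoop.rpc.protection": "privacy",
    "dfs.encrypt.data.transfer": "true",
    "dfs.encrypt.data.transfer.cipher.key.bitlength": "256",
}

def mismatched_settings(effective):
    """Return {property: (expected, actual)} for every setting that differs."""
    mismatches = {}
    for name, expected in EXPECTED_HADOOP_SETTINGS.items():
        actual = effective.get(name)
        if actual is None or str(actual).lower() != expected:
            mismatches[name] = (expected, actual)
    return mismatches

# Illustrative input: RPC protection is weaker than expected and the
# key length property is missing.
effective = {
    "hadoop.rpc.protection": "authentication",
    "dfs.encrypt.data.transfer": "true",
}
for name, (expected, actual) in mismatched_settings(effective).items():
    print(f"{name}: expected {expected!r}, found {actual!r}")
```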
Note
You can activate additional Apache Hadoop encryption by enabling in-transit encryption. For more information, see Encryption in transit. These encryption settings do not activate HDFS transparent encryption, which you can configure manually. For more information, see Transparent encryption in HDFS on Amazon EMR in the Amazon EMR Release Guide.
Instance store encryption
For EC2 instance types that use NVMe-based SSDs as the instance store volume, NVMe encryption is used regardless of Amazon EMR encryption settings. For more information, see NVMe SSD volumes in the Amazon EC2 User Guide for Linux Instances. For other instance store volumes, Amazon EMR uses LUKS to encrypt the instance store volume when local disk encryption is enabled, regardless of whether EBS volumes are encrypted using EBS encryption or LUKS.
EBS volume encryption
If you create a cluster in a Region where Amazon EC2 encryption of EBS volumes is enabled by default for your account, EBS volumes are encrypted even if local disk encryption is not enabled. For more information, see Encryption by default in the Amazon EC2 User Guide for Linux Instances. With local disk encryption enabled in a security configuration, the Amazon EMR settings take precedence over the Amazon EC2 encryption-by-default settings for cluster EC2 instances.
The following options are available to encrypt EBS volumes using a security configuration:
- EBS encryption – Beginning with Amazon EMR version 5.24.0, you can choose to enable EBS encryption. The EBS encryption option encrypts the EBS root device volume and attached storage volumes. The EBS encryption option is available only when you specify AWS Key Management Service as your key provider. We recommend using EBS encryption.
- LUKS encryption – If you choose to use LUKS encryption for Amazon EBS volumes, the LUKS encryption applies only to attached storage volumes, not to the root device volume. For more information about LUKS encryption, see the LUKS on-disk specification. For your key provider, you can set up an AWS KMS key with policies suitable for Amazon EMR, or a custom Java class that provides the encryption artifacts. When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see AWS KMS pricing.
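The choice between the two EBS options might look like the following in the local disk portion of a security configuration. The helper `local_disk_encryption_configuration` and the key ARN are placeholders for illustration, and the key names should be checked against the current security configuration reference.

```python
import json

def local_disk_encryption_configuration(kms_key_arn, use_ebs_encryption=True):
    """Build the local disk part of an at-rest encryption configuration.

    With use_ebs_encryption=True, EBS encryption covers the root device
    volume and attached storage volumes (requires AWS KMS as key provider).
    With False, the flag is omitted and LUKS is used for attached storage volumes.
    """
    config = {
        "EncryptionKeyProviderType": "AwsKms",
        "AwsKmsKey": kms_key_arn,
    }
    if use_ebs_encryption:
        config["EnableEbsEncryption"] = True
    return config

# Placeholder ARN for illustration only.
key_arn = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
ebs_config = local_disk_encryption_configuration(key_arn)
luks_config = local_disk_encryption_configuration(key_arn, use_ebs_encryption=False)
print(json.dumps({"LocalDiskEncryptionConfiguration": ebs_config}, indent=2))
```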
Note
To check whether EBS encryption is enabled on your cluster, we recommend that you use the DescribeVolumes API call. For more information, see DescribeVolumes. Running lsblk on the cluster checks only the status of LUKS encryption, not EBS encryption.
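A check based on DescribeVolumes could be sketched as follows. The function accepts any client object that exposes `describe_volumes` in the shape of the boto3 EC2 client; the stub class and the instance and volume IDs are placeholders so that the sketch runs without AWS access.

```python
def ebs_encryption_status(ec2_client, instance_id):
    """Map each EBS volume attached to instance_id to its Encrypted flag."""
    status = {}
    kwargs = {
        "Filters": [{"Name": "attachment.instance-id", "Values": [instance_id]}]
    }
    while True:
        page = ec2_client.describe_volumes(**kwargs)
        for volume in page.get("Volumes", []):
            status[volume["VolumeId"]] = volume["Encrypted"]
        token = page.get("NextToken")
        if not token:
            return status
        kwargs["NextToken"] = token  # fetch the next page of results

class StubEc2Client:
    """Stand-in for boto3.client("ec2"), returning canned data."""

    def describe_volumes(self, **kwargs):
        return {
            "Volumes": [
                {"VolumeId": "vol-0example1", "Encrypted": True},
                {"VolumeId": "vol-0example2", "Encrypted": False},
            ]
        }

status = ebs_encryption_status(StubEc2Client(), "i-0example")
unencrypted = [vol for vol, encrypted in status.items() if not encrypted]
print("Unencrypted volumes:", unencrypted)

# Against a real cluster instance (not executed here):
# import boto3
# status = ebs_encryption_status(boto3.client("ec2"), "<instance-id>")
```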
Encryption in transit
Several encryption mechanisms are enabled with in-transit encryption. These features are open source and application specific, and they may vary by Amazon EMR release. Application-specific encryption features can be enabled using Apache application configurations. For more information, see Configure applications.
You specify the encryption artifacts used for in-transit encryption in one of two ways: either by providing a zipped file of certificates that you upload to Amazon S3, or by referencing a custom Java class that provides encryption artifacts. For more information, see Providing certificates for encrypting data in transit with Amazon EMR encryption.
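The two ways of supplying in-transit encryption artifacts can be sketched as follows. The helper names, S3 URIs, and class name are placeholders, and the key names reflect the security configuration format as we understand it, so confirm them against the current reference.

```python
import json

def tls_certificate_configuration(s3_object, provider_class=None):
    """Describe where in-transit encryption artifacts come from.

    Without provider_class, s3_object is a zipped file of PEM certificates.
    With provider_class, s3_object is a JAR containing a custom Java class.
    """
    if provider_class is None:
        return {"CertificateProviderType": "PEM", "S3Object": s3_object}
    return {
        "CertificateProviderType": "Custom",
        "S3Object": s3_object,
        "CertificateProviderClass": provider_class,
    }

def in_transit_security_configuration(tls_config):
    """Wrap a TLS certificate configuration in a security configuration document."""
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            "EnableAtRestEncryption": False,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": tls_config
            },
        }
    }

# Placeholder locations and class name for illustration only.
pem_tls = tls_certificate_configuration(
    "s3://amzn-s3-demo-bucket/artifacts/MyCerts.zip"
)
custom_tls = tls_certificate_configuration(
    "s3://amzn-s3-demo-bucket/artifacts/MyCertProvider.jar",
    provider_class="com.example.MyCertProvider",
)
print(json.dumps(in_transit_security_configuration(pem_tls), indent=2))
```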