Ensure encryption for Amazon EMR data at rest is enabled at launch - AWS Prescriptive Guidance

Created by Priyanka Chaudhary (AWS)

Environment: Production

Technologies: Security, identity, compliance; Analytics

Workload: Open-source

AWS services: Amazon EMR; Amazon SNS; AWS KMS; AWS CloudFormation; AWS Lambda; Amazon S3

Summary

This pattern provides a security control for monitoring the encryption of Amazon EMR clusters on Amazon Web Services (AWS).

Data encryption helps prevent unauthorized users from reading data on a cluster and its associated data storage systems. This includes data that might be intercepted as it travels the network, known as data in transit, and data that is saved to persistent media, known as data at rest. Data at rest in Amazon Simple Storage Service (Amazon S3) can be encrypted in two ways:

  • Server-side encryption with Amazon S3–managed keys (SSE-S3)

  • Server-side encryption with AWS Key Management Service (AWS KMS) keys (SSE-KMS), set up with policies that are suitable for Amazon EMR.
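As an illustration of the two modes listed above, the following Python sketch builds the JSON document that an Amazon EMR security configuration uses to turn on at-rest encryption for Amazon S3 data. The helper name `build_security_configuration` and the example key ARN are hypothetical and are not part of this pattern's attachment. The field names reflect the Amazon EMR security configuration format and should be verified against the current Amazon EMR documentation before use.

```python
import json


def build_security_configuration(mode, kms_key_arn=None):
    """Return an EMR security configuration (as a dict) with S3 at-rest encryption.

    mode is "SSE-S3" or "SSE-KMS"; SSE-KMS additionally needs a KMS key ARN.
    """
    if mode not in ("SSE-S3", "SSE-KMS"):
        raise ValueError(f"unsupported encryption mode: {mode}")
    s3_config = {"EncryptionMode": mode}
    if mode == "SSE-KMS":
        if not kms_key_arn:
            raise ValueError("SSE-KMS requires a KMS key ARN")
        s3_config["AwsKmsKey"] = kms_key_arn
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": False,
            "EnableAtRestEncryption": True,
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": s3_config,
            },
        }
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    example_key = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"
    print(json.dumps(build_security_configuration("SSE-S3"), indent=2))
    print(json.dumps(build_security_configuration("SSE-KMS", example_key), indent=2))
```

The resulting JSON string could be registered with the boto3 call `emr.create_security_configuration(Name=..., SecurityConfiguration=...)` and then referenced by name when the cluster is launched.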

This security control monitors API calls and initiates an Amazon CloudWatch Events event when RunJobFlow is called. The event invokes an AWS Lambda function, which runs a Python script. The function retrieves the EMR cluster ID from the event JSON input and determines whether there is a security violation by performing the following checks.

  1. Check whether the EMR cluster is associated with an Amazon EMR-specific security configuration.

  2. If an Amazon EMR-specific security configuration is associated with the EMR cluster, check whether encryption at rest is turned on.

  3. If encryption at rest is not turned on, send an Amazon Simple Notification Service (Amazon SNS) notification that includes the EMR cluster name, violation details, AWS Region, AWS account, and the Amazon Resource Name (ARN) of the Lambda function that the notification is sourced from.
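The checks above can be sketched in Python roughly as follows. This is not the code from the attached .zip file; it is a minimal illustration under stated assumptions. The function names are hypothetical, the cluster ID is assumed to be at `detail.responseElements.jobFlowId` in the event, and a cluster with no security configuration is assumed to count as a violation. The decision logic is kept in pure functions so that it can be exercised without AWS access.

```python
import json


def extract_cluster_id(event):
    """Pull the cluster ID out of a RunJobFlow event.

    Assumption: the ID is at detail.responseElements.jobFlowId.
    """
    return event["detail"]["responseElements"]["jobFlowId"]


def at_rest_encryption_enabled(security_configuration_json):
    """Check 2: is EnableAtRestEncryption set in the security configuration?"""
    config = json.loads(security_configuration_json)
    encryption = config.get("EncryptionConfiguration", {})
    return bool(encryption.get("EnableAtRestEncryption", False))


def find_violation(cluster, load_security_configuration):
    """Return a violation message, or None if the cluster is compliant.

    cluster is the "Cluster" dict returned by DescribeCluster.
    load_security_configuration maps a configuration name to its JSON string.
    """
    name = cluster.get("SecurityConfiguration")
    if not name:
        # Assumption: a cluster without a security configuration is a violation.
        return "No security configuration is associated with the cluster."
    if not at_rest_encryption_enabled(load_security_configuration(name)):
        return f"Encryption at rest is not enabled in security configuration '{name}'."
    return None


def check_cluster(event):
    """Wire the checks to the EMR API (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the pure functions above work without boto3

    emr = boto3.client("emr")
    cluster = emr.describe_cluster(ClusterId=extract_cluster_id(event))["Cluster"]

    def load(name):
        response = emr.describe_security_configuration(Name=name)
        return response["SecurityConfiguration"]

    return cluster.get("Name"), find_violation(cluster, load)
```

A Lambda handler could call `check_cluster(event)` and publish an Amazon SNS message whenever the second element of the returned tuple is not `None`.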

Prerequisites and limitations

Prerequisites

  • An active AWS account

  • An S3 bucket for the Lambda code .zip file

  • An email address where you want to receive the violation notification

  • Amazon EMR logging turned on so that all the API logs can be retrieved

Limitations

  • This detective control is regional and must be deployed in the AWS Regions you intend to monitor.

Product versions

  • Amazon EMR release 4.8.0 and above

Architecture

Target technology stack

  • Amazon EMR

  • Amazon CloudWatch Events event

  • Lambda function

  • Amazon SNS

Target architecture

A security control that monitors the encryption of Amazon EMR clusters.
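To make the event-driven part of this architecture concrete, the following Python sketch shows one possible event pattern for a rule that reacts to RunJobFlow, together with a deliberately simplified matcher for trying it against sample events. The pattern in the attached template might differ. The names `RUN_JOB_FLOW_PATTERN` and `matches` are hypothetical, and the matcher is a teaching aid that covers only exact-value matching, not the full CloudWatch Events matching rules.

```python
import json

# Event pattern for a rule that fires on the EMR RunJobFlow API call.
RUN_JOB_FLOW_PATTERN = {
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["elasticmapreduce.amazonaws.com"],
        "eventName": ["RunJobFlow"],
    },
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event,
    and the event value must be one of the listed values. Nested dicts recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


if __name__ == "__main__":
    # Print the pattern as JSON, the form in which a rule definition expects it.
    print(json.dumps(RUN_JOB_FLOW_PATTERN, indent=2))
```

Events of this type are delivered through AWS CloudTrail, so the rule is expected to fire only in Regions where API activity is being logged.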

Automation and scale

If you are using AWS Organizations, you can use AWS CloudFormation StackSets to deploy this template in multiple accounts that you want to monitor.
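As a rough sketch of what a StackSets rollout could look like from Python, the helper below builds the arguments for the CloudFormation `create_stack_instances` call. It assumes that a stack set for this template already exists and was created with service-managed permissions, so that targets are given as organizational unit IDs. The function names, the stack set name, and the IDs are hypothetical. Because the control is regional, every Region to be monitored is listed explicitly.

```python
def build_stack_instances_request(stack_set_name, organizational_unit_ids, regions):
    """Build keyword arguments for CloudFormation's CreateStackInstances call.

    Assumption: the stack set uses service-managed permissions, so deployment
    targets are organizational units rather than individual account IDs.
    """
    if not organizational_unit_ids:
        raise ValueError("at least one organizational unit ID is required")
    if not regions:
        raise ValueError("at least one Region is required")
    return {
        "StackSetName": stack_set_name,
        "DeploymentTargets": {"OrganizationalUnitIds": list(organizational_unit_ids)},
        "Regions": list(regions),
    }


def deploy_stack_instances(request):
    """Submit the request (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    return boto3.client("cloudformation").create_stack_instances(**request)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_stack_instances_request(
        "emr-encryption-at-rest", ["ou-exam-ple12345"], ["us-east-1", "eu-west-1"]
    ))
```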

Tools

  • AWS CloudFormation is a service that helps you model and set up AWS resources using infrastructure as code.

  • Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources.

  • Amazon EMR is a managed cluster platform that simplifies running big data frameworks.

  • AWS Lambda supports running code without provisioning or managing servers.

  • Amazon S3 is a highly scalable object storage service that can be used for a wide range of storage solutions, including websites, mobile applications, backups, and data lakes.

  • Amazon SNS coordinates and manages the delivery or sending of messages between publishers and clients, including web servers and email addresses. Subscribers receive all messages published to the topics to which they subscribe, and all subscribers to a topic receive the same messages.
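The violation notification described in the Summary could be assembled along the following lines. This is a sketch rather than the attached Lambda code: the function names and the message wording are hypothetical, and the Region and account are assumed to be read from the Lambda function ARN, which is reasonable only because the control runs in the same Region as the cluster it evaluates.

```python
def arn_field(arn, index):
    """Return one colon-separated field of an ARN (3 = Region, 4 = account ID)."""
    return arn.split(":")[index]


def build_notification(cluster_name, violation, function_arn):
    """Return the Subject and Message arguments for an SNS publish call."""
    subject = f"Violation: EMR cluster {cluster_name} is not encrypted at rest"
    message = "\n".join([
        f"EMR cluster name: {cluster_name}",
        f"Violation: {violation}",
        f"AWS Region: {arn_field(function_arn, 3)}",
        f"AWS account: {arn_field(function_arn, 4)}",
        f"Source Lambda ARN: {function_arn}",
    ])
    # SNS requires subjects to be shorter than 100 characters.
    return {"Subject": subject[:99], "Message": message}


def publish_violation(topic_arn, notification):
    """Send the notification (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    boto3.client("sns").publish(TopicArn=topic_arn, **notification)


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:lambda:us-east-1:111122223333:function:example"
    print(build_notification("demo-cluster", "Encryption at rest is not enabled.", arn))
```

Inside a Lambda handler, the function ARN is available as `context.invoked_function_arn`.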

Code

  • The EMREncryptionAtRest.zip and EMREncryptionAtRest.yml files for this project are available as an attachment.

Epics

Task: Define the S3 bucket.
Description: On the Amazon S3 console, choose or create an S3 bucket with a unique name that does not contain leading slashes. An S3 bucket name is globally unique, and the namespace is shared by all AWS accounts. Your S3 bucket needs to be in the same Region as the Amazon EMR cluster that is being evaluated.
Skills required: Cloud Architect

Task: Upload the Lambda code to the S3 bucket.
Description: Upload the Lambda code .zip file that's provided in the "Attachments" section to the defined S3 bucket.
Skills required: Cloud Architect

Task: Deploy the AWS CloudFormation template.
Description: On the AWS CloudFormation console, in the same Region as your S3 bucket, deploy the AWS CloudFormation template that's provided as an attachment to this pattern. In the next epic, provide the values for the parameters. For more information about deploying AWS CloudFormation templates, see the “Related resources” section.
Skills required: Cloud Architect

Task: Name the S3 bucket.
Description: Enter the name of the S3 bucket that you created in the first epic.
Skills required: Cloud Architect

Task: Provide the Amazon S3 key.
Description: Provide the location of the Lambda code .zip file in your S3 bucket, without leading slashes (for example, <directory>/<file-name>.zip).
Skills required: Cloud Architect

Task: Provide an email address.
Description: Provide an active email address to receive Amazon SNS notifications.
Skills required: Cloud Architect

Task: Define the logging level.
Description: Define the logging level and frequency for your Lambda function. “Info” designates detailed informational messages on the application’s progress. “Error” designates error events that could still allow the application to continue running. “Warning” designates potentially harmful situations.
Skills required: Cloud Architect

Task: Confirm the subscription.
Description: When the template successfully deploys, it sends a subscription email message to the email address provided. You must confirm this email subscription to receive violation notifications.
Skills required: Cloud Architect
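If the template is deployed from a script rather than the console, the four inputs described in these tasks could be validated and packaged as shown in the following sketch. The parameter keys (`S3Bucket`, `S3Key`, `NotificationEmail`, `LoggingLevel`) and the function name are hypothetical; the actual parameter names are defined in the attached EMREncryptionAtRest.yml file and must be substituted. The validation rules mirror the constraints stated in the tasks: no leading slashes, a .zip key, and one of the three logging levels.

```python
ALLOWED_LOGGING_LEVELS = ("Info", "Error", "Warning")


def build_template_parameters(bucket, key, email, logging_level="Info"):
    """Validate the four inputs and format them as a CloudFormation Parameters list."""
    if bucket.startswith("/") or key.startswith("/"):
        raise ValueError("bucket name and key must not start with a slash")
    if not key.endswith(".zip"):
        raise ValueError("the S3 key must point to the Lambda code .zip file")
    if "@" not in email:
        raise ValueError("a valid email address is required for SNS notifications")
    if logging_level not in ALLOWED_LOGGING_LEVELS:
        raise ValueError(f"logging level must be one of {ALLOWED_LOGGING_LEVELS}")
    # Hypothetical parameter keys; replace them with the names used in the template.
    values = {
        "S3Bucket": bucket,
        "S3Key": key,
        "NotificationEmail": email,
        "LoggingLevel": logging_level,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for parameter in build_template_parameters(
        "my-bucket", "code/EMREncryptionAtRest.zip", "ops@example.com", "Error"
    ):
        print(parameter)
```

The returned list has the shape that the boto3 `cloudformation.create_stack` call accepts for its `Parameters` argument.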

Related resources

Attachments

To access additional content that is associated with this document, unzip the following file: attachment.zip