Monitor Amazon EMR clusters for in-transit encryption at launch - AWS Prescriptive Guidance

Monitor Amazon EMR clusters for in-transit encryption at launch

Created by Susanne Kangnoh (AWS)

Environment: Production

Technologies: Analytics; Big data; Cloud-native; Security, identity, compliance

Workload: Open-source

AWS services: Amazon EMR; Amazon SNS; AWS CloudTrail; Amazon CloudWatch

Summary

This pattern provides a security control that monitors Amazon EMR clusters at launch and sends an alert if in-transit encryption hasn't been enabled. 

Amazon EMR is a web service that makes it easy for you to run big data frameworks, such as Apache Hadoop, to process and analyze data. Amazon EMR enables you to process vast amounts of data in a cost-effective way by running mapping and reducing steps in parallel.

Data encryption prevents unauthorized users from accessing or reading data at rest or data in transit. Data at rest refers to data that is stored in media such as a local file system on each node, Hadoop Distributed File System (HDFS), or the EMR File System (EMRFS) through Amazon Simple Storage Service (Amazon S3). Data in transit refers to data that travels the network and is in flight between jobs.

In-transit encryption supports open-source encryption features for Apache Spark, Apache Tez, Apache Hadoop, Apache HBase, and Presto. You enable encryption by creating a security configuration from the AWS Command Line Interface (AWS CLI), the console, or AWS SDKs, and specifying the data encryption settings. You can provide the encryption artifacts for in-transit encryption in these two ways:

  • By uploading a compressed file of certificates to Amazon S3.

  • By referencing a custom Java class that provides encryption artifacts.
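
The two options correspond to the TLS certificate settings of an EMR security configuration. The following is a minimal sketch, assuming the boto3 `create_security_configuration` call; the bucket, object, class, and function names are placeholders for illustration and are not part of this pattern.

```python
import json


def in_transit_config(s3_object, provider_class=None):
    """Build a security configuration that enables in-transit encryption.

    s3_object: S3 URI of the certificate .zip file (PEM option) or of the
        JAR that contains the custom provider class (Custom option).
    provider_class: fully qualified Java class name; set only for the
        Custom option.
    """
    tls = {
        "CertificateProviderType": "Custom" if provider_class else "PEM",
        "S3Object": s3_object,
    }
    if provider_class:
        tls["CertificateProviderClass"] = provider_class
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            "EnableAtRestEncryption": False,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": tls
            },
        }
    }


def create_security_configuration(name, config, emr_client=None):
    """Register the configuration with Amazon EMR (needs boto3 and credentials)."""
    if emr_client is None:
        import boto3  # imported lazily so the builder works without boto3

        emr_client = boto3.client("emr")
    return emr_client.create_security_configuration(
        Name=name, SecurityConfiguration=json.dumps(config)
    )


# Placeholder values for illustration only.
pem_config = in_transit_config("s3://amzn-s3-demo-bucket/certs/my-certs.zip")
custom_config = in_transit_config(
    "s3://amzn-s3-demo-bucket/jars/my-provider.jar",
    provider_class="com.example.MyCertProvider",
)
```

A cluster that is launched with a configuration like either of these would pass the check that this pattern performs.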

The security control that's included with this pattern monitors API calls and generates an Amazon CloudWatch Events event on the RunJobFlow action. The event calls an AWS Lambda function, which runs a Python script. The function gets the EMR cluster ID from the event JSON input, and performs the following checks to determine whether there's a security violation:

  • Checks if the EMR cluster has an Amazon EMR-specific security configuration.

  • If the cluster does have a security configuration, checks to see if encryption in transit is enabled.

  • If the cluster doesn't have a security configuration, or if encryption in transit isn't enabled, sends an alert to an email address that you provide, by using Amazon Simple Notification Service (Amazon SNS). The notification specifies the EMR cluster name, violation details, AWS Region and account information, and the Amazon Resource Name (ARN) of the Lambda function that the notification is sourced from.
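
The checks above can be sketched as a Lambda handler. This is not the code from the attachment. It is a minimal illustration that assumes the CloudTrail record exposes the cluster ID as `responseElements.jobFlowId` and that the SNS topic ARN is supplied in a hypothetical `SNS_TOPIC_ARN` environment variable. In line with the summary, the sketch sends an alert both when the security configuration is missing and when it leaves in-transit encryption disabled.

```python
import json
import os


def find_violation(cluster_id, emr_client):
    """Return a description of the violation, or None if the cluster passes."""
    cluster = emr_client.describe_cluster(ClusterId=cluster_id)["Cluster"]
    name = cluster.get("Name", "unknown")
    config_name = cluster.get("SecurityConfiguration")
    if not config_name:
        return f"EMR cluster {name} ({cluster_id}) has no security configuration."

    response = emr_client.describe_security_configuration(Name=config_name)
    # Amazon EMR returns the security configuration as a JSON string.
    config = json.loads(response["SecurityConfiguration"])
    encryption = config.get("EncryptionConfiguration", {})
    if not encryption.get("EnableInTransitEncryption", False):
        return (
            f"EMR cluster {name} ({cluster_id}) uses security configuration "
            f"{config_name}, which does not enable in-transit encryption."
        )
    return None


def lambda_handler(event, context, emr_client=None, sns_client=None):
    """Check the cluster named in a RunJobFlow event and alert on a violation."""
    if emr_client is None or sns_client is None:
        import boto3  # available in the Lambda Python runtime

        emr_client = emr_client or boto3.client("emr")
        sns_client = sns_client or boto3.client("sns")

    # Assumption: the CloudTrail record carries the new cluster ID here.
    cluster_id = event["detail"]["responseElements"]["jobFlowId"]
    violation = find_violation(cluster_id, emr_client)
    if violation:
        sns_client.publish(
            TopicArn=os.environ["SNS_TOPIC_ARN"],  # hypothetical variable name
            Subject="EMR in-transit encryption violation",
            Message=(
                f"{violation}\n"
                f"Account: {event.get('account')}\n"
                f"Region: {event.get('region')}\n"
                f"Source: {context.invoked_function_arn}"
            ),
        )
    return violation
```

The optional `emr_client` and `sns_client` arguments exist only so that the logic can be exercised with stub clients. When Lambda invokes the handler with just `event` and `context`, the sketch creates real boto3 clients.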

Prerequisites and limitations

Prerequisites

  • An active AWS account.

  • An S3 bucket to upload the Lambda code that's provided with this pattern.

  • An email address where you would like to receive violation notifications.

  • Amazon EMR logging enabled, for access to all the API logs.

Limitations

  • This detective control is regional and must be deployed in each AWS Region that you want to monitor.

Product versions

  • Amazon EMR release 4.8.0 or later.

Architecture

Workflow architecture

Architecture that monitors API calls and generates an event on the RunJobFlow action.
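
The rule at the start of this workflow can be pictured as an event pattern that selects RunJobFlow calls recorded by CloudTrail. The pattern below is an assumption about what such a rule might contain (the rule in the provided template may differ), and `matches` is a simplified, hypothetical matcher that handles only exact-value lists and nested objects.

```python
import json

# Assumed event pattern; the rule in the provided template may differ.
EVENT_PATTERN = {
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["elasticmapreduce.amazonaws.com"],
        "eventName": ["RunJobFlow"],
    },
}


def matches(pattern, event):
    """Simplified matcher: each pattern leaf is a list of allowed exact values."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Nested object: the event must contain a matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Sample event, trimmed to the fields that the pattern inspects.
sample_event = {
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "elasticmapreduce.amazonaws.com",
        "eventName": "RunJobFlow",
    },
}

# A rule definition would receive the pattern as a JSON string.
pattern_json = json.dumps(EVENT_PATTERN)
```

Events for other Amazon EMR API calls, such as ListClusters, would not match this pattern, so the Lambda function would run only when a cluster is launched.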

Automation and scale

  • If you are using AWS Organizations, you can use AWS CloudFormation StackSets to deploy the template in multiple accounts that you want to monitor.

Tools

AWS services

  • Amazon EMR – Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, you can process data for analytics purposes and business intelligence workloads. Additionally, you can use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon S3 and Amazon DynamoDB.

  • AWS CloudFormation – AWS CloudFormation helps you model and set up your AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle. You can use a template to describe your resources and their dependencies, and launch and configure them together as a stack, instead of managing resources individually. You can manage and provision stacks across multiple AWS accounts and AWS Regions.

  • Amazon CloudWatch Events – Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. CloudWatch Events becomes aware of operational changes as they occur and takes corrective action as necessary, by sending messages to respond to the environment, activating functions, making changes, and capturing state information.

  • AWS Lambda – AWS Lambda is a compute service that supports running code without provisioning or managing servers. Lambda runs your code only when needed and scales automatically from a few requests per day to thousands per second. You pay only for the compute time that you consume—there is no charge when your code is not running.

  • Amazon SNS – Amazon Simple Notification Service (Amazon SNS) coordinates and manages the sending of messages between publishers and clients, including web servers and email addresses. Subscribers receive all messages published to the topics to which they subscribe, and all subscribers to a topic receive the same messages.

Code

This pattern includes an attachment with two files:

  • EMRInTransitEncryption.zip is a compressed file that includes the security control (Lambda code).

  • EMRInTransitEncryption.yml is a CloudFormation template that deploys the security control.

See the Epics section for information about how to use these files.

Epics

Task | Description | Skills required
Upload the code to an S3 bucket. | Create a new S3 bucket or use an existing S3 bucket to upload the attached EMRInTransitEncryption.zip file (Lambda code). This bucket must be in the same AWS Region as the CloudFormation template and the resources that you want to evaluate. | Cloud architect
Deploy the CloudFormation template. | Open the CloudFormation console in the same AWS Region as the S3 bucket, and deploy the EMRInTransitEncryption.yml file that's provided in the attachment. In the next epic, provide values for the template parameters. | Cloud architect

Task | Description | Skills required
Provide the S3 bucket name. | Enter the name of the S3 bucket that you created or selected in the first epic. This S3 bucket contains the .zip file for the Lambda code and must be in the same AWS Region as the CloudFormation template and the resource that will be evaluated. | Cloud architect
Provide the S3 key. | Specify the location of the Lambda code .zip file in your S3 bucket, without leading slashes (for example, EMRInTransitEncryption.zip or controls/EMRInTransitEncryption.zip). | Cloud architect
Provide an email address. | Specify an active email address where you want to receive violation notifications. | Cloud architect
Specify a logging level. | Specify the logging level and verbosity for the Lambda logs. Info designates detailed informational messages on the application’s progress and should be used only for debugging. Error designates error events that could still allow the application to continue running. Warning designates potentially harmful situations. | Cloud architect

Task | Description | Skills required
Confirm the email subscription. | When the CloudFormation template deploys successfully, it sends a subscription email message to the email address you provided. To receive notifications, you must confirm this email subscription. | Cloud architect
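
Two of the parameter values described in the epics, the S3 key and the logging level, have constraints that can be checked before deployment. The following is a small sketch under stated assumptions: the helper names are hypothetical, and mapping the Info, Warning, and Error levels to Python's `logging` constants is an assumption about how the Lambda code might apply the logging level.

```python
import logging

# The three levels named in the epic, mapped to Python's logging constants.
LOG_LEVELS = {
    "Info": logging.INFO,
    "Warning": logging.WARNING,
    "Error": logging.ERROR,
}


def normalize_s3_key(key):
    """Strip leading slashes, which the S3 key parameter must not contain."""
    normalized = key.lstrip("/")
    if not normalized.endswith(".zip"):
        raise ValueError(f"expected a .zip object key, got {key!r}")
    return normalized


def resolve_log_level(name):
    """Translate a logging-level parameter value into a logging constant."""
    try:
        return LOG_LEVELS[name]
    except KeyError:
        allowed = ", ".join(sorted(LOG_LEVELS))
        raise ValueError(f"unknown level {name!r}; expected one of: {allowed}")


# Example values taken from the S3 key description in the epics.
key = normalize_s3_key("/controls/EMRInTransitEncryption.zip")
level = resolve_log_level("Error")
```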

Related resources

Attachments

To access additional content that is associated with this document, unzip the following file: attachment.zip