Monitor Amazon EMR clusters for in-transit encryption at launch
Created by Susanne Kangnoh (AWS)
Environment: Production | Technologies: Analytics; Big data; CloudNative; Security, identity, compliance | Workload: Open-source |
AWS services: Amazon EMR; Amazon SNS; AWS CloudTrail; Amazon CloudWatch |
Summary
This pattern provides a security control that monitors Amazon EMR clusters at launch and sends an alert if in-transit encryption hasn't been enabled.
Amazon EMR is a web service that makes it easy for you to run big data frameworks, such as Apache Hadoop, to process and analyze data. Amazon EMR enables you to process vast amounts of data in a cost-effective way by running mapping and reducing steps in parallel.
Data encryption prevents unauthorized users from accessing or reading data at rest or data in transit. Data at rest refers to data that is stored in media such as a local file system on each node, Hadoop Distributed File System (HDFS), or the EMR File System (EMRFS) through Amazon Simple Storage Service (Amazon S3). Data in transit refers to data that travels across the network and is in flight between jobs. In-transit encryption supports open-source encryption features for Apache Spark, Apache Tez, Apache Hadoop, Apache HBase, and Presto.
You enable encryption by creating a security configuration from the AWS Command Line Interface (AWS CLI), the console, or AWS SDKs, and specifying the data encryption settings. You can provide the encryption artifacts for in-transit encryption in these two ways:
By uploading a compressed file of certificates to Amazon S3.
By referencing a custom Java class that provides encryption artifacts.
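As a minimal sketch of the two options above, the following Python builds the JSON body of an EMR security configuration with in-transit encryption enabled. The function names, the bucket paths, and the Java class name in the usage note are illustrative assumptions and are not part of this pattern's attachment. The `emr_client` argument is assumed to be a boto3 EMR client, for example `boto3.client("emr")`.

```python
import json


def build_in_transit_config(s3_object, provider_class=None):
    """Return a security configuration dict that enables in-transit encryption.

    s3_object: S3 URI of the certificate .zip file (PEM option) or of the
        JAR file that contains the provider class (Custom option).
    provider_class: fully qualified Java class name; pass it only for the
        Custom option.
    """
    tls = {"CertificateProviderType": "PEM", "S3Object": s3_object}
    if provider_class is not None:
        tls["CertificateProviderType"] = "Custom"
        tls["CertificateProviderClass"] = provider_class
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            # At-rest encryption is outside the scope of this sketch.
            "EnableAtRestEncryption": False,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": tls
            },
        }
    }


def create_security_configuration(emr_client, name, config):
    """Register the configuration with Amazon EMR under the given name."""
    return emr_client.create_security_configuration(
        Name=name, SecurityConfiguration=json.dumps(config)
    )
```

For example, `build_in_transit_config("s3://example-bucket/certs.zip")` produces the PEM variant, and passing a second argument such as `"com.example.CertProvider"` (a hypothetical class name) produces the Custom variant.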
The security control that's included with this pattern monitors API calls and generates an Amazon CloudWatch Events event on the RunJobFlow action. The event calls an AWS Lambda function, which runs a Python script. The function gets the EMR cluster ID from the event JSON input, and performs the following checks to determine whether there's a security violation:
Checks if the EMR cluster has an Amazon EMR-specific security configuration.
If the cluster does have a security configuration, checks to see if encryption in transit is enabled.
If the cluster doesn't have a security configuration, or if its security configuration doesn't enable encryption in transit, sends an alert to an email address that you provide, by using Amazon Simple Notification Service (Amazon SNS). The notification specifies the EMR cluster name, violation details, AWS Region and account information, and the AWS Lambda ARN (Amazon Resource Name) that the notification is sourced from.
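The checks above can be sketched in Python as follows. This is not the code from the attached .zip file; it is an illustration written under stated assumptions: the cluster ID is read from `detail.responseElements.jobFlowId` in the CloudTrail-based event, the clients are boto3 EMR and SNS clients passed in by the caller, and the helper names and the `topic_arn` parameter are hypothetical. The sketch treats both a missing security configuration and a configuration without in-transit encryption as violations.

```python
import json


def get_cluster_id(event):
    # Assumption: CloudTrail records the new cluster ID of a RunJobFlow call
    # under responseElements.jobFlowId.
    return event["detail"]["responseElements"]["jobFlowId"]


def find_violation(emr_client, cluster_id):
    """Return a violation message, or None if in-transit encryption is on."""
    cluster = emr_client.describe_cluster(ClusterId=cluster_id)["Cluster"]
    label = f"EMR cluster {cluster.get('Name')} ({cluster_id})"

    # Check 1: does the cluster reference a security configuration at all?
    config_name = cluster.get("SecurityConfiguration")
    if not config_name:
        return f"{label} has no security configuration."

    # Check 2: does that configuration enable encryption in transit?
    response = emr_client.describe_security_configuration(Name=config_name)
    settings = json.loads(response["SecurityConfiguration"])
    encryption = settings.get("EncryptionConfiguration", {})
    if not encryption.get("EnableInTransitEncryption", False):
        return f"{label} does not have in-transit encryption enabled."
    return None


def handle_event(event, context, emr_client, sns_client, topic_arn):
    """Evaluate the launched cluster and publish an alert on a violation."""
    violation = find_violation(emr_client, get_cluster_id(event))
    if violation:
        message = "\n".join([
            violation,
            f"Region: {event.get('region')}",
            f"Account: {event.get('account')}",
            f"Source: {context.invoked_function_arn}",
        ])
        sns_client.publish(
            TopicArn=topic_arn,
            Subject="EMR in-transit encryption violation",
            Message=message,
        )
    return violation
```

Passing the clients in as arguments keeps the logic testable without AWS access; in a deployed function they would typically be created once at module load with `boto3.client("emr")` and `boto3.client("sns")`.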
Prerequisites and limitations
Prerequisites
An active AWS account.
An S3 bucket to upload the Lambda code that's provided with this pattern.
An email address where you would like to receive violation notifications.
Amazon EMR logging enabled, for access to all the API logs.
Limitations
This detective control is regional and must be deployed in each AWS Region that you want to monitor.
Product versions
Amazon EMR release 4.8.0 or later.
Architecture
Workflow architecture
Automation and scale
If you are using AWS Organizations, you can use AWS CloudFormation StackSets to deploy the template in multiple accounts that you want to monitor.
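A StackSets rollout along these lines might look like the following Python sketch. It builds the arguments for the CloudFormation `create_stack_set` and `create_stack_instances` API calls. The function names are hypothetical, and the service-managed permission model, automatic deployment, and the `CAPABILITY_IAM` capability are assumptions about how the template would be deployed; check them against the attached template. `cfn_client` is assumed to be a boto3 CloudFormation client.

```python
def stack_set_requests(stack_set_name, template_url, parameters,
                       org_unit_ids, regions):
    """Build keyword arguments for the two StackSets API calls."""
    create_stack_set = {
        "StackSetName": stack_set_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ],
        # Assumption: the template creates IAM resources for the function.
        "Capabilities": ["CAPABILITY_IAM"],
        # Assumption: AWS Organizations manages the deployment permissions.
        "PermissionModel": "SERVICE_MANAGED",
        "AutoDeployment": {
            "Enabled": True,
            "RetainStacksOnAccountRemoval": False,
        },
    }
    create_stack_instances = {
        "StackSetName": stack_set_name,
        "DeploymentTargets": {"OrganizationalUnitIds": list(org_unit_ids)},
        # The control is regional, so list every Region to monitor.
        "Regions": list(regions),
    }
    return create_stack_set, create_stack_instances


def deploy_stack_set(cfn_client, stack_set_name, template_url, parameters,
                     org_unit_ids, regions):
    """Create the stack set, then add instances for the target accounts."""
    set_args, instance_args = stack_set_requests(
        stack_set_name, template_url, parameters, org_unit_ids, regions
    )
    cfn_client.create_stack_set(**set_args)
    return cfn_client.create_stack_instances(**instance_args)
```

Because the S3 bucket that holds the Lambda code must be in the same Region as the stack, a multi-Region rollout would presumably need the .zip file available in a bucket in each target Region.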
Tools
AWS services
Amazon EMR – Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, you can process data for analytics purposes and business intelligence workloads. Additionally, you can use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon S3 and Amazon DynamoDB.
AWS CloudFormation – AWS CloudFormation helps you model and set up your AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle. You can use a template to describe your resources and their dependencies, and launch and configure them together as a stack, instead of managing resources individually. You can manage and provision stacks across multiple AWS accounts and AWS Regions.
Amazon CloudWatch Events – Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. CloudWatch Events becomes aware of operational changes as they occur and takes corrective action as necessary, by sending messages to respond to the environment, activating functions, making changes, and capturing state information.
AWS Lambda – AWS Lambda is a compute service that supports running code without provisioning or managing servers. Lambda runs your code only when needed and scales automatically from a few requests per day to thousands per second. You pay only for the compute time that you consume; there is no charge when your code is not running.
Amazon SNS – Amazon Simple Notification Service (Amazon SNS) coordinates and manages the sending of messages between publishers and clients, including web servers and email addresses. Subscribers receive all messages published to the topics to which they subscribe, and all subscribers to a topic receive the same messages.
Code
This pattern includes an attachment with two files:
EMRInTransitEncryption.zip is a compressed file that includes the security control (Lambda code).
EMRInTransitEncryption.yml is a CloudFormation template that deploys the security control.
See the Epics section for information about how to use these files.
Epics
Task | Description | Skills required |
---|---|---|
Upload the code to an S3 bucket. | Create a new S3 bucket or use an existing S3 bucket to upload the attached | Cloud architect |
Deploy the CloudFormation template. | Open the CloudFormation console in the same AWS Region as the S3 bucket, and deploy the | Cloud architect, |
Task | Description | Skills required |
---|---|---|
Provide the S3 bucket name. | Enter the name of the S3 bucket that you created or selected in the first epic. This S3 bucket contains the .zip file for the Lambda code and must be in the same AWS Region as the CloudFormation template and the resource that will be evaluated. | Cloud architect |
Provide the S3 key. | Specify the location of the Lambda code .zip file in your S3 bucket, without leading slashes (for example, | Cloud architect |
Provide an email address. | Specify an active email address where you want to receive violation notifications. | Cloud architect |
Specify a logging level. | Specify the logging level and verbosity for the Lambda logs. | Cloud architect |
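
The four values in the table above could be validated before deployment with a small helper like the one below. This is a sketch only: the parameter key names (`S3Bucket`, `S3Key`, `NotificationEmail`, `LambdaLoggingLevel`) and the accepted logging levels are hypothetical placeholders, because the actual names are defined in the attached template. Replace them with the names that the template declares.

```python
# Hypothetical values: the attached template defines the real ones.
ASSUMED_LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def build_stack_parameters(bucket, key, email, log_level="INFO"):
    """Validate the inputs and return them in CloudFormation's list form."""
    if not bucket:
        raise ValueError("S3 bucket name must not be empty")
    if not key or key.startswith("/"):
        # The S3 key must be given without leading slashes.
        raise ValueError("S3 key must be non-empty and must not start with '/'")
    if "@" not in email:
        raise ValueError("email address looks invalid")
    if log_level not in ASSUMED_LOG_LEVELS:
        raise ValueError(f"log level must be one of {ASSUMED_LOG_LEVELS}")

    values = {
        "S3Bucket": bucket,               # hypothetical key name
        "S3Key": key,                     # hypothetical key name
        "NotificationEmail": email,       # hypothetical key name
        "LambdaLoggingLevel": log_level,  # hypothetical key name
    }
    return [
        {"ParameterKey": name, "ParameterValue": value}
        for name, value in values.items()
    ]
```

The returned list has the shape that the `Parameters` argument of the CloudFormation `create_stack` API call expects.
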
Task | Description | Skills required |
---|---|---|
Confirm the email subscription. | When the CloudFormation template deploys successfully, it sends a subscription email message to the email address you provided. To receive notifications, you must confirm this email subscription. | Cloud architect |
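
To verify that the subscription was confirmed, you could list the topic's subscriptions and look for entries that are still pending. The following Python sketch assumes a boto3 SNS client and a topic ARN that you look up yourself (for example, from the stack's resources); the function name is hypothetical. Amazon SNS reports the literal value `PendingConfirmation` in place of a subscription ARN until the recipient confirms.

```python
def pending_subscriptions(sns_client, topic_arn):
    """Return the endpoints on the topic that have not confirmed yet."""
    pending = []
    kwargs = {"TopicArn": topic_arn}
    while True:
        page = sns_client.list_subscriptions_by_topic(**kwargs)
        for subscription in page.get("Subscriptions", []):
            if subscription["SubscriptionArn"] == "PendingConfirmation":
                pending.append(subscription["Endpoint"])
        # Follow pagination until no further token is returned.
        token = page.get("NextToken")
        if not token:
            return pending
        kwargs["NextToken"] = token
```

An empty list means that every subscriber on the topic has confirmed and will receive violation notifications.
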
Related resources
Creating a stack on the AWS CloudFormation console (AWS CloudFormation documentation)
Encryption options (Amazon EMR documentation)
Attachments
To access additional content that is associated with this document, unzip the following file: attachment.zip