Enforce tagging of Amazon EMR clusters at launch - AWS Prescriptive Guidance

Created by Priyanka Chaudhary (AWS)

Environment: Production

Technologies: Analytics; Security, identity, compliance

AWS services: Amazon EMR; AWS Lambda; Amazon CloudWatch Events

Summary

This pattern provides a detective security control that checks whether Amazon EMR clusters are tagged when they are created and sends a notification when required tags are missing.

Amazon EMR is an Amazon Web Services (AWS) service for processing and analyzing vast amounts of data. Amazon EMR offers an expandable, low-configuration service as an easier alternative to running in-house cluster computing. You can use tagging to categorize AWS resources in different ways, such as by purpose, owner, or environment. For example, you can tag your Amazon EMR clusters by assigning custom metadata to each cluster. A tag consists of a key and a value that you define. We recommend that you create a consistent set of tags to meet your organization's requirements. When you add a tag to an Amazon EMR cluster, the tag is also propagated to each active Amazon Elastic Compute Cloud (Amazon EC2) instance that is associated with the cluster. Similarly, when you remove a tag from an Amazon EMR cluster, that tag is removed from each associated, active EC2 instance as well.
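As a small illustration of the key-value structure, the following sketch converts a Python dictionary into the list format that the Amazon EMR API expects for tags, and passes it to the boto3 add_tags call. The helper names to_emr_tags and apply_tags and the sample tag values are assumptions made for this example; they are not part of the pattern's attachments.

```python
def to_emr_tags(tags):
    """Convert {"Owner": "data-team"} into [{"Key": "Owner", "Value": "data-team"}]."""
    # Sorting by key keeps the output deterministic.
    return [{"Key": key, "Value": value} for key, value in sorted(tags.items())]


def apply_tags(cluster_id, tags, emr_client=None):
    """Add tags to an existing cluster (hypothetical helper for illustration)."""
    if emr_client is None:
        import boto3  # imported lazily so to_emr_tags works without boto3 installed
        emr_client = boto3.client("emr")
    return emr_client.add_tags(ResourceId=cluster_id, Tags=to_emr_tags(tags))
```

For example, to_emr_tags({"Owner": "data-team", "Environment": "prod"}) yields one Key/Value dictionary per tag, ordered by key.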

The detective control monitors API calls and initiates an Amazon CloudWatch Events event for the RunJobFlow, AddTags, RemoveTags, and CreateTags APIs. The event calls AWS Lambda, which runs a Python script. The Python function gets the Amazon EMR cluster ID from the JSON input from the event and performs the following checks:

  • Check if the Amazon EMR cluster is configured with tag names that you specify.

  • If not, send an Amazon Simple Notification Service (Amazon SNS) notification to the user with the relevant information: the Amazon EMR cluster name, violation details, AWS Region, AWS account, and the Amazon Resource Name (ARN) of the Lambda function that sent the notification.
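The checks above can be sketched as a Lambda handler. This is a minimal illustration, not the code in EMRTagValidation.zip. It assumes that RunJobFlow events carry the new cluster ID in responseElements.jobFlowId and that AddTags and RemoveTags events carry it in requestParameters.resourceId. The environment variable names REQUIRED_TAG_KEYS and SNS_TOPIC_ARN are hypothetical. CreateTags events are not handled in this sketch and are skipped.

```python
import os


def extract_cluster_id(event):
    """Return the cluster ID from a CloudTrail-based event, or None if absent.

    Assumption: the ID is in responseElements.jobFlowId (RunJobFlow) or
    requestParameters.resourceId (AddTags, RemoveTags).
    """
    detail = event.get("detail", {})
    response = detail.get("responseElements") or {}
    request = detail.get("requestParameters") or {}
    return response.get("jobFlowId") or request.get("resourceId")


def find_missing_tags(required_keys, cluster_tags):
    """Return the required tag keys that are not present on the cluster."""
    present = {tag["Key"] for tag in cluster_tags}
    return [key for key in required_keys if key not in present]


def build_message(cluster_name, cluster_id, missing, region, account, function_arn):
    """Compose the notification text with the details listed in the pattern."""
    return "\n".join([
        "Amazon EMR cluster is missing required tags.",
        f"Cluster name: {cluster_name}",
        f"Cluster ID: {cluster_id}",
        f"Missing tag keys: {', '.join(missing)}",
        f"AWS Region: {region}",
        f"AWS account: {account}",
        f"Source Lambda ARN: {function_arn}",
    ])


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers above can be tested without boto3

    # Hypothetical environment variables; the real template may use other names.
    raw_keys = os.environ["REQUIRED_TAG_KEYS"]
    required = [key.strip() for key in raw_keys.split(",") if key.strip()]

    cluster_id = extract_cluster_id(event)
    if cluster_id is None:
        return {"status": "skipped"}

    cluster = boto3.client("emr").describe_cluster(ClusterId=cluster_id)["Cluster"]
    missing = find_missing_tags(required, cluster.get("Tags", []))
    if not missing:
        return {"status": "compliant"}

    message = build_message(
        cluster.get("Name", ""), cluster_id, missing,
        event.get("region"), event.get("account"), context.invoked_function_arn,
    )
    boto3.client("sns").publish(
        TopicArn=os.environ["SNS_TOPIC_ARN"],
        Subject="EMR tag violation",
        Message=message,
    )
    return {"status": "violation", "missing": missing}
```

Keeping the ID extraction and tag comparison in separate functions makes them easy to test locally with sample events before deploying.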

Prerequisites and limitations

Prerequisites 

  • An active AWS account

  • An Amazon Simple Storage Service (Amazon S3) bucket for uploading the provided Lambda code. You can use an existing bucket or create one for this purpose, as described in the Epics section.

  • An active email address where you would like to receive violation notifications.

  • A list of mandatory tags you want to check for.

Limitations 

  • This security control is regional. You must deploy it in each AWS Region that you want to monitor.
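Because the control is regional, a deployment script typically repeats the same steps per Region. The following sketch shows one way to structure that loop. The function name deploy_to_regions and the deploy callable are hypothetical; in practice the callable would create a CloudFormation stack in the given Region, using an S3 bucket located in that same Region.

```python
def deploy_to_regions(regions, deploy):
    """Call deploy(region) for each Region and record the outcome per Region."""
    results = {}
    for region in regions:
        try:
            results[region] = ("ok", deploy(region))
        except Exception as error:
            # Keep going so that one failing Region does not block the others.
            results[region] = ("failed", str(error))
    return results
```

Collecting failures instead of stopping at the first one makes it easier to see which Regions are still unmonitored after a run.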

Product versions

  • Amazon EMR release 4.8.0 and later.

Architecture

Workflow architecture 

The workflow has five stages: cluster launch, monitoring of API calls, event generation, Lambda function call, and notification delivery.
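The event generation step depends on a rule whose event pattern selects the relevant API calls. The sketch below shows what such a pattern might look like for the Amazon EMR actions, together with a simplified local matcher for trying it against sample events. The pattern is an assumption and is not copied from EMRTagValidation.yml. CreateTags is an Amazon EC2 API action, so it would need its own pattern and is omitted here. The matcher covers only exact-value matching, not the full event pattern syntax.

```python
import json

# Assumed pattern for EMR API calls recorded by CloudTrail.
EVENT_PATTERN = {
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["elasticmapreduce.amazonaws.com"],
        "eventName": ["RunJobFlow", "AddTags", "RemoveTags"],
    },
}


def matches(pattern, event):
    """Simplified matcher: nested dictionaries recurse, lists match any listed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# The JSON string form is what a rule definition expects.
EVENT_PATTERN_JSON = json.dumps(EVENT_PATTERN)
```

EVENT_PATTERN_JSON could be supplied as the EventPattern argument of the boto3 events client's put_rule call, or embedded in a CloudFormation template.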

Automation and scale

Tools

AWS services

  • AWS CloudFormation – AWS CloudFormation helps you model and set up your AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle. You can use a template to describe your resources and their dependencies, and launch and configure them together as a stack, instead of managing resources individually. You can manage and provision stacks across multiple AWS accounts and AWS Regions.

  • Amazon CloudWatch Events – Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources.

  • Amazon EMR – Amazon EMR is a web service that simplifies running big data frameworks and processing vast amounts of data efficiently.

  • AWS Lambda – AWS Lambda is a compute service that supports running code without provisioning or managing servers. Lambda runs your code only when needed and scales automatically, from a few requests per day to thousands per second. 

  • Amazon S3 – Amazon Simple Storage Service (Amazon S3) is an object storage service. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.

  • Amazon SNS – Amazon Simple Notification Service (Amazon SNS) coordinates and manages the delivery or sending of messages between publishers and clients, including web servers and email addresses. Subscribers receive all messages published to the topics to which they subscribe, and all subscribers to a topic receive the same messages.

Code

This pattern includes the following attachments:

  • EMRTagValidation.zip – The Lambda code for the security control.

  • EMRTagValidation.yml – The CloudFormation template that sets up the event and Lambda function.

Epics

Task | Description | Skills required

Define the S3 bucket.

On the Amazon S3 console, choose or create an S3 bucket to host the Lambda code .zip file. This S3 bucket must be in the same AWS Region as the Amazon EMR cluster you want to monitor. An Amazon S3 bucket name is globally unique, and the namespace is shared by all AWS accounts. The S3 bucket name cannot include leading slashes.

Cloud architect
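If you prefer to create the bucket programmatically rather than on the console, the following sketch shows one approach with boto3. The helper names are hypothetical. The sketch relies on a known behavior of the S3 CreateBucket API: in us-east-1 the location constraint must be omitted, and in other Regions it must be supplied.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the arguments for create_bucket in the given Region."""
    if bucket_name.startswith("/"):
        raise ValueError("The S3 bucket name cannot include leading slashes.")
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # us-east-1 is the default location and rejects an explicit constraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_code_bucket(bucket_name, region, s3_client=None):
    """Create the bucket that hosts the Lambda .zip file (hypothetical helper)."""
    if s3_client is None:
        import boto3  # imported lazily so create_bucket_kwargs works without boto3
        s3_client = boto3.client("s3", region_name=region)
    return s3_client.create_bucket(**create_bucket_kwargs(bucket_name, region))
```

Pass the same Region in which the Amazon EMR clusters you want to monitor are running.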

Upload the Lambda code.

Upload the Lambda code .zip file provided in the Attachments section to the S3 bucket.

Cloud architect
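The upload can also be scripted. The sketch below uses the boto3 upload_file call and strips leading slashes from the object key, because the key is later entered in the template without them. The helper names, the local file path, and the bucket name in the usage note are assumptions for illustration.

```python
def normalize_key(key):
    """Remove leading slashes so the key matches what the template expects."""
    return key.lstrip("/")


def upload_lambda_package(zip_path, bucket, key, s3_client=None):
    """Upload the Lambda .zip file and return the object key that was used."""
    if s3_client is None:
        import boto3  # imported lazily so normalize_key works without boto3
        s3_client = boto3.client("s3")
    clean_key = normalize_key(key)
    s3_client.upload_file(zip_path, bucket, clean_key)
    return clean_key
```

For example, upload_lambda_package("EMRTagValidation.zip", "my-bucket", "controls/EMRTagValidation.zip") returns the key to enter later as the S3 key parameter.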
Task | Description | Skills required

Launch the AWS CloudFormation template.

Open the AWS CloudFormation console in the same AWS Region as your S3 bucket and deploy the template. For more information about deploying AWS CloudFormation templates, see Creating a stack on the AWS CloudFormation console in the CloudFormation documentation.

Cloud architect

Complete the parameters in the template.

When you launch the template, you'll be prompted for the following information:

  • S3 bucket: Specify the bucket that you created or selected in the first epic. This is where you uploaded the attached Lambda code (.zip file).

  • S3 key: Specify the location of the Lambda .zip file in your S3 bucket (for example, filename.zip or controls/filename.zip). Do not include leading slashes.

  • Notification email: Provide an active email address where you want to receive Amazon SNS notifications.  

  • Tagging key names: Provide the tags you want to check for, in a comma-separated list (for example, ApplicationID, Environment, Owner). The CloudWatch Events event monitors the cluster for these tags and sends a notification if they aren't found.

  • Lambda logging level: Specify the logging level and frequency for the Lambda function. Use Info to log detailed informational messages on progress, Error for error events that would still allow the deployment to continue, and Warning for potentially harmful situations.

Cloud architect
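If you deploy the stack from a script instead of the console, the same values are passed as stack parameters. The following sketch validates the inputs and builds the parameter list for the boto3 create_stack call. The parameter keys (S3Bucket, S3Key, NotificationEmail, TagKeys, LoggingLevel) are hypothetical; check EMRTagValidation.yml for the real names. The CAPABILITY_IAM setting is also an assumption, based on the template likely creating an IAM role for the Lambda function.

```python
ALLOWED_LEVELS = ("Info", "Warning", "Error")


def build_parameters(bucket, key, email, tag_keys, logging_level="Info"):
    """Validate the inputs and return them in CloudFormation parameter format."""
    if key.startswith("/"):
        raise ValueError("The S3 key must not include leading slashes.")
    if logging_level not in ALLOWED_LEVELS:
        raise ValueError("Logging level must be one of: " + ", ".join(ALLOWED_LEVELS))
    cleaned = [tag.strip() for tag in tag_keys if tag.strip()]
    if not cleaned:
        raise ValueError("Provide at least one tag key to check for.")
    values = {
        # Hypothetical parameter keys; the attached template defines the real ones.
        "S3Bucket": bucket,
        "S3Key": key,
        "NotificationEmail": email,
        "TagKeys": ",".join(cleaned),
        "LoggingLevel": logging_level,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def deploy_stack(stack_name, template_body, parameters, region, cfn_client=None):
    """Create the stack in the given Region (hypothetical helper)."""
    if cfn_client is None:
        import boto3  # imported lazily so build_parameters works without boto3
        cfn_client = boto3.client("cloudformation", region_name=region)
    return cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=parameters,
        Capabilities=["CAPABILITY_IAM"],  # assumption: the template creates IAM resources
    )
```

Validating the key and logging level before calling CloudFormation surfaces input mistakes immediately rather than during stack creation.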
Task | Description | Skills required

Confirm the subscription.

When the CloudFormation template deploys successfully, it sends a subscription email to the email address you provided. You must confirm this email subscription to start receiving violation notifications.

Cloud architect
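To verify that the subscription was confirmed, you can inspect the topic's subscriptions. The sketch below relies on Amazon SNS reporting the literal value PendingConfirmation in place of a subscription ARN until the recipient confirms. The helper names are hypothetical, and the topic ARN must be taken from your deployed stack. The sketch reads only the first page of results.

```python
def pending_endpoints(subscriptions):
    """Return the endpoints whose subscription has not been confirmed yet."""
    return [
        sub["Endpoint"]
        for sub in subscriptions
        if sub.get("SubscriptionArn") == "PendingConfirmation"
    ]


def list_pending(topic_arn, sns_client=None):
    """Query the topic and return unconfirmed endpoints (first page only)."""
    if sns_client is None:
        import boto3  # imported lazily so pending_endpoints works without boto3
        sns_client = boto3.client("sns")
    response = sns_client.list_subscriptions_by_topic(TopicArn=topic_arn)
    return pending_endpoints(response["Subscriptions"])
```

An empty list suggests that every listed subscriber has confirmed and will receive violation notifications.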

Related resources

Attachments

To access additional content that is associated with this document, unzip the following file: attachment.zip