Ensure Amazon EMR logging to Amazon S3 is enabled at launch - AWS Prescriptive Guidance

Ensure Amazon EMR logging to Amazon S3 is enabled at launch

Created by Priyanka Chaudhary (AWS)

Environment: Production

Technologies: Security, identity, compliance; Serverless; Analytics

Workload: Open-source

AWS services: Amazon EMR; Amazon S3; Amazon SNS; Amazon CloudWatch

Summary

This pattern provides a security control that monitors logging configuration for Amazon EMR clusters running on Amazon Web Services (AWS).

Amazon EMR is an AWS tool for big data processing and analysis. It offers an expandable, low-configuration service as an alternative to running in-house cluster computing. Amazon EMR provides two types of clusters:

  • Transient Amazon EMR clusters: Transient Amazon EMR clusters automatically shut down and stop incurring costs when processing is finished.

  • Persistent Amazon EMR clusters: Persistent Amazon EMR clusters continue to run after the data processing job is complete.

Amazon EMR and Hadoop both produce log files that report status on the cluster. By default, these are written to the primary node in the /mnt/var/log/ directory. Depending on how you configure the cluster when you launch it, you can also save these logs to Amazon Simple Storage Service (Amazon S3) and view them through the graphical debugging tool. Note that Amazon S3 logging can be specified only when the cluster is launched. With this configuration, logs are sent from the primary node to the Amazon S3 location every 5 minutes. For transient clusters, Amazon S3 logging is important because the clusters disappear when processing is complete, and these log files can be used to debug any failed jobs.
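Because the log location can be set only at launch, it helps to see where it sits in a launch request. The following is a minimal sketch, not part of the pattern's attachment: it builds the keyword arguments for the boto3 EMR `run_job_flow` call, with `LogUri` pointing at an S3 location. The helper name `build_run_job_flow_request`, the release label, the instance types, and the role names are illustrative assumptions.

```python
def build_run_job_flow_request(name, log_uri, transient=True):
    """Return keyword arguments for the boto3 EMR `run_job_flow` call.

    Hypothetical helper for illustration; sizing and roles are assumptions.
    """
    if not log_uri.startswith("s3://"):
        raise ValueError("log_uri must be an s3:// location")
    return {
        "Name": name,
        # The S3 log location can be supplied only when the cluster is launched.
        "LogUri": log_uri,
        "ReleaseLabel": "emr-5.36.0",  # assumption: any release 4.8.0 or later
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumption
            "SlaveInstanceType": "m5.xlarge",   # assumption
            "InstanceCount": 3,                 # assumption
            # False: transient cluster that shuts down when processing finishes.
            # True: persistent cluster that keeps running.
            "KeepJobFlowAliveWhenNoSteps": not transient,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumption: default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_run_job_flow_request("demo-cluster", "s3://example-bucket/emr-logs/")
```

With boto3 installed and credentials configured, the request could then be passed as `boto3.client("emr").run_job_flow(**request)`. A request without `LogUri` is the case that this pattern's control is meant to flag.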

The pattern uses an AWS CloudFormation template to deploy a security control that monitors API calls and triggers an Amazon CloudWatch Events rule on the "RunJobFlow" API call. The rule invokes an AWS Lambda function, which runs a Python script. The Lambda function retrieves the EMR cluster ID from the event JSON input and checks whether the cluster has an Amazon S3 log URI. If an Amazon S3 URI is not found, the Lambda function sends an Amazon Simple Notification Service (Amazon SNS) notification detailing the EMR cluster name, violation details, AWS Region, AWS account, and the Amazon Resource Name (ARN) of the Lambda function that the notification is sourced from.
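The check described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code in the attachment: it assumes the cluster ID arrives in the CloudTrail `responseElements.jobFlowId` field, that the log location is read from the `LogUri` field returned by the EMR `describe_cluster` call, and that the topic ARN is supplied in a hypothetical `SNS_TOPIC_ARN` environment variable. The function names are also hypothetical.

```python
import json
import os


def extract_cluster_id(event):
    """Read the cluster ID from a CloudTrail-sourced RunJobFlow event (assumed shape)."""
    return event["detail"]["responseElements"]["jobFlowId"]


def has_s3_log_uri(cluster):
    """Return True when the cluster description carries an S3 log location."""
    log_uri = cluster.get("LogUri") or ""
    # describe_cluster may report the location with an s3n:// scheme.
    return log_uri.startswith(("s3://", "s3n://", "s3a://"))


def build_violation(event, cluster, lambda_arn):
    """Collect the details that the notification is described as containing."""
    return {
        "cluster_name": cluster.get("Name"),
        "cluster_id": cluster.get("Id"),
        "violation": "Amazon S3 logging is not enabled for this EMR cluster",
        "region": event.get("region"),
        "account": event.get("account"),
        "source_lambda_arn": lambda_arn,
    }


def lambda_handler(event, context, emr=None, sns=None):
    """Publish a notification if the launched cluster has no S3 log URI."""
    if emr is None or sns is None:
        import boto3  # imported lazily so the helpers can run without AWS access

        emr = emr or boto3.client("emr")
        sns = sns or boto3.client("sns")

    cluster_id = extract_cluster_id(event)
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    if has_s3_log_uri(cluster):
        return None  # compliant: nothing to report

    violation = build_violation(event, cluster, context.invoked_function_arn)
    sns.publish(
        TopicArn=os.environ["SNS_TOPIC_ARN"],  # hypothetical variable name
        Subject="Violation: EMR cluster launched without S3 logging",
        Message=json.dumps(violation, indent=2),
    )
    return violation
```

The `emr` and `sns` parameters exist only so that the logic can be exercised with stand-in clients. In Lambda, the handler is invoked with just the event and context.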

Prerequisites and limitations

Prerequisites

  • An active AWS account

  • An S3 bucket for the Lambda code .zip file

  • An email address where you want to receive the violation notification

Limitations

  • This detective control is regional and must be deployed in the AWS Regions you intend to monitor.

Product versions

  • Amazon EMR release 4.8.0 and later

Architecture

Target technology stack

  • Amazon CloudWatch Events event

  • Amazon EMR

  • Lambda function

  • S3 bucket

  • Amazon SNS
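To make the CloudWatch Events piece of this stack concrete, the sketch below shows an event pattern that would select RunJobFlow API calls recorded by AWS CloudTrail, together with a deliberately simplified matcher for checking sample events locally. The exact pattern in the attached template is not shown in this document, so treat this one as an assumption. The `matches` helper is hypothetical and covers only exact-value matching, not the full event pattern syntax.

```python
import json

# Assumed event pattern: RunJobFlow calls delivered through CloudTrail.
RUN_JOB_FLOW_PATTERN = {
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["elasticmapreduce.amazonaws.com"],
        "eventName": ["RunJobFlow"],
    },
}


def matches(pattern, event):
    """Simplified check: each pattern key must exist in the event, and the
    event value must be one of the listed values. Nested dicts recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


sample_event = {
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "elasticmapreduce.amazonaws.com",
        "eventName": "RunJobFlow",
    },
}
print(json.dumps(RUN_JOB_FLOW_PATTERN, indent=2))
print(matches(RUN_JOB_FLOW_PATTERN, sample_event))
```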

Target architecture

Automation and scale

  • If you are using AWS Organizations, you can use AWS CloudFormation StackSets to deploy this template in multiple accounts that you want to monitor.

Tools

  • AWS CloudFormation – AWS CloudFormation helps you model and set up AWS resources using infrastructure as code.

  • Amazon CloudWatch Events – Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources.

  • Amazon EMR – Amazon EMR is a managed cluster platform that simplifies running big data frameworks.

  • AWS Lambda – AWS Lambda supports running code without provisioning or managing servers. Lambda runs your code only when needed and scales automatically, from a few requests per day to thousands per second.

  • Amazon S3 – Amazon S3 is a web services interface that you can use to store and retrieve any amount of data from anywhere on the web.

  • Amazon SNS – Amazon SNS is a web service that coordinates and manages the delivery or sending of messages between publishers and clients, including web servers and email addresses.

Code

  • A .zip file of the project is available as an attachment.

Epics

Task: Define the S3 bucket.

Description: To host the Lambda code .zip file, choose or create an S3 bucket with a unique name that does not contain leading slashes. An S3 bucket name is globally unique, and the namespace is shared by all AWS accounts. Your S3 bucket needs to be in the same AWS Region as the Amazon EMR cluster that is being evaluated.

Skills required: Cloud Architect

Task: Upload the Lambda code to the S3 bucket.

Description: Upload the Lambda code .zip file that's provided in the "Attachments" section to the S3 bucket. The S3 bucket must be in the same Region as the Amazon EMR cluster that is being evaluated.

Skills required: Cloud Architect

Task: Deploy the AWS CloudFormation template.

Description: On the AWS CloudFormation console, in the same Region as your S3 bucket, deploy the AWS CloudFormation template that's provided as an attachment to this pattern. In the next epic, provide the values for the parameters. For more information about deploying AWS CloudFormation templates, see the “Related resources” section.

Skills required: Cloud Architect

Task: Name the S3 bucket.

Description: Enter the name of the S3 bucket that you created in the first epic.

Skills required: Cloud Architect

Task: Provide the Amazon S3 key.

Description: Provide the location of the Lambda code .zip file in your S3 bucket, without leading slashes (for example, <directory>/<file-name>.zip).

Skills required: Cloud Architect

Task: Provide an email address.

Description: Provide an active email address to receive Amazon SNS notifications.

Skills required: Cloud Architect

Task: Define the logging level.

Description: Define the logging level and frequency for your Lambda function. “Info” designates detailed informational messages on the application’s progress. “Error” designates error events that could still allow the application to continue running. “Warning” designates potentially harmful situations.

Skills required: Cloud Architect
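Two of the parameters above have simple rules that can be checked before deployment: the S3 key must not start with a slash, and the logging level must be one of the three named values. The following sketch illustrates both checks. The helper names are hypothetical, and the mapping onto Python `logging` levels is an assumption about how a Python Lambda function might consume the parameter, not a description of the attached code.

```python
import logging

# Assumed mapping from the template's level names to Python logging levels.
LOG_LEVELS = {
    "Info": logging.INFO,
    "Warning": logging.WARNING,
    "Error": logging.ERROR,
}


def normalize_s3_key(key):
    """Strip leading slashes so the key has the form <directory>/<file-name>.zip."""
    return key.lstrip("/")


def resolve_log_level(name):
    """Translate a level name into a logging constant, rejecting unknown names."""
    try:
        return LOG_LEVELS[name]
    except KeyError:
        allowed = ", ".join(sorted(LOG_LEVELS))
        raise ValueError(f"Unknown logging level {name!r}; expected one of: {allowed}") from None


key = normalize_s3_key("/lambda/emr-logging-check.zip")  # hypothetical file name
logging.getLogger().setLevel(resolve_log_level("Info"))
```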

Task: Confirm the subscription.

Description: When the template successfully deploys, it sends a subscription email message to the email address provided. You must confirm this email subscription to receive violation notifications.

Skills required: Cloud Architect
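One way to verify that the subscription was confirmed is to inspect the topic's subscriptions. Amazon SNS reports the literal value `PendingConfirmation` as the subscription ARN until the recipient confirms. The sketch below filters a response shaped like the output of the boto3 SNS `list_subscriptions_by_topic` call. The helper name and the sample data are illustrative assumptions.

```python
def pending_endpoints(subscriptions):
    """Return the endpoints whose subscription has not been confirmed yet."""
    return [
        sub["Endpoint"]
        for sub in subscriptions
        if sub.get("SubscriptionArn") == "PendingConfirmation"
    ]


# Sample data in the shape of response["Subscriptions"]; values are made up.
sample_subscriptions = [
    {
        "SubscriptionArn": "PendingConfirmation",
        "Protocol": "email",
        "Endpoint": "security-team@example.com",
    },
    {
        "SubscriptionArn": "arn:aws:sns:us-east-1:111122223333:demo-topic:1234abcd",
        "Protocol": "email",
        "Endpoint": "oncall@example.com",
    },
]
print(pending_endpoints(sample_subscriptions))
```

With boto3 available, the list could come from `boto3.client("sns").list_subscriptions_by_topic(TopicArn=...)["Subscriptions"]`, where the topic ARN is the one created by the stack.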

Related resources

  • AWS Lambda

  • Amazon EMR logging

  • Deploying AWS CloudFormation templates

Attachments

To access additional content that is associated with this document, unzip the following file: attachment.zip