Security in Amazon EMR

Cloud security at AWS is the highest priority. As an AWS customer, you benefit from a data center and network architecture that is built to meet the requirements of the most security-sensitive organizations.

Security is a shared responsibility between AWS and you. The shared responsibility model describes this as security of the cloud and security in the cloud:

  • Security of the cloud – AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud. AWS also provides you with services that you can use securely. Third-party auditors regularly test and verify the effectiveness of our security as part of the AWS compliance programs. To learn about the compliance programs that apply to Amazon EMR, see AWS services in scope by compliance program.

  • Security in the cloud – Your responsibility is determined by the AWS service that you use. You are also responsible for other factors including the sensitivity of your data, your company's requirements, and applicable laws and regulations.

This documentation helps you understand how to apply the shared responsibility model when using Amazon EMR. When you develop solutions on Amazon EMR, use the following technologies to help secure cluster resources and data according to your business requirements. The topics in this chapter show you how to configure Amazon EMR and use other AWS services to meet your security and compliance objectives.

Security configurations

Security configurations in Amazon EMR are templates for different security setups. You can create a security configuration to conveniently re-use a security setup whenever you create a cluster. For more information, see Use security configurations to set up cluster security.
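As a minimal sketch of that reuse, the following Python builds the arguments for creating a named security configuration and for referencing it by name when you create a cluster. It assumes the boto3 EMR client and its `create_security_configuration` and `run_job_flow` operations. The helper function names and the configuration name you pass in are hypothetical.

```python
import json


def security_configuration_request(name, settings):
    """Build keyword arguments for the EMR CreateSecurityConfiguration call.

    The API takes the settings as a JSON string, so the dict is serialized here.
    """
    return {"Name": name, "SecurityConfiguration": json.dumps(settings)}


def cluster_request_fragment(name):
    """Return the part of a RunJobFlow request that references a configuration by name."""
    return {"SecurityConfiguration": name}


def create_security_configuration(name, settings):
    """Send the request with boto3. Requires AWS credentials; nothing calls this on import."""
    import boto3  # imported lazily so the builders above work without boto3 installed

    emr = boto3.client("emr")
    return emr.create_security_configuration(
        **security_configuration_request(name, settings)
    )
```

Because clusters reference the configuration only by name, the same stored setup can be applied to every cluster that passes that name at creation.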

Data protection

You can implement data encryption to help protect data at rest in Amazon S3, data at rest in cluster instance storage, and data in transit. For more information, see Encrypt data at rest and in transit.
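The following sketch shows one way the three kinds of encryption might be expressed as security configuration settings. The field names follow the security configuration JSON format as understood here, so verify them against the linked topic before use. The function name, the KMS key ARN, and the S3 URI are placeholders, and SSE-S3 and PEM certificates are only one of several possible choices.

```python
import json


def encryption_settings(kms_key_arn, tls_certs_s3_uri):
    """Return security configuration settings covering S3, local disk, and in-transit data."""
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # Data at rest in Amazon S3, here with S3-managed keys.
                "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"},
                # Data at rest in cluster instance storage, using an AWS KMS key.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                # Data in transit: PEM certificates packaged as a zip file in S3.
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": tls_certs_s3_uri,
                }
            },
        }
    }


if __name__ == "__main__":
    # Placeholder ARN and S3 URI; substitute your own values.
    settings = encryption_settings(
        "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
        "s3://amzn-s3-demo-bucket/certs/my-certs.zip",
    )
    print(json.dumps(settings, indent=2))
```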

AWS Identity and Access Management with Amazon EMR

AWS Identity and Access Management (IAM) is an AWS service that helps an administrator securely control access to AWS resources. IAM administrators control who can be authenticated (signed in) and authorized (have permissions) to use Amazon EMR resources. You can use IAM at no additional charge.

Kerberos

You can set up Kerberos to provide strong authentication through secret-key cryptography. For more information, see Use Kerberos for authentication with Amazon EMR.

Lake Formation

You can use Lake Formation permissions together with the AWS Glue Data Catalog to provide fine-grained, column-level access to databases and tables in the AWS Glue Data Catalog. Lake Formation enables federated single sign-on to EMR Notebooks or Apache Zeppelin from an enterprise identity system. For more information, see Integrate Amazon EMR with AWS Lake Formation.

Secure Shell (SSH)

SSH helps provide a secure way for users to connect to the command line on cluster instances. It also provides tunneling to view web interfaces that applications host on the master node. Clients can authenticate using Kerberos or an Amazon EC2 key pair. For more information, see Use an EC2 key pair for SSH credentials and Connect to a cluster.
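To illustrate the tunneling case, the following sketch assembles an `ssh` command that uses dynamic port forwarding with an EC2 key pair. The `-i`, `-N`, and `-D` options are standard OpenSSH options. The function name, key file name, host name, and local port 8157 are assumptions for illustration, and `hadoop` is assumed to be the login user on the cluster.

```python
import shlex


def ssh_tunnel_command(key_path, master_dns, local_port=8157, user="hadoop"):
    """Build an ssh command that opens a local SOCKS proxy to the master node."""
    return [
        "ssh",
        "-i", key_path,         # private key file of the EC2 key pair
        "-N",                   # forward ports only, do not run a remote command
        "-D", str(local_port),  # dynamic (SOCKS) forwarding on this local port
        f"{user}@{master_dns}",
    ]


if __name__ == "__main__":
    # Placeholder key file and public DNS name.
    cmd = ssh_tunnel_command(
        "mykeypair.pem", "ec2-203-0-113-10.compute-1.amazonaws.com"
    )
    print(shlex.join(cmd))
```

With the tunnel open, a browser configured to use the local SOCKS proxy can reach the web interfaces without exposing their ports in a security group.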

Amazon EC2 security groups

Security groups act as a virtual firewall for EMR cluster instances, limiting inbound and outbound network traffic. For more information, see Control network traffic with security groups.
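As a hedged example of reviewing such rules, the following sketch scans inbound permissions for rules that are open to any address. It assumes the input has the shape of the `IpPermissions` list that the EC2 `DescribeSecurityGroups` operation returns. The function name and the sample rules are hypothetical.

```python
def open_ingress_rules(ip_permissions):
    """Return (protocol, from_port, to_port, cidr) for each rule open to any address."""
    findings = []
    for perm in ip_permissions:
        protocol = perm.get("IpProtocol")
        from_port = perm.get("FromPort")  # may be absent when the rule covers all traffic
        to_port = perm.get("ToPort")
        for ip_range in perm.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                findings.append((protocol, from_port, to_port, "0.0.0.0/0"))
        for ip_range in perm.get("Ipv6Ranges", []):
            if ip_range.get("CidrIpv6") == "::/0":
                findings.append((protocol, from_port, to_port, "::/0"))
    return findings


if __name__ == "__main__":
    # Sample rules for illustration only.
    sample = [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 8443, "ToPort": 8443,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ]
    for finding in open_ingress_rules(sample):
        print(finding)
```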

Updates to the default Amazon Linux AMI for Amazon EMR

Important

Amazon EMR clusters that are running Amazon Linux or Amazon Linux 2 Amazon Machine Images (AMIs) use default Amazon Linux behavior, and do not automatically download and install important and critical kernel updates that require a reboot. This is the same behavior as other Amazon EC2 instances running the default Amazon Linux AMI. If new Amazon Linux software updates that require a reboot (such as kernel, NVIDIA, and CUDA updates) become available after an Amazon EMR version is released, Amazon EMR cluster instances running the default AMI do not automatically download and install those updates. To get kernel updates, you can customize your Amazon EMR AMI to use the latest Amazon Linux AMI.

Depending on the security posture of your application and the length of time that a cluster runs, you may choose to periodically reboot your cluster to apply security updates, or create a bootstrap action to customize package installation and updates. You may also choose to test and then install select security updates on running cluster instances. For more information, see Using the default Amazon Linux AMI for Amazon EMR. Note that your networking configuration must allow HTTP and HTTPS egress to Amazon Linux repositories in Amazon S3; otherwise, security updates will not succeed.
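The bootstrap action option can be sketched as follows. The dictionary uses the shape that the boto3 EMR `run_job_flow` operation accepts in its `BootstrapActions` parameter. The function name, the action name, the script location, and the script argument are hypothetical, and the script itself (which would perform the package updates) is not shown.

```python
def update_bootstrap_action(script_s3_uri, args=None):
    """Describe a bootstrap action that runs a custom package-update script."""
    return {
        "Name": "Apply selected package updates",
        "ScriptBootstrapAction": {
            "Path": script_s3_uri,      # S3 location of your own update script
            "Args": list(args or []),   # optional arguments passed to the script
        },
    }


if __name__ == "__main__":
    # Placeholder S3 URI and argument; pass the result in the BootstrapActions list
    # of a run_job_flow request.
    action = update_bootstrap_action(
        "s3://amzn-s3-demo-bucket/bootstrap/apply-updates.sh", ["--security-only"]
    )
    print(action)
```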