Using the default Amazon Linux AMI for Amazon EMR - Amazon EMR

Using the default Amazon Linux AMI for Amazon EMR

Each Amazon EMR release version uses a default Amazon Linux AMI for Amazon EMR unless you specify a custom AMI. The default AMI is based on the most up-to-date Amazon Linux AMI available at the time of the Amazon EMR release. The AMI is tested for compatibility with the big data applications and Amazon EMR features included with that release version.

Each Amazon EMR release version is "locked" to the Amazon Linux AMI version to maintain compatibility. This means that the same Amazon Linux AMI version is used for an Amazon EMR release version even when newer Amazon Linux AMIs become available. For this reason, we recommend that you use the latest Amazon EMR release version unless you need an earlier version for compatibility and are unable to migrate.

If you must use an earlier release version of Amazon EMR for compatibility, we recommend that you use the latest release in a series. For example, if you must use the 5.12 series, use 5.12.2 instead of 5.12.0 or 5.12.1. If a new release becomes available in a series, consider migrating your applications to the new release.

How software updates are managed

Here are default software update behaviors to be aware of:

  • Default boot behavior excludes kernel updates. When an Amazon EC2 instance in a cluster that is based on the default Amazon Linux AMI for Amazon EMR boots for the first time, it checks the enabled package repositories for Amazon Linux and Amazon EMR for software updates that apply to the AMI version. As with other Amazon EC2 instances, critical and important security updates from these repositories are installed automatically. However, if you are using an older version of Amazon Linux AMI, the latest security update may not be installed automatically. This is because the repositories referenced by EMR are fixed for each version of Amazon Linux AMI. Also note that your networking configuration must allow for HTTP and HTTPS egress to Amazon Linux repositories in Amazon S3, otherwise security updates will not succeed. For more information, see Package repository in the Amazon EC2 User Guide for Linux Instances. By default, other software packages and kernel updates that require a reboot, including NVIDIA and CUDA, are excluded from the automatic download at first boot.

    Important

    Amazon EMR clusters that are running Amazon Linux or Amazon Linux 2 AMIs (Amazon Linux Machine Images) use default Amazon Linux behavior, and do not automatically download and install important and critical kernel updates that require a reboot. This is the same behavior as other Amazon EC2 instances running the default Amazon Linux AMI. If new Amazon Linux software updates that require a reboot (such as, kernel, NVIDIA, and CUDA updates) become available after an Amazon EMR version is released, Amazon EMR cluster instances running the default AMI do not automatically download and install those updates. To get kernel updates, you can customize your Amazon EMR AMI to use the latest Amazon Linux AMI.

  • The cluster launches with or without updates. Be aware that if software updates can't be installed because package repositories are unreachable at first cluster boot, the cluster instance still completes its launch. For instance, repositories may be unreachable because S3 is temporarily unavailable, or you might have VPC or firewall rules configured to block access.

  • Don't run sudo yum update. When you connect to a cluster instance using SSH, the first few lines of screen output provide a link to the release notes for the Amazon Linux AMI that the instance uses, a notice of the most recent Amazon Linux AMI version, a notice of the number of packages available for update from the enabled repositories, and a directive to run sudo yum update.

Important

We strongly recommend that you do not run sudo yum update on cluster instances, either while connected using SSH or using a bootstrap action. This might cause incompatibilities because all packages are installed indiscriminately.

Best practices for managing software updates

  • If you use an earlier release version of Amazon EMR, consider and test a migration to the latest release before updating software packages.

  • If you migrate to a later release version or you upgrade software packages, test the implementation in a non-production environment first. The option to clone clusters using the Amazon EMR console is helpful for this.

  • Evaluate software updates for your applications and for your version of Amazon Linux AMI on an individual basis. Only test and install packages in production environments that you determine to be absolutely necessary for your security posture, application functionality, or performance.

  • Watch the Amazon Linux Security Center for updates.

  • Avoid installing packages by connecting to individual cluster instances using SSH. Instead, use a bootstrap action to install and update packages on all cluster instances as necessary. This requires that you terminate a cluster and relaunch it. For more information, see Create bootstrap actions to install additional software.